rugantio / fbcrawl

A Facebook crawler
Apache License 2.0
661 stars 229 forks

error in crawling data #5

Closed chiangandy closed 5 years ago

chiangandy commented 5 years ago

I tried the following command:

scrapy crawl comments -a email="fino20181111@gmail.com" -a password="lotus19650807" -a page="https://mbasic.facebook.com/XxSunJinxX" -o DUMPFILE.csv

but I got the error below:

2018-11-20 14:26:28 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/XxSunJinxX> (referer: https://mbasic.facebook.com/login/save-device/?login_source=login&refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&_rdr)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/chiangandy/fino/crawler/fbcrawl/fbcrawl/spiders/fbcrawler.py", line 87, in parse_page
    temp_post = response.urljoin(post[0])
IndexError: list index out of range
2018-11-20 14:26:28 [scrapy.core.engine] INFO: Closing spider (finished)

I am using Python 2 with Scrapy 1.5.1. Could you please tell me what is wrong here? Thanks.

rugantio commented 5 years ago

Have a look at issue #2 and issue #4, and let me know if that fixes it for you. I'll be working on handling the checkpoints better and on porting the framework to English.

chiangandy commented 5 years ago

After checking, I found it is a localization issue: in my region the hyperlink text on Facebook is in Chinese, which differs from your original setting (it looks like French). I fixed the issue, but unfortunately my FB account has been locked by Facebook. So sad... :(
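
For anyone hitting the same IndexError: the traceback shows post[0] being read from an empty list, which is what happens when the spider looks for a link by its visible text and the page is served in another language. Below is a minimal sketch of a more defensive lookup, not the fbcrawl code itself. It uses only the Python 3 standard library rather than Scrapy selectors. The names LinkCollector, FULL_STORY_LABELS and find_post_link are hypothetical, and the label strings are illustrative assumptions about what Facebook shows in each language.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Assumed, illustrative labels for the "open this post" link in a few
# interface languages; check them against what your account actually shows.
FULL_STORY_LABELS = {"Full Story", "Notizia completa", "完整動態"}


def find_post_link(html, labels=FULL_STORY_LABELS):
    """Return the href of the first link whose text is a known label.

    Returns None instead of raising IndexError when nothing matches.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        if text in labels:
            return href
    return None
```

With this shape, the caller can check for None and log or skip the post instead of crashing, and supporting another interface language only means adding its label to the set.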

rugantio commented 5 years ago

The command you are executing is wrong. If you want to crawl the page you have to run this:

scrapy crawl fb -a email="fino20181111@gmail.com" -a password="lotus19650807" -a page="XxSunJinxX" -o DUMPFILE.csv

You need to change the credentials, since they don't seem to work (possibly because of issue #2, as Facebook saves the device you are using). Just open a fresh FB account.

If you want to crawl the comments, you need to specify the link to the post (or photo), for example:

scrapy crawl comments -a email="fino20181111@gmail.com" -a password="lotus19650807" -a page="XxSunJinxX/photos/a.388150598059765/999405416934277" -o DUMPFILE.csv

At the moment, the only features crawled by the comment scraper are the author and the text (no reactions). Let me know if this fixes your problem :)

fisiognomico commented 5 years ago

Have you tried switching the script off and on again, like this:

x="if(t%2)else";python3 -c"[print(t>>15&(t>>(2$x 4))%(3+(t>>(8$x 11))%4)+(t>>10)|42&t>>7&t<<9,end='')for t in range(2**20)]"|aplay -c2 -r4