rugantio / fbcrawl

A Facebook crawler
Apache License 2.0
661 stars, 229 forks

cannot redirect to page #4

Closed: aldwinharjanto closed this issue 5 years ago

aldwinharjanto commented 5 years ago

Hi,

I tried to run your code with the command scrapy crawl fb -a email="facebook_email" -a password="facebook_password" -a page="KompasCOM" -o DUMPFILE.csv

I also tried changing the page parameter to -a page="/KompasCOM"

but it gives an error:

INFO: Parse function called on https://mbasic.facebook.com/KompasCOM
ERROR: Spider error processing <GET https://mbasic.facebook.com/KompasCOM> (referer: https://mbasic.facebook.com/home.php?_rdr)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
    for x in result:
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\KERJA\Neviim\fbcrawl-master\fbcrawl\spiders\fbcrawl.py", line 87, in parse_page
    temp_post = response.urljoin(post[0])
IndexError: list index out of range

How do I solve this?

EDIT

I tried the workaround you mentioned, logging in via a web browser, but it still gives me the same error. Also, I got no email about an unknown device login like you mentioned before. Is there any other workaround I can try?

Thanks

hohvn commented 5 years ago

I think this is related to issues #1 and #2.

rugantio commented 5 years ago

@aldwinharjanto The command is correct (the first one). I can crawl the page without problems.

aldwinharjanto commented 5 years ago

@rugantio I keep getting the same error message. Is there any workaround to solve this?

rugantio commented 5 years ago

@aldwinharjanto OK, I found some time to debug. The crawler is not working because your fb interface language is not set to "italian", so it cannot find the correct xpath. The selector that breaks is post.xpath(".//a[contains(text(),'Notizia completa')]/@href") at line 86: it matches nothing, so post is an empty list and post[0] at line 87 raises the IndexError. It's written in the README that Italian is the only language supported at the moment. I suggest changing the language, since even if you fix this precise selector to fit your language, it will still break on other selectors. If you want to port the code to your language, I encourage you to do so. At the moment I don't have much time available for this.
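
For anyone attempting the port, here is a minimal sketch of one possible approach, not the project's actual code. It builds a single XPath that accepts the link label in several interface languages and guards against the empty match that triggers the IndexError. The names FULL_STORY_LABELS, full_story_xpath and first_or_none are hypothetical, and only the Italian label comes from this thread. The English label is an assumption you should check against your own interface.

```python
# Hypothetical label table: only "Notizia completa" appears in this thread.
# The English entry is an assumption; verify it against your own interface.
FULL_STORY_LABELS = {
    "it": "Notizia completa",
    "en": "Full Story",
}


def full_story_xpath(labels):
    """Build one XPath that matches a link whose text contains any label.

    Labels are assumed not to contain a single quote, which would break
    the quoting inside the generated expression.
    """
    conditions = " or ".join(
        "contains(text(),'{}')".format(label) for label in labels
    )
    return ".//a[{}]/@href".format(conditions)


def first_or_none(matches):
    """Return the first extracted match, or None if nothing matched."""
    return matches[0] if matches else None


# The combined expression that could replace the hard-coded Italian one.
xpath = full_story_xpath(FULL_STORY_LABELS.values())
```

Inside parse_page, the extracted list could then be passed through first_or_none, and the post skipped with a log message when the result is None rather than indexing post[0] directly. This only covers the one selector named above, so the other language-dependent selectors in the spider would need the same treatment.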

aldwinharjanto commented 5 years ago

@rugantio Thanks man, I'll try to adjust your code. Thanks for the help.