Closed: aldwinharjanto closed this issue 5 years ago
I think it relates to issues #1 and #2.
@aldwinharjanto The command is correct (the first one), I can crawl the page without problems.
@rugantio I keep getting the same error message. Is there any workaround to solve this?
@aldwinharjanto OK, I found some time to debug. The crawler is not working because your fb interface language is not set to "italian", so it cannot find the correct xpath (the selector that breaks is post.xpath(".//a[contains(text(),'Notizia completa')]/@href") at line 86). As noted in the README, Italian is the only language supported at the moment, so I suggest changing your interface language: even if you fix this particular selector to fit your language, the crawler will still break on other selectors. If you want to port the code to your language I encourage you to, but at the moment I don't have much time available for this.
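One way such a port could start is by factoring the hard-coded Italian label out of the selector. This is a minimal sketch, not code from fbcrawl: the dictionary, the helper name, and the English "Full Story" label are all assumptions for illustration.

```python
# Hypothetical sketch: look up the "full story" link label per interface
# language instead of hard-coding the Italian text inside the xpath
# (the selector that breaks at line 86 of fbcrawl.py).
FULL_STORY_LABELS = {
    "it": "Notizia completa",  # the only label fbcrawl actually targets
    "en": "Full Story",        # assumed label for an English interface
}

def full_story_xpath(lang):
    """Build the 'full story' link selector for the given interface language."""
    label = FULL_STORY_LABELS[lang]
    return ".//a[contains(text(),'{}')]/@href".format(label)
```

Every other language-dependent selector in the spider would need the same treatment, which is why switching the account's interface language to Italian is the simpler fix.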
@rugantio Thanks man, I'll try to adjust your code. Thanks for the help!
Hi,
I tried to run your code with the command
scrapy crawl fb -a email="facebook_email" -a password="facebook_password" -a page="KompasCOM" -o DUMPFILE.csv
I also tried changing the page parameter to
-a page="/KompasCOM"
but it gives an error:
INFO: Parse function called on https://mbasic.facebook.com/KompasCOM
ERROR: Spider error processing <GET https://mbasic.facebook.com/KompasCOM> (referer: https://mbasic.facebook.com/home.php?_rdr)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
    for x in result:
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\KERJA\Neviim\fbcrawl-master\fbcrawl\spiders\fbcrawl.py", line 87, in parse_page
    temp_post = response.urljoin(post[0])
IndexError: list index out of range
How do I solve this?
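For reference, the IndexError above comes from indexing an empty xpath result: when the selector finds no match (here, because the interface language is not Italian), post is an empty list and post[0] raises. A minimal defensive sketch follows; the helper name is hypothetical and not part of fbcrawl, and it only avoids the crash rather than fixing the underlying language mismatch.

```python
# Hypothetical guard for fbcrawl.py line 87: post.xpath(...).extract()
# yields an empty list when the selector matches nothing, so post[0]
# raises IndexError. Returning None lets the caller skip the post.
def first_or_none(matches):
    """Return the first extracted value, or None if nothing matched."""
    return matches[0] if matches else None
```

In parse_page the spider would then skip a post when first_or_none(...) returns None instead of indexing blindly.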
EDIT
I tried the workaround you mentioned, logging in via a web browser, but it still gives me the same error. Also, I got no email about an unknown device login like you mentioned before. Is there any other workaround I can try?
Thanks