rugantio / fbcrawl

A Facebook crawler
Apache License 2.0

Http Error #2

Closed (bizfreak22 closed this issue 5 years ago)

bizfreak22 commented 6 years ago

Hey, I am getting an HTTP error when I try to run the scraper. Any suggestions?

2018-09-17 15:41:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://mbasic.facebook.com/login/save-device/SoCipe2L>: HTTP status code is not handled or not allowed

I uncommented the user agent setting and tried several other user agents, but I get the same error.

bizfreak22 commented 6 years ago

OK, I noticed that your script appends the name of the fan page to the end of the URL: https://mbasic.facebook.com/login/save-device/SoCipe2L

I'm not sure how to fix that.

brahimwakrim87 commented 6 years ago

Hi, just change the parameter -a page="SoCipe2L" to -a page="/SoCipe2L".
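
A rough sketch of why the leading slash matters, assuming the spider resolves the page argument against the URL it is currently on (for example via Scrapy's response.urljoin, which builds on urllib.parse.urljoin). The current_url value is inferred from the log above and is not taken from the fbcrawl source.

```python
from urllib.parse import urljoin

# Assumed: the URL the spider is on when it tries to open the page,
# i.e. the URL from the log without the appended page name.
current_url = "https://mbasic.facebook.com/login/save-device/"

# A relative name is appended to the current path.
relative = urljoin(current_url, "SoCipe2L")

# A name with a leading slash is resolved from the site root.
absolute = urljoin(current_url, "/SoCipe2L")

print(relative)  # https://mbasic.facebook.com/login/save-device/SoCipe2L
print(absolute)  # https://mbasic.facebook.com/SoCipe2L
```

The first result matches the 404 URL in the log; the second is the fan page itself.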

rugantio commented 5 years ago

Hi, sorry for the late reply! Sometimes the bot gets stuck because Facebook fingerprints the browser and tries to block the new scrapy device (or tries to save it as a new one). Although I've written some code in the parse_home function to bypass this behavior, it doesn't always work. A simple workaround is to log in once from your regular web browser, after which everything should work fine. Also check your mailbox: Facebook sometimes sends an email saying that an unknown device has been trying to access your account without permission.
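
To tell whether a run got stuck on this step, one option is to check whether the URL in the log points at the "save device" page. This is only a minimal sketch: is_save_device_url and SAVE_DEVICE_PATH are hypothetical names, not part of fbcrawl or parse_home, and the path is simply the one that appears in the log in this thread.

```python
from urllib.parse import urlparse

# Assumed: path prefix of the "save device" page, copied from the log above.
SAVE_DEVICE_PATH = "/login/save-device"


def is_save_device_url(url: str) -> bool:
    """Return True if the URL path starts with the save-device prefix."""
    return urlparse(url).path.startswith(SAVE_DEVICE_PATH)


stuck = is_save_device_url(
    "https://mbasic.facebook.com/login/save-device/SoCipe2L"
)
ok = is_save_device_url("https://mbasic.facebook.com/SoCipe2L")
print(stuck, ok)  # True False
```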