kevinzg / facebook-scraper

Scrape Facebook public pages without an API key
MIT License

Your request couldn't be processed #849

Open diegociarafoni opened 2 years ago

diegociarafoni commented 2 years ago

Hi, I tried to log in with my account, but I get this error:

    File ".../main.py", line 4, in <module>
        set_cookies("facebook.com_cookies.txt")
    File "...\Python38\site-packages\facebook_scraper\__init__.py", line 35, in set_cookies
        if not _scraper.is_logged_in():
    File "...\facebook_scraper\facebook_scraper.py", line 524, in is_logged_in
        self.get('https://m.facebook.com/settings')
    File "...\facebook_scraper\facebook_scraper.py", line 454, in get
        raise exceptions.UnexpectedResponse("Your request couldn't be processed")
    facebook_scraper.exceptions.UnexpectedResponse: Your request couldn't be processed

I tried another account and the login succeeded, but when I run this code:

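    from facebook_scraper import get_posts, set_cookies

    set_cookies("facebook.com_cookies.txt")
    posts = list(get_posts('nintento', pages=10))  # get_posts yields posts lazily; materialize before len()
    print(len(posts))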

It returns 0.

What can I do? Thanks

curiousier-george commented 2 years ago

My impression is that Facebook is becoming stricter. Maybe they are specifically checking for facebook-scraper? I wonder what others' experiences are ...

MarcosChavarria commented 2 years ago

I think the same; I've been getting a lot of "Your request couldn't be processed" errors today :S

aflaldf commented 2 years ago

I'm getting this a lot lately too.

NielsOerbaek commented 2 years ago

I can't really find a pattern in these errors. From my quick tests, it seems that backing off for a few seconds and then retrying fixes the issue in most cases.
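The back-off-and-retry idea can be sketched as a small generic wrapper. This is a hypothetical, iterative helper (`retry_with_backoff`). It is not the recursive change to `FacebookScraper.get` described in this comment. `UnexpectedResponse` here is a local stand-in for `facebook_scraper.exceptions.UnexpectedResponse`, and the delays and retry count are arbitrary assumptions.

```python
import random
import time


class UnexpectedResponse(Exception):
    """Local stand-in for facebook_scraper.exceptions.UnexpectedResponse."""


def retry_with_backoff(func, *args, retries=3, base_delay=2.0,
                       retry_on=(UnexpectedResponse,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying when it raises one of `retry_on`.

    Before retry number n (counting from 0) it waits base_delay * 2**n seconds
    plus up to 1 s of random jitter. Once `retries` retries have also failed,
    the last exception is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))


# Demo: a fake request that fails twice, then succeeds.
calls = []


def flaky_get(url):
    calls.append(url)
    if len(calls) < 3:
        raise UnexpectedResponse("Your request couldn't be processed")
    return f"OK: {url}"


# sleep is replaced by a no-op so the demo does not actually wait.
result = retry_with_backoff(flaky_get, "https://m.facebook.com/settings",
                            sleep=lambda seconds: None)
print(result, len(calls))  # → OK: https://m.facebook.com/settings 3
```

Against the real library, something like `retry_with_backoff(set_cookies, "facebook.com_cookies.txt", retry_on=(facebook_scraper.exceptions.UnexpectedResponse,))` would retry the call that fails in the traceback at the top of this issue. Treat it as a mitigation, not a fix.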

I made a quick-and-dirty recursive hack to add retrying to the FacebookScraper.get method, which seems to work. I haven't made a pull request yet, since I'm not 100% certain about the viability of the method.

But here are the changes I made:

Enuratique commented 2 years ago

I'm also getting more of these, whereas this library used to work pretty flawlessly. I thought it might be an IP ban, but I ran it locally and it's still failing. If I refresh my cookies, the next request works, but requests start failing again pretty quickly after that. I agree that Facebook is getting stricter or somehow detecting this library.

Enuratique commented 2 years ago

@NielsOerbaek, one other thing to try (instead of your retry logic) is updating the User Agent string in the default headers declaration at the top of the module. It looks like a rather old one, which might make it easy for Facebook to identify and single out facebook-scraper traffic.
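Here is a minimal sketch of swapping in a newer User-Agent. It assumes the defaults are a plain dict mapping header names to values. `DEFAULT_HEADERS` and `with_user_agent` are hypothetical names, not the library's actual code, and the Chrome string is only an example.

```python
# Hypothetical stand-in for the module-level default headers; the real
# variable name, keys and values in facebook-scraper may differ.
DEFAULT_HEADERS = {
    "Accept-Language": "en-US,en;q=0.5",
    "user-agent": "Mozilla/5.0 (placeholder for an outdated browser string)",
}


def with_user_agent(headers, user_agent):
    """Return a copy of `headers` with its User-Agent replaced.

    Header names are case-insensitive, so any existing key spelled
    "user-agent" in any casing is dropped first; otherwise a plain dict
    could end up holding two User-Agent entries.
    """
    updated = {k: v for k, v in headers.items() if k.lower() != "user-agent"}
    updated["User-Agent"] = user_agent
    return updated


# Example desktop Chrome string; copy the one your own browser currently sends.
CURRENT_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
headers = with_user_agent(DEFAULT_HEADERS, CURRENT_UA)
print(headers)
```

If you'd rather not edit the module at all, the `set_user_agent` call mentioned later in this thread sets the string at runtime.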

Enuratique commented 2 years ago

> I think the same; I've been getting a lot of "Your request couldn't be processed" errors today :S

same

mossmoss commented 2 years ago

> I can't really find a pattern in these errors. From my quick tests, it seems that backing off for a few seconds and then retrying fixes the issue in most cases.
>
> I made a quick-and-dirty recursive hack to add retrying to the FacebookScraper.get method, which seems to work. I haven't made a pull request yet, since I'm not 100% certain about the viability of the method.
>
> But here are the changes I made:
>
> * Changed signature of [`get`](https://github.com/kevinzg/facebook-scraper/blob/d3704868cdcb9d01162b0781f89f90863769f198/facebook_scraper/facebook_scraper.py#L858) to `def get(self, url, retry_count=0, **kwargs):`
> * Added the block to [line 925](https://github.com/kevinzg/facebook-scraper/blob/d3704868cdcb9d01162b0781f89f90863769f198/facebook_scraper/facebook_scraper.py#L925):

This worked for me too! I'm not sure why, or if it was just a random thing.

UPDATE: This stopped working for me :(

Enuratique commented 2 years ago

> I can't really find a pattern in these errors. From my quick tests, it seems that backing off for a few seconds and then retrying fixes the issue in most cases.
>
> I made a quick-and-dirty recursive hack to add retrying to the FacebookScraper.get method, which seems to work. I haven't made a pull request yet, since I'm not 100% certain about the viability of the method.

I've implemented these changes in my own fork, but unfortunately it's not helping. I think the maintainer will have to add more headers to more closely simulate what a real browser sends.
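One way to act on this is to send a fuller, browser-like header set in a consistent order. The sketch below is an illustration under assumptions. `BROWSER_ORDER` and `order_headers` are hypothetical, and the order and values are not known to be what Facebook checks.

```python
# Header names in the order a desktop Firefox request might use. This order
# and the example values below are assumptions; copy the real ones from your
# own browser's developer tools (Network tab) for the page you are scraping.
BROWSER_ORDER = [
    "User-Agent",
    "Accept",
    "Accept-Language",
    "Accept-Encoding",
    "Connection",
    "Upgrade-Insecure-Requests",
]


def order_headers(headers, order=BROWSER_ORDER):
    """Return a new dict with `headers` arranged in `order`.

    Matching is case-insensitive, as HTTP header names are. Headers not listed
    in `order` go at the end in their original relative order, because
    sorted() is stable. Plain dicts keep insertion order on Python 3.7+.
    """
    rank = {name.lower(): i for i, name in enumerate(order)}
    items = sorted(headers.items(),
                   key=lambda kv: rank.get(kv[0].lower(), len(order)))
    return dict(items)


browser_headers = order_headers({
    "Accept-Language": "en-US,en;q=0.5",
    "X-Example": "1",  # hypothetical extra header, kept last
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
})
print(list(browser_headers))  # → ['User-Agent', 'accept', 'Accept-Language', 'X-Example']
```

If the headers go through `requests`, its documentation on header ordering suggests setting them as the session's default headers (`Session.headers`). The session's ordering takes precedence over headers passed per request.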

curiousier-george commented 2 years ago

I wonder whether it's possible Facebook is randomly responding with this error as a kind of Turing test. If the client follows up relatively quickly, Facebook assumes a human; if not, it assumes a bot (one that crashed), and this retry code makes facebook-scraper pass the test? Is that conceivable?

At the moment I can't even log in with facebook-scraper, although as far as I can tell the account itself has no ban of any sort.

Enuratique commented 2 years ago

> At the moment I can't even log in with facebook-scraper, although as far as I can tell the account itself has no ban of any sort.

I think logging in with a username and password no longer works, and the only approach that still works is harvesting cookies. I forget where I read that, but it was in another issue in this repo. Give that a go.
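If you go the cookie route, a quick sanity check on the exported file can rule out missing or expired cookies before you blame the scraper. Below is a minimal sketch using the standard library's `http.cookiejar.MozillaCookieJar`. It assumes the file is a Netscape-format `cookies.txt` (like the `facebook.com_cookies.txt` passed to `set_cookies` at the top of this issue). It also assumes that `c_user` and `xs` are the cookies carrying the login. `check_cookie_file` is a hypothetical helper.

```python
import time
from http.cookiejar import MozillaCookieJar

# Assumption: c_user and xs are the cookies that carry a logged-in Facebook
# session. This is common observation, not a documented contract.
SESSION_COOKIES = ("c_user", "xs")


def check_cookie_file(path, now=None):
    """Return a list of problems with the assumed session cookies in `path`.

    `path` must be a Netscape-format cookies.txt file (first line
    "# Netscape HTTP Cookie File"); otherwise MozillaCookieJar.load raises
    http.cookiejar.LoadError. An empty list means every assumed session
    cookie is present and unexpired at time `now` (default: current time).
    """
    now = time.time() if now is None else now
    jar = MozillaCookieJar()
    # Keep expired and session-only cookies so they are reported, not skipped.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    found = {c.name: c for c in jar if c.domain.endswith("facebook.com")}
    problems = []
    for name in SESSION_COOKIES:
        cookie = found.get(name)
        if cookie is None:
            problems.append(f"missing cookie: {name}")
        elif cookie.is_expired(now):
            problems.append(f"expired cookie: {name}")
    return problems
```

`print(check_cookie_file("facebook.com_cookies.txt"))` prints an empty list when both assumed cookies are present and unexpired. It cannot detect a cookie that Facebook has invalidated server-side; only a real request reveals that.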

> I wonder whether it's possible Facebook is randomly responding with this error as a kind of Turing test. If the client follows up relatively quickly, Facebook assumes a human; if not, it assumes a bot (one that crashed), and this retry code makes facebook-scraper pass the test? Is that conceivable?

I have a decent amount of experience writing custom scrapers, and the common reasons code just stops working are (roughly from most to least common):

1) Some kind of IP ban (e.g. Cloudflare), or flagging because of too much traffic or traffic patterns that look non-human.
2) The User Agent string becomes stale (calling `set_user_agent` with a current agent string gets rid of the "Unsupported browser" warning from Facebook).
3) Cookies expire, or change in a deterministic way driven by client-side JavaScript, so the server can tell the cookies are stale and rejects the traffic.
4) The server begins to care about specific headers in the request, and in some cases about the specific order of the headers. The requests library, which this repo relies on, takes headers as dictionaries; on Python versions before 3.7, plain dicts did not keep keys in insertion order, so you had to use an `OrderedDict` to control the order. That's a fun one to debug.
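For the first item on that list, a common mitigation is to pace requests with a randomized gap rather than firing them back to back. Below is a minimal sketch. `Throttle` and its default timings are hypothetical. Nothing in this thread establishes what pacing, if any, Facebook tolerates.

```python
import random
import time


class Throttle:
    """Space out calls by a minimum gap plus random jitter.

    wait() blocks until at least `min_gap` + uniform(0, `jitter`) seconds have
    passed since the previous wait() returned, so requests do not go out at a
    fixed, machine-like rhythm. The first call returns immediately.
    """

    def __init__(self, min_gap=5.0, jitter=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self.jitter = jitter
        self.clock = clock  # injectable so timing can be tested without waiting
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            target = self._last + self.min_gap + random.uniform(0, self.jitter)
            if target > now:
                self.sleep(target - now)
                now = self.clock()
        self._last = now
        return now
```

Create one `Throttle` and call `throttle.wait()` before each request your own code makes. For example, call it before each `get_posts` call when looping over several pages or accounts.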

Considering the seemingly random success and failure of the current code base, I think what might be happening (pure speculation on my part) is that requests are being load-balanced across servers, and not all of them have been updated to a new configuration yet. If you get lucky and are routed to a server with the old configuration, it works; otherwise it fails.

curiousier-george commented 2 years ago

Thanks, @Enuratique, for such a detailed reply! That's very helpful. Although I'm aware of the possible issues in general, I hadn't written a custom scraper until now.

To give a few more details: when I said that I couldn't log in with facebook-scraper, I was speaking loosely. Specifically, I'm getting the `UnexpectedResponse("Your request couldn't be processed")` exception as soon as I call `set_cookies(cookie_file)`.

On the same computer (with the same IP address) and same Facebook account, everything is working fine in the browser. I have also tried updating cookies, and that worked a week or so ago, but it hasn't worked since.

I've noticed that Facebook has again changed the order in which they present reactors in the desktop web browser interface, so they have been rolling out updates.

Given these things, do you have any suggestions about what I could try in order to be able to resume scraping again with facebook-scraper?

SerafinGranados commented 2 years ago

This is the same issue as #858, which has been solved; check it out.

Enuratique commented 2 years ago

> This is the same issue as #858, which has been solved; check it out.

Yep. For anyone else following this issue: version 0.2.59 fixes it. Thanks for bringing it to my attention.
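To confirm that an environment actually has a release with the fix, here is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). It assumes the installed distribution is named `facebook-scraper` and that versions are plain dotted numbers. `parse_version` and `has_fix` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain dotted release such as "0.2.59" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, required="0.2.59"):
    """True if `installed` is at least `required` (plain numeric versions only)."""
    return parse_version(installed) >= parse_version(required)


try:
    installed = version("facebook-scraper")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("facebook-scraper is not installed")
elif has_fix(installed):
    print(f"facebook-scraper {installed} is 0.2.59 or newer")
else:
    print(f"facebook-scraper {installed} is older than 0.2.59; upgrade it")
```

If it reports an older release, `pip install --upgrade facebook-scraper` fetches the latest one. Comparing tuples of ints rather than strings matters here: as strings, `"0.10.0" < "0.2.59"`.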