Closed: koenklomps closed this issue 2 years ago
Removing the "user-agent" header seems to fix it. You can remove the following line:
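For illustration, here is a hedged sketch of the kind of change being suggested: dropping a custom "user-agent" entry from a headers dict so the HTTP client falls back to its default User-Agent. The header value and URL below are hypothetical, not the actual line from the soccerdata source.

```python
# Hypothetical headers dict like the one the scraper might build.
# Dropping the custom "user-agent" entry lets the HTTP client send
# its own default User-Agent string instead.
headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64)",  # hypothetical value
    "accept-encoding": "gzip, deflate",
}
headers.pop("user-agent", None)  # remove the suspect header

# With the requests library, the call would then look like
# (not executed here, to avoid hitting the site):
# import requests
# response = requests.get("https://fbref.com/en/", headers=headers)
```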
However, I do not understand why this causes trouble.
I tried deleting that line, but it still didn't work. However, after messing around a bit more it started working, even with the user-agent line included. It seems to work randomly: sometimes it succeeds, but other times it throws a 403 or 429 error.
One potential cause of the issue is FBref's new bot-scraping rules. They've started banning anyone who scrapes the website at a rate faster than 1 request per 3 seconds.
If you look into the _common.py code, you can see the rate limit and max delay parameters are set to 0 and are currently inaccessible.
Indeed, you get a "429 Client Error: Too Many Requests for URL" error if you scrape too fast. Originally the rate limit was set to 1 request per 2 seconds, but it seems they've now changed that to 1 request per 3 seconds. This is actually implemented in fbref.py, which overrides the default of "no rate limiting" in _common.py.
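A minimal sketch of such a rate limit, assuming the simplest approach of sleeping between consecutive requests. This is hypothetical and not soccerdata's actual implementation; the class name and structure are mine.

```python
import time


class RateLimiter:
    """Enforce at least `min_interval` seconds between consecutive calls.

    Hypothetical sketch of a 1-request-per-3-seconds limit; not the
    actual soccerdata code, which handles this in fbref.py/_common.py.
    """

    def __init__(self, min_interval: float = 3.0):
        self.min_interval = min_interval
        self._last_call = 0.0  # monotonic timestamp of the previous call

    def wait(self) -> None:
        # Sleep just long enough to keep calls `min_interval` apart.
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()


# Demo with a tiny interval so it runs quickly; a real scraper would
# call limiter.wait() before each HTTP request with min_interval=3.0.
limiter = RateLimiter(min_interval=0.01)
start = time.monotonic()
for _ in range(3):
    limiter.wait()
total = time.monotonic() - start  # at least two enforced 0.01 s gaps
```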
The 403 error is a different issue and I am still convinced that it is caused by the user agent headers. I'll create a pull request in a few minutes and it would be great if you could check whether that solves your issues.
Hey, quick update. I tried changing the rate_limit to 3 seconds or more, but unfortunately the same error occurred.
About which error are you talking now? The 403 or 429 error?
Did you try removing the user agent headers?
So the code works now. The quick update above was from me fiddling with the code. I just noticed your hotfix, tried it, and it works fine now. Sorry for the confusion.
No problem. Thanks for checking!
Should be fixed in v1.0.2 🚀
Which Python version are you using?
Which version of soccerdata are you using?
What did you do?
What did you expect to see?
What did you see instead?