probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

[FBref] Can't fetch schedule data #76

Closed BelkacemB closed 1 year ago

BelkacemB commented 2 years ago

If you run:

import soccerdata as sd
fbref = sd.FBref(leagues="ENG-Premier League", seasons=2021)
print(fbref.__doc__)

epl_schedule = fbref.read_schedule()

you get the following error:

frame.py 3832 _set_item value = self._sanitize_column(value)

frame.py 4535 _sanitize_column com.require_length_match(value, self.index)

common.py 557 require_length_match raise ValueError(

ValueError: Length of values (0) does not match length of index (31)

MatsThijssen commented 2 years ago

This still appears to be an issue with pretty much every FBref function. I haven't investigated much, but my best guess at the moment is that FBref is making further attempts to discourage web scraping. If I have time to dive deeper, I will post an update here.

probberechts commented 2 years ago

This does indeed seem to be related to FBref blocking bot traffic. However, soccerdata respects their policy. Since it already fails on the first request, I guess they simply block all headless traffic when the load is very high.

Most of the time, though, everything works just fine. If it does not, I recommend simply waiting a bit and trying again.
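The "wait a bit and try again" advice can be sketched as a small retry wrapper. This is only an illustration, not part of soccerdata: `retry_with_wait` and its parameters are hypothetical names, the one-minute default wait is an arbitrary assumption, and it catches `ValueError` only because that is the exception reported in this thread.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_wait(
    fetch: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: the name, defaults, and the choice to catch
    ValueError are assumptions based on the error shown in this thread.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ValueError:
            # Re-raise on the final attempt so the caller sees the real error.
            if attempt == attempts:
                raise
            # Otherwise pause before asking again.
            sleep(wait_seconds)
    raise AssertionError("unreachable")


# Possible usage with soccerdata (needs network access, so left commented out):
# import soccerdata as sd
# fbref = sd.FBref(leagues="ENG-Premier League", seasons=2021)
# epl_schedule = retry_with_wait(fbref.read_schedule)
```

The `sleep` argument exists only so the wait can be replaced in tests. A fixed wait keeps the sketch short; a longer or growing delay may be kinder to the site.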

probberechts commented 1 year ago

@BelkacemB I just found out that your error might be caused by cached data. Try disabling caching with:

import soccerdata as sd
fbref = sd.FBref(leagues="ENG-Premier League", seasons=2021, no_cache=True)
epl_schedule = fbref.read_schedule()

If that works, just delete your cache (by default at ~/soccerdata/data/FBref) and scrape all data again.

FBref recently renamed some HTML attributes. This was fixed in commit 1f4128bc6ef9a00fab921f2f70cfad64ecab54fb. Naturally, this creates problems if you run the latest version on cached data that still has the old HTML attributes.
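Deleting the stale cache can also be done from Python. The following is a minimal sketch, assuming the default cache location quoted above. `clear_cache` and `DEFAULT_FBREF_CACHE` are hypothetical names, not part of soccerdata, and the path should be adjusted if your data directory is configured differently.

```python
import shutil
from pathlib import Path

# Default cache location as quoted in this thread (an assumption for your setup).
DEFAULT_FBREF_CACHE = Path.home() / "soccerdata" / "data" / "FBref"


def clear_cache(cache_dir: Path = DEFAULT_FBREF_CACHE) -> bool:
    """Delete cache_dir and everything in it.

    Returns True if a directory was removed and False if there was nothing
    to remove. Hypothetical helper for illustration only.
    """
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```

Calling `clear_cache()` with no arguments removes the default FBref cache directory, after which the next `read_schedule()` call should download fresh pages instead of parsing the old HTML.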