probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

[FBref] Optimize reading team match history by creating and reading from csv cache #533

Closed Kalaweksh closed 3 months ago

Kalaweksh commented 3 months ago

Resolves

Changes

Issues

probberechts commented 3 months ago

Thank you for taking the time to submit this pull request. I really appreciate your effort to improve the project. However, I do not believe that this is a good solution. It is too ad hoc (I would prefer a general solution for all FBref endpoints), it would make things too complicated, and it does not really fit within the scope of soccerdata.

I see that storing the full HTML page is inefficient, but I am not a huge fan of caching the preprocessed data. Soccerdata is meant to download and parse the data. It is (at least for now) not intended as a database system. How to store the data and make it quickly accessible is out of scope.

Rather, I see value in compacting the HTML page before caching it, for example by keeping only the tables.
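
A rough sketch of what "keeping only the tables" could look like, using only Python's standard-library `html.parser`. The names `TableExtractor` and `compact_html` are hypothetical and not part of soccerdata. The handling of tables wrapped in HTML comments is an assumption about how some pages are structured, so treat that part as optional.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the markup of every <table> element and drop everything else."""

    def __init__(self):
        # Keep entities such as &amp; untouched so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.depth = 0   # nesting level of currently open <table> tags
        self.parts = []  # markup fragments that belong to a table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is when inside a table.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.parts.append(f"</{tag}>")
            if tag == "table":
                self.depth -= 1
                if self.depth == 0:
                    self.parts.append("\n")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        # Assumption: some tables may be hidden inside HTML comments.
        # Parse the comment body separately and keep any tables found there.
        if self.depth == 0 and "<table" in data:
            self.parts.append(compact_html(data))


def compact_html(html: str) -> str:
    """Return only the <table> elements of an HTML document."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

The idea would be to run something like `compact_html` on the downloaded page just before it is written to the cache, so the cached file still contains raw HTML tables that the existing parsing code (or `pandas.read_html`) can read. Note that this drops everything outside the tables, so any parser that relies on page headers or metadata would need those parts kept as well.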