probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

Club names and aliases not correctly mapped at teamname_replacements.json #703

Open MartiONE opened 1 week ago

MartiONE commented 1 week ago

Describe the bug The bug appears when you call read_team_history from ClubElo. The file that contains aliases for team names that might differ from the ClubElo website is correctly loaded and stored in the _config.py variable TEAMNAME_REPLACEMENTS. However, when read_team_history in the ClubElo class filters the names to process, it does the lookup in the reverse direction. The problematic line is this one.

Affected scrapers This affects the following scrapers:

Code example Considering you have a minimal teamname_replacements.json like

{"Tottenham": ["Tottenham Hotspur", "Tottenham Hotspur FC", "Spurs"]}

and then you run

import soccerdata as sd
elo = sd.ClubElo()
elo.read_team_history(team="Spurs")

Error message

ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 elo.read_team_history(team="Spurs")

File ~/jupyter-env/venv/lib/python3.11/site-packages/soccerdata/clubelo.py:179, in ClubElo.read_team_history(self, team, max_age)
    176         df.replace({"team": TEAMNAME_REPLACEMENTS}, inplace=True)
    177         return df
--> 179 raise ValueError(f"No data found for team {team}")

ValueError: No data found for team Spurs
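To make the direction problem concrete without hitting the network, here is a minimal sketch. It assumes TEAMNAME_REPLACEMENTS ends up as a flat dict mapping each alias to its canonical name. That shape is my assumption, and the two helper functions are hypothetical illustrations, not soccerdata's actual code. The first helper only finds aliases when given the canonical name, so passing an alias such as "Spurs" yields nothing extra. The second resolves the input to its canonical name first and then collects every known spelling.

```python
# Hypothetical stand-in for soccerdata's TEAMNAME_REPLACEMENTS.
# Assumed shape: alias -> canonical name, built from the JSON in this report.
TEAMNAME_REPLACEMENTS = {
    "Tottenham Hotspur": "Tottenham",
    "Tottenham Hotspur FC": "Tottenham",
    "Spurs": "Tottenham",
}


def names_one_direction(team):
    """Collect aliases whose canonical name equals `team`, plus `team` itself."""
    names = [
        alias
        for alias, canonical in TEAMNAME_REPLACEMENTS.items()
        if canonical == team
    ]
    names.append(team)
    return names


def names_both_directions(team):
    """Resolve `team` to its canonical name, then collect every known spelling."""
    canonical_name = TEAMNAME_REPLACEMENTS.get(team, team)
    names = [canonical_name]
    for alias, canonical in TEAMNAME_REPLACEMENTS.items():
        if canonical == canonical_name and alias not in names:
            names.append(alias)
    return names


print(names_one_direction("Spurs"))    # ['Spurs']
print(names_both_directions("Spurs"))  # ['Tottenham', 'Tottenham Hotspur', 'Tottenham Hotspur FC', 'Spurs']
```

With the one-direction lookup, "Spurs" is only ever compared against itself, which would explain the ValueError above if none of the ClubElo names match that spelling.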

Additional context The same problem may also occur in other scraper classes, though I did not check in depth: here, here and here.

Contributor Action Plan This is a trivial but important change. I can also fix the tests or make them check the file. I'd also suggest that we avoid one-liners with variables like k and v, as those tend to be hard to debug.
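As a rough illustration of the readability point, this is the kind of explicit loop I have in mind. The function name build_alias_lookup and the variable names are hypothetical; the sketch only assumes the JSON has the shape shown in the code example above (canonical name mapped to a list of aliases).

```python
import json


def build_alias_lookup(raw_json):
    """Turn {"canonical": ["alias", ...]} JSON into an alias -> canonical dict."""
    replacements_by_canonical = json.loads(raw_json)
    alias_to_canonical = {}
    for canonical_name, aliases in replacements_by_canonical.items():
        for alias in aliases:
            # Each alias points back to exactly one canonical name.
            alias_to_canonical[alias] = canonical_name
    return alias_to_canonical


raw = '{"Tottenham": ["Tottenham Hotspur", "Tottenham Hotspur FC", "Spurs"]}'
alias_lookup = build_alias_lookup(raw)
print(alias_lookup["Spurs"])  # Tottenham
```

Descriptive names make it obvious which side of the mapping is the alias and which is the canonical name, and a test could assert that every alias in the file maps back to its key.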

MartiONE commented 3 days ago

Hey @probberechts, am I in the clear to provide a fix for this? :)