probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

[FBref] team_match_stats for teams with slash "/" in the name results in FileNotFoundError #739

Open. ilyacherevkov opened this issue 5 days ago

ilyacherevkov commented 5 days ago

Describe the bug

Unable to use read_team_match_stats for teams with a slash in their name, such as Bodø/Glimt.

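It tries to create the file matchlogs_Bodø/Glimt_2022_schedule.html. The slash is interpreted as a path separator, so the path points to a file named Glimt_2022_schedule.html inside a matchlogs_Bodø directory that does not exist.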
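To see why, here is a minimal sketch using only pathlib. The filemask "matchlogs_{}_{}_{}.html" is an assumption inferred from the failing filename in the traceback and from the filemask.format(team, skey, stat_type) call quoted later in this thread; the data directory is a placeholder.

```python
from pathlib import Path

# Assumed filemask, inferred from the failing filename in the traceback.
filemask = "matchlogs_{}_{}_{}.html"
data_dir = Path("/tmp/soccerdata/FBref/historic")  # placeholder cache directory

filepath = data_dir / filemask.format("Bodø/Glimt", "2022", "schedule")

# pathlib treats the "/" inside the team name as a path separator, so the
# file ends up one level deeper, inside a "matchlogs_Bodø" directory.
print(filepath.name)         # Glimt_2022_schedule.html
print(filepath.parent.name)  # matchlogs_Bodø
```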

Affected scrapers

This affects the following scrapers: FBref

Code example

A minimal code example that fails. Use no_cache=True to make sure an invalid cached file does not cause the bug, and make sure you have the latest version of soccerdata installed.

import soccerdata as sd
fbref = sd.FBref(leagues="SWE-Allsvenskan", seasons=[2022,2023], no_cache=True)
fbref.read_team_match_stats(stat_type="schedule", opponent_stats=False, team="Bodø/Glimt", force_cache=True)

Error message

Error while scraping https://fbref.com/en/squads/d86248bd/2022/matchlogs/all_comps/schedule. Retrying... (attempt 2 of 5).    _common.py:568
Traceback (most recent call last):
  File "/Users/user/.venv/lib/python3.12/site-packages/soccerdata/_common.py", line 564, in _download_and_save
    with filepath.open(mode="wb") as fh:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 1013, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/user/soccerdata/data/FBref/historic/matchlogs_Bodø/Glimt_2022_schedule.html'

Additional context

Note: the line number in _common.py reported in the error might differ, as I made minor changes to the code.


ilyacherevkov commented 5 days ago

Fixed it by changing this line in fbref.py:

filepath = self.data_dir / filemask.format(team, skey, stat_type)

to:

filepath = self.data_dir / filemask.format(team.replace('/', ''), skey, stat_type)

Not sure if it breaks anything, though.
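A self-contained sketch that reproduces the error and checks the one-line fix against a temporary directory. cache_path is a hypothetical helper standing in for the path construction in fbref.py (it is not part of soccerdata), and the filemask is assumed from the traceback.

```python
import tempfile
from pathlib import Path

FILEMASK = "matchlogs_{}_{}_{}.html"  # assumed, inferred from the traceback


def cache_path(data_dir: Path, team: str, skey: str, stat_type: str, fix: bool) -> Path:
    """Hypothetical stand-in for the cache path built in fbref.py."""
    if fix:
        team = team.replace("/", "")  # the proposed one-line fix
    return data_dir / FILEMASK.format(team, skey, stat_type)


def try_write(fix: bool) -> str:
    """Write a dummy cache file; return its name, or the error class name."""
    with tempfile.TemporaryDirectory() as tmp:
        filepath = cache_path(Path(tmp), "Bodø/Glimt", "2022", "schedule", fix)
        try:
            # Same call that fails in _download_and_save
            with filepath.open(mode="wb") as fh:
                fh.write(b"<html></html>")
        except FileNotFoundError:
            return "FileNotFoundError"
        return filepath.name


print(try_write(fix=False))  # FileNotFoundError
print(try_write(fix=True))   # matchlogs_BodøGlimt_2022_schedule.html
```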

probberechts commented 5 days ago

No, it won't break anything. A more generic solution would be to use something like Django's slugify() function.
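For illustration, a dependency-free sketch of a slugify-style sanitizer, modeled on the behavior of Django's django.utils.text.slugify: ASCII folding unless allow_unicode is set, lowercasing, dropping characters that are not word characters, whitespace or hyphens, and collapsing runs of whitespace and hyphens into a single hyphen. It is re-implemented here only as an example so no Django dependency is assumed; it is not part of soccerdata.

```python
import re
import unicodedata


def slugify(value: str, allow_unicode: bool = False) -> str:
    """Slugify-style sanitizer modeled on Django's slugify()."""
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize("NFKC", value)
    else:
        # Fold accents to ASCII; letters with no ASCII decomposition (e.g. "ø") are dropped.
        value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop everything except word characters, whitespace and hyphens (this removes "/").
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace/hyphens into one hyphen and trim the ends.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


print(slugify("Bodø/Glimt"))                      # bodglimt
print(slugify("Bodø/Glimt", allow_unicode=True))  # bodøglimt
print(slugify("Manchester Utd"))                  # manchester-utd
```

One trade-off to weigh: unlike the one-line replace('/', ''), a slugify-style filter also rewrites every other team name (for example, "Manchester Utd" becomes "manchester-utd"), so files already in an existing cache would no longer be found under their old names.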