probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

[WhoScored] The current season's schedule is not cached #169

Closed guilherme-95 closed 1 year ago

guilherme-95 commented 1 year ago

I'm running into an issue when scraping game data for the 22-23 season. When I ask ws.read_events to return data for a list of game IDs from the 21-22 season, it scrapes the schedule once and then moves on to fetching the game data. When I do the same for the 22-23 season, it re-scrapes the schedule for every game_id I request.

It doesn't matter whether I point to the schedule file in /soccerdata/data/WhoScored/matches or build a new file using ws.read_schedule.

The code I'm running is below:

import pandas as pd
import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons="22-23")

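match_ids_df = pd.read_csv("premier_league_schedule.csv")

# Assumption: the CSV has a "game_id" column holding the WhoScored match IDs.
for match_id in match_ids_df["game_id"]:
    # Scrape the event stream for one match and replace missing values with 0.
    events = ws.read_events(match_id=match_id)
    events = events.fillna(0)

    # Write each match to its own CSV file.
    filename = f"match_data_{match_id}.csv"
    events.to_csv(filename, index=False)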

I apologize if this is an error on my end; I'm not very experienced with Python.

probberechts commented 1 year ago

This is intended behavior. By default, the scraper assumes that the cache is outdated for the current season. If you are sure that the cache is up to date, you can force the scraper to use the cached schedule by setting the force_cache parameter to True.

import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons="22-23")
events = ws.read_events(match_id=..., force_cache=True)
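
The default described above can be illustrated with a minimal sketch of this kind of cache decision. This is not soccerdata's actual implementation: the function names `season_is_current` and `should_use_cache`, and the assumption that a season runs from 1 July to 30 June, are hypothetical and exist only for illustration.

```python
from datetime import date


def season_is_current(season: str, today: date) -> bool:
    """Return True if `today` falls inside a season written like "22-23".

    Assumption for this sketch: a season runs from 1 July of the first
    year to 30 June of the following year.
    """
    start_year = 2000 + int(season.split("-")[0])
    return date(start_year, 7, 1) <= today <= date(start_year + 1, 6, 30)


def should_use_cache(cache_exists: bool, season: str, today: date,
                     force_cache: bool = False) -> bool:
    """Decide whether a cached schedule file can be reused."""
    if not cache_exists:
        return False  # nothing to reuse, so the data must be downloaded
    if force_cache:
        return True  # the caller vouches that the cache is up to date
    # Otherwise, only trust the cache for seasons other than the current one.
    return not season_is_current(season, today)


today = date(2023, 3, 1)  # a date inside the 22-23 season
print(should_use_cache(True, "21-22", today))                    # True
print(should_use_cache(True, "22-23", today))                    # False
print(should_use_cache(True, "22-23", today, force_cache=True))  # True
```

Under these assumptions, a cached 21-22 schedule is reused, a cached 22-23 schedule is refreshed on every call, and force_cache=True overrides that default.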

guilherme-95 commented 1 year ago

Thank you very much for the information.

guilherme-95 commented 1 year ago

Sorry, forgot to close the issue.