probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

FBRef - Returning stats for 1923/24 season not 2023/24 season #726

Open philbywalsh opened 1 week ago

philbywalsh commented 1 week ago

Describe the bug

I'm attempting to scrape FBref for Premier League stats for the 2023/24 season. I've tried various values for 'season', and all of them seem to return stats for the 1923/24 season rather than the 2023/24 season.

Affected scrapers

This affects the following scrapers:

FBref

Code example

import random
import time

import pandas as pd
import soccerdata as sd

seasons = ['2023-24']

def fetch_match_IDs(seasons):
    """Fetches match IDs for a given list of seasons.

    Args:
        seasons (list): A list of seasons.

    Returns:
        pd.DataFrame: A DataFrame containing match IDs.
    """

    match_IDs = pd.DataFrame(columns=['season', 'week', 'home_team', 'away_team', 'score', 'game_id'])

    for season in seasons:
        fbref = sd.FBref(leagues="ENG-Premier League", seasons={season}, no_cache=True)
        print(f'Attempting to scrape season: {season}')
        schedule = fbref.read_schedule()
        schedule = schedule.reset_index(drop=True)

        # Create a new column 'season' and assign the current season value
        schedule['season'] = season

        match_data = schedule[['season', 'week', 'home_team', 'away_team', 'score', 'game_id']]
        match_IDs = pd.concat([match_IDs, match_data], ignore_index=True)

        time.sleep(random.randint(2, 6))
        #time.sleep(10)  # Wait for 10 seconds

    time.sleep(random.randint(2, 6))
    #time.sleep(10)  # Wait for 10 seconds

    return match_IDs

match_IDs = fetch_match_IDs(seasons)

Error message

No error message. But incorrect data is returned

Additional context

Oddly enough, the '2023-2024' parameter seemed to work fine during extensive testing yesterday.


philbywalsh commented 1 week ago

Bizarrely, I re-ran this code this morning (same Jupyter notebook, which remained open overnight) with seasons = ['2023-24'] and it now works as desired, i.e. it brings back 2023/24 data, not 1923/24 data.

This feels like an intermittent bug: over the last 36 hours, the same code returned 2023/24, then 1923/24, then 2023/24 again.
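
Since no error is raised when the wrong season comes back, a small sanity check on the match dates might help catch it early. The sketch below is only an illustration and not part of soccerdata: the helper names `season_start_year` and `dates_match_season` are hypothetical, and it assumes season labels of the form '2023-24' or '2023-2024' and match dates given as ISO strings or date-like objects.

```python
from datetime import date


def season_start_year(season: str) -> int:
    """Return the four-digit start year of a label such as '2023-24' or '2023-2024'."""
    return int(season.split("-")[0])


def dates_match_season(match_dates, season: str) -> bool:
    """Check that every match date falls in the season's start year or the year after.

    `match_dates` may hold ISO strings ('2023-08-11') or objects with a
    `.year` attribute (datetime.date, pandas.Timestamp).
    """
    start = season_start_year(season)
    allowed = {start, start + 1}
    for d in match_dates:
        # Date-like objects expose .year; otherwise parse the leading YYYY-MM-DD.
        year = d.year if hasattr(d, "year") else date.fromisoformat(str(d)[:10]).year
        if year not in allowed:
            return False
    return True


# Hypothetical dates, used only to exercise the check.
print(dates_match_season(["2023-08-11", "2024-05-19"], "2023-24"))  # True
print(dates_match_season(["1923-08-25", "1924-05-03"], "2023-24"))  # False
```

In the loop above, this could be called on the schedule's date column before concatenating, assuming that column is named `date`, and the request retried or aborted when the check fails.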