probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/
Other
511 stars, 87 forks

[FBRef] read_player_season_stats includes Women's World Cup by default (season 2023) #576

Closed: mvantschip closed this issue 1 month ago

mvantschip commented 1 month ago

I am fetching player data for the 2023 season which, according to the docs, should by default only return data from the top 5 leagues. However, stats from the Women's World Cup are included as well. I can reproduce this with the following code:

import soccerdata as sd
import pandas as pd

fbref = sd.FBref(seasons=2023)
stats = fbref.read_player_season_stats(stat_type='standard')
print(stats.index.unique(level='league'))

Output:

Index(['ENG-Premier League', 'ESP-La Liga', 'FRA-Ligue 1', 'GER-Bundesliga',
       'INT-Women's World Cup', 'ITA-Serie A'],
      dtype='object', name='league')

In addition, every row in the returned DataFrame appears twice, but I am not sure whether that problem is related. Here is the output of stats.head() from the same code:

import soccerdata as sd
import pandas as pd

fbref = sd.FBref(seasons=2023)
stats = fbref.read_player_season_stats(stat_type='standard')
print(stats.head())

Output:

                                                 nation pos     age  born Playing Time                    Performance                                 Expected                      Progression           Per 90 Minutes
                                                                                    MP Starts   Min   90s         Gls Ast G+A G-PK PK PKatt CrdY CrdR       xG  npxG   xAG npxG+xAG        PrgC PrgP PrgR            Gls   Ast   G+A  G-PK G+A-PK    xG   xAG xG+xAG  npxG npxG+xAG
league             season team    player
ENG-Premier League 2324   Arsenal Aaron Ramsdale    ENG  GK  25-364  1998            6      6   540   6.0           0   0   0    0  0     0    0    0      0.0   0.0   0.0      0.0           0    2    0            0.0   0.0   0.0   0.0    0.0   0.0   0.0    0.0   0.0      0.0
                                  Aaron Ramsdale    ENG  GK  25-364  1998            6      6   540   6.0           0   0   0    0  0     0    0    0      0.0   0.0   0.0      0.0           0    2    0            0.0   0.0   0.0   0.0    0.0   0.0   0.0    0.0   0.0      0.0
                                  Ben White         ENG  DF  26-217  1997           35     33  2830  31.4           4   4   8    4  0     0    8    0      1.1   1.1   3.5      4.6          41  175  153           0.13  0.13  0.25  0.13   0.25  0.04  0.11   0.15  0.04     0.15
                                  Ben White         ENG  DF  26-217  1997           35     33  2830  31.4           4   4   8    4  0     0    8    0      1.1   1.1   3.5      4.6          41  175  153           0.13  0.13  0.25  0.13   0.25  0.04  0.11   0.15  0.04     0.15
                                  Bukayo Saka       ENG  FW  22-250  2001           34     34  2838  31.5          16   9  25   10  6     6    3    0     15.1  10.4  10.2     20.6         153  122  502           0.51  0.29  0.79  0.32    0.6  0.48  0.32    0.8  0.33     0.65

Thanks for the wonderful work!

probberechts commented 1 month ago

The docs are outdated. When no leagues are given, it returns data for all supported leagues. Previously only the Big 5 leagues were supported, but I've since added support for the World Cups and Euros.
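If you only want the Big 5 leagues, the simplest route is probably to name them explicitly through the `FBref` constructor's `leagues` argument, which also avoids downloading the tournament data. If you already have a combined DataFrame, the sketch below filters it afterwards. It assumes `stats` has a `league` index level, as in the outputs above. The `BIG5` list and the `keep_leagues` helper are hypothetical names, not part of soccerdata. The league IDs are copied from the output earlier in this issue, and the demo rows are placeholders.

```python
import pandas as pd

# League IDs as printed earlier in this issue (hypothetical constant, not a soccerdata export).
BIG5 = [
    "ENG-Premier League",
    "ESP-La Liga",
    "FRA-Ligue 1",
    "GER-Bundesliga",
    "ITA-Serie A",
]


def keep_leagues(stats: pd.DataFrame, leagues: list) -> pd.DataFrame:
    """Return only the rows whose 'league' index level is one of `leagues`."""
    mask = stats.index.get_level_values("league").isin(leagues)
    return stats[mask]


if __name__ == "__main__":
    # Placeholder stand-in for read_player_season_stats() output; second row is made up.
    idx = pd.MultiIndex.from_tuples(
        [
            ("ENG-Premier League", "2324", "Arsenal", "Bukayo Saka"),
            ("INT-Women's World Cup", "2023", "Team X", "Player Y"),
        ],
        names=["league", "season", "team", "player"],
    )
    stats = pd.DataFrame({"Gls": [16, 0]}, index=idx)
    print(keep_leagues(stats, BIG5).index.unique(level="league").tolist())
```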

mvantschip commented 1 month ago

I see! Thanks. Any idea about the duplicate rows? Or should I open a separate issue for that?
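Until the cause of the duplication is known, one possible workaround is to drop rows that repeat both the index and every value, as the Aaron Ramsdale and Ben White rows do in the stats.head() output above. The following is a minimal sketch, and the `drop_exact_duplicates` helper is a hypothetical name, not a soccerdata function. It compares the values as well as the index, so a row that shares an index with another but has different stats is kept. Using `index.duplicated()` alone would silently discard such a row.

```python
import pandas as pd


def drop_exact_duplicates(stats: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose index labels and all column values repeat an earlier row."""
    # Move the index levels into columns so they take part in the comparison.
    repeated = stats.reset_index().duplicated(keep="first").to_numpy()
    return stats[~repeated]


if __name__ == "__main__":
    # Stand-in with FBref-style two-level columns; only two stats from the output above.
    columns = pd.MultiIndex.from_tuples([("Playing Time", "MP"), ("Performance", "Gls")])
    index = pd.MultiIndex.from_tuples(
        [
            ("ENG-Premier League", "2324", "Arsenal", "Aaron Ramsdale"),
            ("ENG-Premier League", "2324", "Arsenal", "Aaron Ramsdale"),
            ("ENG-Premier League", "2324", "Arsenal", "Ben White"),
            ("ENG-Premier League", "2324", "Arsenal", "Ben White"),
            ("ENG-Premier League", "2324", "Arsenal", "Bukayo Saka"),
        ],
        names=["league", "season", "team", "player"],
    )
    stats = pd.DataFrame(
        [[6, 0], [6, 0], [35, 4], [35, 4], [34, 16]], index=index, columns=columns
    )
    print(drop_exact_duplicates(stats))
```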