probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

Add support for pandas 2.1.0 #372

Closed aegonwolf closed 11 months ago

aegonwolf commented 11 months ago

Hi there, is it possible that there is a bug with newer pandas versions? I didn't have it before, but I also haven't used this awesome package for a few months:

Calling read_team_season_stats or read_player_season_stats on any FBref object yields:

AxisError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 season_stats = fbref.read_team_season_stats(stat_type='shooting')

File ~\anaconda3\envs\scraperfc3\lib\site-packages\soccerdata\fbref.py:288, in FBref.read_team_season_stats(self, stat_type, opponent_stats)
    285     stat_type += "_for"
    287 # get league IDs
--> 288 seasons = self.read_seasons()
    290 # collect teams
    291 teams = []

File ~\anaconda3\envs\scraperfc3\lib\site-packages\soccerdata\fbref.py:180, in FBref.read_seasons(self, split_up_big5)
    167 """Retrieve the selected seasons for the selected leagues.
    168 
    169 Parameters
   (...)
    177 pd.DataFrame
    178 """
    179 filemask = "seasons_{}.html"
--> 180 df_leagues = self.read_leagues(split_up_big5)
    182 seasons = []
    183 for lkey, league in df_leagues.iterrows():

File ~\anaconda3\envs\scraperfc3\lib\site-packages\soccerdata\fbref.py:147, in FBref.read_leagues(self, split_up_big5)
    144     df_table["url"] = html_table.xpath(".//th[@data-stat='league_name']/a/@href")
    145     dfs.append(df_table)
    146 df = (
--> 147     pd.concat(dfs)
    148     .pipe(standardize_colnames)
    149     .rename(columns={"competition_name": "league"})
    150     .pipe(self._translate_league)
    151     .drop_duplicates(subset="league")
    152     .set_index("league")
    153     .sort_index()
    154 )
    155 df["first_season"] = df["first_season"].apply(season_code)
    156 df["last_season"] = df["last_season"].apply(season_code)

File ~\anaconda3\envs\scraperfc3\lib\site-packages\pandas\core\reshape\concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    378     copy = False
    380 op = _Concatenator(
    381     objs,
    382     axis=axis,
   (...)
    390     sort=sort,
    391 )
--> 393 return op.get_result()

File ~\anaconda3\envs\scraperfc3\lib\site-packages\pandas\core\reshape\concat.py:680, in _Concatenator.get_result(self)
    676             indexers[ax] = obj_labels.get_indexer(new_labels)
    678     mgrs_indexers.append((obj._mgr, indexers))
--> 680 new_data = concatenate_managers(
    681     mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
    682 )
    683 if not self.copy and not using_copy_on_write():
    684     new_data._consolidate_inplace()

File ~\anaconda3\envs\scraperfc3\lib\site-packages\pandas\core\internals\concat.py:180, in concatenate_managers(mgrs_indexers, axes, concat_axis, copy)
    177     values = np.concatenate(vals, axis=1)  # type: ignore[arg-type]
    178 elif is_1d_only_ea_dtype(blk.dtype):
    179     # TODO(EA2D): special-casing not needed with 2D EAs
--> 180     values = concat_compat(vals, axis=1, ea_compat_axis=True)
    181     values = ensure_block_shape(values, ndim=2)
    182 else:

File ~\anaconda3\envs\scraperfc3\lib\site-packages\pandas\core\dtypes\concat.py:135, in concat_compat(to_concat, axis, ea_compat_axis)
    133 else:
    134     to_concat_arrs = cast("Sequence[np.ndarray]", to_concat)
--> 135     result = np.concatenate(to_concat_arrs, axis=axis)
    137     if not any_ea and "b" in kinds and result.dtype.kind in "iuf":
    138         # GH#39817 cast to object instead of casting bools to numeric
    139         result = result.astype(object, copy=False)

AxisError: axis 1 is out of bounds for array of dimension 1

Should I revert pandas versions? If so, to which one?

aegonwolf commented 11 months ago

Returned to pandas 2.0 and it worked; 2.1 throws the error again.

vishalmish commented 11 months ago

Seeing the same issue. I tried both pandas 2.0 and 2.1, and neither helped.

probberechts commented 11 months ago

I know about this. Apparently, something changed in the concat function between pandas v2.0.3 and v2.1.0, but I have not figured out what exactly. You can downgrade to v2.0.3 as a temporary fix.
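
For anyone who wants to check whether their environment is in the affected range before scraping, here is a minimal sketch. It only uses the standard library. The helper names (parse_version, is_affected, check_installed_pandas) are made up for illustration and are not part of soccerdata or pandas. It also assumes that every release from 2.1.0 onward is affected, which this thread does not confirm; only 2.0.3 (works) and 2.1.0 (fails) are mentioned above.

```python
from importlib import metadata

# From the thread: 2.0.3 is reported to work, 2.1.0 is reported to fail.
# Treating everything >= 2.1.0 as affected is an assumption.
FIRST_BROKEN = (2, 1, 0)


def parse_version(version: str) -> tuple:
    """Turn '2.1.0' or '2.1.0rc1' into a tuple of three integers."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.1' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """Return True if the version is at or above the first reported broken one."""
    return parse_version(version) >= FIRST_BROKEN


def check_installed_pandas() -> str:
    """Describe whether the installed pandas falls in the assumed affected range."""
    try:
        installed = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return "pandas is not installed"
    if is_affected(installed):
        return (
            f"pandas {installed} may hit the AxisError; "
            "temporary fix: pip install pandas==2.0.3"
        )
    return f"pandas {installed} predates 2.1.0 and should not be affected"


if __name__ == "__main__":
    print(check_installed_pandas())
```

If the check reports an affected version, pinning pandas to 2.0.3 in the environment is the temporary fix suggested above.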