spartanovo closed this issue 1 year ago.
FBRef uses a different layout for the 2017/18 and 2018/19 seasons. In the 2017/18 season, the "MP" column is a separate category, while in the 2018/19 season it is grouped under "Playing Time".
All you need is two lines of post-processing:
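hold[("Playing Time", "MP")] = hold[("Playing Time", "MP")].fillna(hold["MP"])  # copy the 2017/18 MP values into the nested column
hold = hold.drop(columns=["MP"])  # drop() returns a new DataFrame, so reassign it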
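A self-contained sketch of the same post-processing, run on a small synthetic frame instead of real FBref output. It assumes that after stacking the two seasons the standalone column is keyed as the tuple ("MP", "") next to ("Playing Time", "MP"); the player data and the helper name merge_mp_columns are hypothetical. Using the explicit tuple key avoids relying on pandas returning a Series for hold["MP"], and the helper also handles a frame that only has the 2017/18 layout.

```python
import pandas as pd

# Hypothetical stand-ins for two seasons of player stats.
# 2017/18: "MP" is a standalone column; 2018/19: "MP" sits under "Playing Time".
s1718 = pd.DataFrame(
    {("MP", ""): [30, 12], ("Playing Time", "Min"): [2500, 900]},
    index=["Player A", "Player B"],
)
s1819 = pd.DataFrame(
    {("Playing Time", "MP"): [28, 5], ("Playing Time", "Min"): [2300, 400]},
    index=["Player A", "Player C"],
)

# Stack the seasons; each season leaves the other layout's MP column as NaN.
hold = pd.concat([s1718, s1819], keys=["1718", "1819"])


def merge_mp_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Move standalone MP values into ("Playing Time", "MP") and drop the standalone column."""
    standalone = ("MP", "")
    nested = ("Playing Time", "MP")
    if standalone not in df.columns:
        return df.copy()
    out = df.copy()
    if nested in out.columns:
        # Keep existing nested values; fill only the gaps from the standalone column.
        out[nested] = out[nested].fillna(out[standalone])
    else:
        # Only the 2017/18 layout is present: copy the column over as-is.
        out[nested] = out[standalone]
    return out.drop(columns=[standalone])


merged = merge_mp_columns(hold)
print(merged[("Playing Time", "MP")].tolist())
```

Because the fix only fills gaps, rows from the 2018/19 layout keep their original values.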
I'll add this to the codebase later.
Awesome. That fixed the problem. Thank you @probberechts!
Hello,
I have found a small bug when pulling data from FBRef.com: NaN values appear in the MP columns of the standard and playing_time stat_types, including for players who played in the season.
I found this problem after writing a function that fetches multiple stat_types for multiple seasons and converts the resulting DataFrames from MultiIndex columns to a flat (single-level) pandas DataFrame. This transformation produced a large number of NaNs.
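The NaNs can be reproduced without contacting FBref. When two seasons with different layouts are stacked, each season fills only its own MP column, and flattening the column MultiIndex turns them into two separate flat columns. This sketch uses hypothetical data and a hypothetical flatten_columns helper that joins the non-empty levels of each column tuple with an underscore.

```python
import pandas as pd

# Hypothetical two-season frame, stacked as in a multi-season pull.
s1718 = pd.DataFrame({("MP", ""): [30, 12], ("Playing Time", "Min"): [2500, 900]})
s1819 = pd.DataFrame({("Playing Time", "MP"): [28, 5], ("Playing Time", "Min"): [2300, 400]})
hold = pd.concat([s1718, s1819], keys=["1718", "1819"])


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Join the non-empty levels of each column tuple with "_"."""
    flat = df.copy()
    if isinstance(flat.columns, pd.MultiIndex):
        flat.columns = [
            "_".join(str(level) for level in col if level != "")
            for col in flat.columns
        ]
    return flat


flat = flatten_columns(hold)
# Each season filled only one of the two MP columns, so both are partly NaN.
print(flat[["MP", "Playing Time_MP"]].isna().sum())
```

Merging the two MP columns before flattening leaves a single, fully populated column.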
To troubleshoot, I did a single pull using the
.read_player_season_stats(stat_type='standard')
call on two seasons of data (1718 and 1819) and found NaN values in both the MP and Playing Time MP columns. NaN values appear for players who played as well as for players who did not. I found 890 NaN values in the MP column under the "Playing Time" section and 380 NaN values in the standalone MP column. I am transitioning from R to Python and have always used flattened DataFrames in the past. Attached is a CSV file containing this data.
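One way to tell whether these NaNs are genuinely missing data or just the layout split is to count the rows where both MP columns are NaN. This sketch assumes the standalone column is keyed ("MP", "") and the nested one ("Playing Time", "MP"); the data and the mp_nan_report helper are hypothetical. If both_nan is 0, every row carries its match count in exactly one of the two columns, which points to the layout difference rather than to missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical stacked frame: 2017/18 rows fill ("MP", ""),
# 2018/19 rows fill ("Playing Time", "MP").
hold = pd.DataFrame(
    [[30, np.nan], [12, np.nan], [np.nan, 28], [np.nan, 5]],
    index=pd.MultiIndex.from_tuples(
        [("1718", "Player A"), ("1718", "Player B"),
         ("1819", "Player A"), ("1819", "Player C")],
        names=["season", "player"],
    ),
    columns=pd.MultiIndex.from_tuples([("MP", ""), ("Playing Time", "MP")]),
)


def mp_nan_report(df, standalone=("MP", ""), nested=("Playing Time", "MP")):
    """Count NaNs per MP column and the rows where both MP columns are NaN."""
    a = df[standalone].isna()
    b = df[nested].isna()
    return {
        "standalone_nan": int(a.sum()),
        "nested_nan": int(b.sum()),
        "both_nan": int((a & b).sum()),
    }


report = mp_nan_report(hold)
print(report)
```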
Call:
I greatly appreciate your assistance.
fbref_nan_bug_df.csv