vaastav / Fantasy-Premier-League

Creates a .csv file of all players in the English Premier League with their respective team and total fantasy points
Other
1.35k stars, 785 forks

Fixtures.csv files have team_a and team_h swapped! #174

danielfrees closed this issue 7 months ago

danielfrees commented 7 months ago

I'm working on building out a CNN and some other models using this FPL data. While feature engineering a 'matchup difficulty' feature, I was seeing some weird behavior. Tracing things backwards, I realized all the fixtures.csv files have team_a and team_h backwards! team_h and team_a_difficulty match up, but not vice versa.

vaastav commented 7 months ago

Hmm, I don't think that's true, at least for the current season. I just checked the latest fixtures.csv and it seems to have it correct. Can you point me to a specific, detailed example where the team_a and team_h have been swapped?

danielfrees commented 7 months ago

Sure, I'm seeing the issue in 2020-21 and 2021-22 (which are the years I'm using for my NN).

danielfrees commented 7 months ago

Maybe I'm misunderstanding something?

I'm seeing the same thing in 2023-23, where if I sort by 'team_a', the 'team_a_difficulty' values are all jumbled, but the 'team_h_difficulty' values match up as I would expect. I.e., if team_a is Man City, we see team_h_difficulty = 5 consistently, but team_a_difficulty is a bunch of different values.

vaastav commented 7 months ago

team_a_difficulty should be a bunch of different values as for Man City the fixture rating will heavily depend on the team they are facing. For example, Man City's difficulty rating for Liverpool Away would be 5; but for something like Luton Away it would most likely be 2.
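The pattern under discussion (one difficulty column staying fixed while the other varies when rows are grouped by team_a) can be reproduced on a toy table. This is only a sketch: the team IDs and ratings below are made up for illustration and are not real FPL data, and `difficulties_by_away_team` is a hypothetical helper name. Only the column names are taken from this thread.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; IDs and ratings are NOT real FPL data.
# Column names follow the fixtures.csv columns named in this thread.
TOY_FIXTURES = """team_h,team_a,team_h_difficulty,team_a_difficulty
11,13,5,5
12,13,5,2
3,13,5,3
13,11,4,4
"""


def difficulties_by_away_team(csv_text):
    """Map each team_a ID to the set of ratings seen in each difficulty column."""
    seen = defaultdict(
        lambda: {"team_h_difficulty": set(), "team_a_difficulty": set()}
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        away = int(row["team_a"])
        for col in ("team_h_difficulty", "team_a_difficulty"):
            seen[away][col].add(int(row[col]))
    return dict(seen)


summary = difficulties_by_away_team(TOY_FIXTURES)
# For away team 13, team_h_difficulty has a single value (how hard team 13 is
# for its hosts), while team_a_difficulty varies with the host being visited.
print(summary[13])
```

In this toy data, the fixed column for a given team_a is team_h_difficulty because it rates that away team as an opponent, while team_a_difficulty changes with whichever home team is being visited.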

danielfrees commented 7 months ago

Totally agree with that, but the team_a_difficulty should match up with the team_a ID value, right? This is what I was trying to describe:

If we sort by the team_a ID (so this is a bunch of games where away team = team_a), then the team_h_difficulty becomes fixed, but the team_a_difficulties vary.

[Image: Screenshot 2023-12-06 at 9 21 45 AM]

Same thing for sorting by the home team:

[Image: Screenshot 2023-12-06 at 9 24 19 AM]

danielfrees commented 7 months ago

Wait a second... is the intention of these columns that 'team_h_difficulty' tells you how difficult the opponent team is for team_h (i.e. team_h_difficulty tells you how hard the AWAY team (team_a) is today for the home team)?

I was expecting the opposite from the column titles (i.e. that team_a_difficulty tells you how difficult team_a is).

vaastav commented 7 months ago

team_h_difficulty is telling you the fixture difficulty rating for the home team (i.e. how difficult the away team is for the home team).

team_a_difficulty is telling you the fixture difficulty rating for the away team (i.e. how difficult the home team is for the away team).

This encoding is directly from the official FPL API.
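For feature engineering, one way to make this encoding harder to misread is to expand each fixture into two rows, one per team, each carrying the difficulty that team faces. The sketch below assumes the fixtures.csv columns named in this thread plus an `event` (gameweek) column; the rows are made up, and `per_team_rows` and the output keys (`team`, `opponent`, `was_home`, `difficulty`) are hypothetical names, not part of the repository or the FPL API.

```python
import csv
import io

# Made-up rows for illustration only; not real FPL data.
# The `event` column is assumed to hold the gameweek number.
TOY_FIXTURES = """event,team_h,team_a,team_h_difficulty,team_a_difficulty
1,12,13,5,2
2,13,11,4,4
"""


def per_team_rows(csv_text):
    """Expand each fixture into two rows, one from each team's point of view."""
    rows = []
    for fx in csv.DictReader(io.StringIO(csv_text)):
        event = int(fx["event"])
        home, away = int(fx["team_h"]), int(fx["team_a"])
        # team_h_difficulty: how hard the away team is for the home team.
        rows.append({
            "event": event, "team": home, "opponent": away,
            "was_home": True, "difficulty": int(fx["team_h_difficulty"]),
        })
        # team_a_difficulty: how hard the home team is for the away team.
        rows.append({
            "event": event, "team": away, "opponent": home,
            "was_home": False, "difficulty": int(fx["team_a_difficulty"]),
        })
    return rows


long_rows = per_team_rows(TOY_FIXTURES)
for row in long_rows:
    print(row)
```

With this layout, `difficulty` always reads as "how hard this fixture is for `team`", so a matchup-difficulty feature can be joined on `team` and `event` without having to remember which of the two original columns applies.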

danielfrees commented 7 months ago

Got it, all good then. Thanks for clarifying. Might be worth adding a small note in the README?