tr325 / FPL-Analysis

0 stars 0 forks source link

What Schema should the PSQL have? #1

Open tr325 opened 7 years ago

jg2503 commented 7 years ago

Firstly we will need a player-gameweek level table that describes all of the scoring and in game elements. This table will certainly contain all of the fpl point scoring metrics (including hidden bonus point contributors) which we will have to get from fpl. The table should also have as many general football stats as possible (passes, tackles, interceptions etc). Fpl has some of this data, but ideally we would find another more detailed source.

Secondly we will need a table at player-day level that summarizes all the real time factors that may contribute to the decision to transfer a player (price, injury flag, injury news etc).

Finally we will need a number of lookup or fact tables that I haven't fully wrapped my head around. Off the top of my head we will need to lay out all of the information about the season not yet played that is available to an fpl player (fixture list, which games fall in which GW, wildcard periods, probably a whole season event for the other boosts). In fact that may all fit in an event table.

With all of that the only thing I can think of that we wouldn't have the data for is how to pick a team at the beginning of the season. I think that's probably acceptable.

Thoughts?

tr325 commented 7 years ago

Firstly we will need a player-gameweek level table that describes all of the scoring and in game elements.

I think this should be player-game level with a gameweek column, to handle double gameweeks.

Player-day level I agree with. This should also have a "days until transfer deadline" column to inform timing of transfer decisions.

Also agree with the event table, that sounds sensible. Why do you say the "games in which gameweek" should fall here, not in the player-gameweek level table?

jg2503 commented 7 years ago

You're right the table should be player-game so that game level performance stats are consistant. However any predictive analysis will need to know how many games fall in future gameweeks which won't yet exist in the player-gameweek table so we will need that info in both places.

tr325 commented 7 years ago

Yep, that sounds reasonable. I'll have a go at getting something approximating that working with a static dataset.

tr325 commented 7 years ago

Looking into the data there's also some overview data that doesn't really fit into the "player-gameweek" table schema. Stuff like: "in_dreamteam": false, "influence": "0.0", "penalties_missed": 0, "penalties_saved": 0, "points_per_game": "0.0", "selected_by_percent": "0.2", "total_points": 0, "transfers_in": 1355, "transfers_in_event": 22, "transfers_out": 4767, "yellow_cards": 0

These basically seem to be aggregation-type stats about data over the course of the year. I don't think this fits into the "events" table either, as they're derived data.

jg2503 commented 7 years ago

Are you sure those are all aggregated variables? "penalties_missed": 0, "penalties_saved": 0, "transfers_in": 1355, -- This may even be a variable that we need to pull through daily "transfers_in_event": 22, -- This may even be a variable that we need to pull through daily "transfers_out": 4767, -- This may even be a variable that we need to pull through daily "yellow_cards": 0 could all theoretically be gameweek level variables. Anything that is just a straightforward aggregate is clutter and is not needed.

We are possibly interested in keeping influence, form and other variables with hidden formulas just to try to back calculate them and see what fpl considers "influence" to be.

jg2503 commented 7 years ago

Also we should try to replicate www.fplstatistics.co.uk in their prediction of price changes. Their predictions are very accurate and have step by step guides.

tr325 commented 7 years ago

Are you sure those are all aggregated variables?

Yeah - I looked down a few GWs, and they monotonically increase over time.

"transfers_in_event": 22, -- This may even be a variable that we need to pull through daily

'event' in their data means GW, so that's the net transfers this GW.

Also we should try to replicate www.fplstatistics.co.uk in their prediction of price changes. Their predictions are very accurate and have step by step guides.

One for me pls? :)