mawa00006 / Doping-Detection-Based-on-Publicly-Available-Competition-Data-in-Professional-Road-Cycling

Visualize SPS data #14

Closed mawa00006 closed 1 year ago

mawa00006 commented 2 years ago
SimonP07 commented 2 years ago

I am going to make a dataframe for every rider (10 sinners, 10 saints). I think that would be clearer. But I can also put the 10 riders of each category into one dataframe. What is your suggestion? I will only look at the year and the rank they achieved. (If that's not right, please tell me.)

mawa00006 commented 2 years ago

I would suggest that we do something similar to the plots that were shown in the presentation from last year's group. We have three categories (points, wins, racedays) for each season. Let's do one plot for each of them. Maybe we can plot something like 'mean points/wins/racedays' per season for all doped and undoped riders in one plot. (The first datapoint in the plot is the average of the first season over all riders, independent of the year; the second datapoint is the second season, and so on.) We could then see whether there is a trend like 'longer careers', 'more/fewer races per year', or some other pattern.
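A minimal sketch of that aggregation in pandas, assuming a long-format table with one row per rider and season. The column names (`rider`, `doped`, `year`, `points`, `wins`, `racedays`) and all values are made up for illustration and are not taken from the project data.

```python
import pandas as pd

# Hypothetical long-format table: one row per rider and season.
# Column names and values are invented for illustration only.
df = pd.DataFrame({
    "rider":    ["A", "A", "A", "B", "B", "C", "C", "C"],
    "doped":    [True, True, True, True, True, False, False, False],
    "year":     [2000, 2001, 2002, 2014, 2015, 2008, 2009, 2010],
    "points":   [100, 200, 300, 300, 400, 50, 150, 250],
    "wins":     [0, 1, 2, 2, 3, 0, 0, 1],
    "racedays": [40, 60, 70, 50, 80, 30, 55, 65],
})

# Career season = 1 for a rider's first recorded season, 2 for the second,
# and so on, regardless of the calendar year in which it took place.
df = df.sort_values(["rider", "year"])
df["career_season"] = df.groupby("rider").cumcount() + 1

# Mean of each metric per career season, separately for doped and undoped riders.
season_means = (
    df.groupby(["doped", "career_season"])[["points", "wins", "racedays"]]
      .mean()
)
print(season_means)
```

Note that `cumcount` numbers the seasons a rider actually has data for, so a rider who skipped a year would have that gap closed up. Whether that is the desired behaviour depends on how missing seasons should be treated.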

SimonP07 commented 2 years ago

That would mean we make no distinction between sinners and saints here, right? My question about the calculation of the means: should we look at each year and take the avg. points in 2012 as one plot point, then the avg. points in 2013 as the next plot point, and so on? (Same for the wins per year and racedays per year.) And then put them all into one dataframe to plot?

mawa00006 commented 2 years ago

What I meant was not to take the mean for each calendar year but over the career duration. The x-axis would be years competed. For example, when plotting wins per season, we do not look at each year (e.g. 2000, 2001, ...) but take the average over the first career year of each rider (for one rider this might be 2000, for another 2014), then over the second career year, and so on. We can then plot one line for sinners and one line for non-sinners in one plot so we can compare both curves.
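As a rough sketch of the career-year alignment, the snippet below shifts each rider's seasons so that their first year becomes career year 1, then averages per career year and group. The helper `mean_by_career_year`, the column names (`rider`, `group`, `year`, `wins`) and the sample values are hypothetical.

```python
import pandas as pd

def mean_by_career_year(seasons: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Average `metric` per career year, with one column per group."""
    # First calendar year in which each rider appears, repeated on every row.
    first_year = seasons.groupby("rider")["year"].transform("min")
    # Career year 1 is the rider's first year, whether that was 2000 or 2014.
    career_year = seasons["year"] - first_year + 1
    return (
        seasons.assign(career_year=career_year)
               .pivot_table(index="career_year", columns="group",
                            values=metric, aggfunc="mean")
    )

# Invented sample data: two sinners starting in different years, one saint.
seasons = pd.DataFrame({
    "rider": ["A", "A", "B", "B", "C", "C"],
    "group": ["sinner", "sinner", "sinner", "sinner", "saint", "saint"],
    "year":  [2000, 2001, 2014, 2015, 2008, 2009],
    "wins":  [1, 3, 3, 5, 0, 2],
})

curves = mean_by_career_year(seasons, "wins")
print(curves)
# With matplotlib installed, curves.plot() would draw one line per column
# (saint, sinner) in a single plot, with career year on the x-axis.
```

Here riders A and B both contribute to career year 1 although their first seasons were 2000 and 2014. The same call with `"points"` or `"racedays"` as the metric would give the other two plots, assuming those columns exist.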