Analysis of fell race routes based on the data available through fellrunner.org.uk
The project is split into three parts.
All data comes from the fellrunner website, with a lot of manual cleanup.
More to be written here when I have some more time.
Imagine a robot-like fell runner who can bang out completely consistent race performances day after day after day. If he runs the same course 100 days in a row he will get 100 identical results. He is consistently good/bad over all distances and terrain. This imaginary standard runner is the benchmark that we are going to measure all courses and all other runners against.
For every course the standard runner has a finish time. We might not know what it is yet, but it does exist, call it t_course.
Every other competitor will be some amount faster or some amount slower than the standard runner; call that multiplying factor k_runner.
So if I am scored 10% slower than the standard runner (k_runner = 1.1), then I expect to finish 10% slower on every single race route.
If the number of races is M and the number of unique runners is N, then there are a total of M + N model parameters to fit. They are chosen to minimise the sum of the errors between the runners' predicted finish times and their actual finish times.
A runner's predicted finish time is t_runner = t_course * k_runner.
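To make the fit concrete, here is a minimal sketch of one way to estimate the course times and runner factors by alternating updates. The column names, data and update rule are illustrative assumptions, not the repo's actual fitting code.

```python
# A minimal sketch of fitting the M course times and N runner factors by
# alternating updates. The column names and data here are made up for
# illustration; this is not the repo's actual fitting code.
import numpy as np
import pandas as pd

# One row per finish, with actual finish times in minutes.
results = pd.DataFrame({
    "race":   ["A", "A", "B", "B", "B"],
    "runner": ["alice", "bob", "alice", "bob", "carol"],
    "time":   [55.0, 61.0, 92.0, 101.0, 110.0],
})

t_course = {r: results.loc[results["race"] == r, "time"].mean()
            for r in results["race"].unique()}
k_runner = {p: 1.0 for p in results["runner"].unique()}

for _ in range(50):  # iterate until the estimates settle down
    # Course time: average finish time with each runner's factor divided out.
    for r in t_course:
        rows = results[results["race"] == r]
        t_course[r] = np.mean([t / k_runner[p] for t, p in zip(rows["time"], rows["runner"])])
    # Runner factor: average ratio of actual time to the current course estimate.
    for p in k_runner:
        rows = results[results["runner"] == p]
        k_runner[p] = np.mean([t / t_course[r] for t, r in zip(rows["time"], rows["race"])])

# Predicted finish time for any runner on any course: t_runner = t_course * k_runner
print({p: round(k, 3) for p, k in k_runner.items()})
```

Note that the model is only defined up to an overall scale: multiplying every t_course by some factor and dividing every k_runner by the same factor leaves the predictions unchanged, so in practice the scale needs pinning down, for example by forcing the average k_runner to be 1.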
Each race route has a couple of vital statistics: distance and climb. More climb means that the average pace will slow down, but by how much? More distance means that the average pace will slow down, again by how much?
The original intent behind all of this was to figure out how much each foot of climb costs compared with distance on the flat (e.g. 1000 ft of climb is equivalent to adding an extra 1.5 miles to the course).
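As a sketch of how that equivalence could be extracted from the fitted course times (all numbers made up, and a deliberately crude model; the repo's own approach may differ):

```python
# Crude sketch: regress the fitted standard-runner times on distance and climb
# to express climb as an equivalent amount of flat distance. All numbers are
# made up; the repo's own model may differ (e.g. it might include an intercept).
import numpy as np

distance = np.array([5.0, 8.5, 3.1, 10.2])      # miles
climb    = np.array([1500, 2800, 900, 3500])    # feet
t_course = np.array([48.0, 85.0, 27.0, 105.0])  # fitted standard-runner times, minutes

# Least squares for t_course ≈ a*distance + b*climb
X = np.column_stack([distance, climb])
(a, b), *_ = np.linalg.lstsq(X, t_course, rcond=None)

# a is minutes per flat mile, b is minutes per foot of climb,
# so b/a converts feet of climb into equivalent flat miles.
miles_per_1000ft = 1000 * b / a
print(f"1000 ft of climb ≈ {miles_per_1000ft:.2f} extra flat miles")
```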
The whole gnarliness thing was a bit of fun to explain away the difference between the predicted race duration (based on distance and ascent) and the actual race duration (based on the t_course parameter that we fit in the first model).
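On that reading, gnarliness could be computed as a simple ratio; the numbers and the exact definition below are my assumptions rather than the repo's:

```python
# "Gnarliness" read as the ratio of the fitted course time (t_course from the
# first model) to the time predicted from distance and climb alone.
# All numbers are illustrative.
import numpy as np

t_fitted    = np.array([48.0, 85.0, 27.0, 105.0])  # fitted t_course, minutes
t_predicted = np.array([45.5, 88.0, 28.5, 103.0])  # from the distance + climb model
gnarliness = t_fitted / t_predicted
print(np.round(gnarliness, 3))  # > 1 means the course runs slower than its stats suggest
```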
There are a number of things that aren't done so well: