Open nickojelly opened 3 years ago
Further to this, it is recommended to switch back to the full form after generating new features.
Note on use of split time in feature selection:
Split time is the time it takes for a dog to cross the finish line for the first time. This happens at least once per race and can happen twice in longer races. If it happens twice, the first split is usually used.
As an example, here is a map of the Shepparton race track with the different starting positions:
And here is Bendigo:
As the tracks all have different starting positions and different actual track lengths (the length of one circuit), split times should be normalized to their relevant split distance if they are used, so that they are comparable across tracks for prediction.
Some of the split distances are measured, including Bendigo, but the majority are not:
To normalize the split distances, the average time to cross the split at an unmeasured track will be compared against the average speed at a measured track, and the split distance for the unmeasured track can then be estimated:
For example, the split for Bendigo 425m is measured at a distance of 100m. The average split time for Horsham 410m is 10.67 s, which does not match Bendigo's average, so the two tracks clearly do not share the same split distance.
So we can estimate Horsham's split distance to be 150-160m.
Later we can account for the average change in a dog's speed over the course of a race if we need a better estimate, but this will do for now.
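The constant-speed estimate described above can be sketched in Python as follows. This is only a minimal illustration, assuming dogs average the same speed to the first split on both tracks. The function names are made up for this example, and the Bendigo average split time of 7.0 s is a hypothetical placeholder, since the real figure is not given here; only the 100m Bendigo split distance and the 10.67 s Horsham average come from the notes above.

```python
def estimate_split_distance(ref_distance_m, ref_avg_split_s, target_avg_split_s):
    """Estimate an unmeasured split distance from a measured reference track.

    Assumes the average speed to the first split is the same on both tracks.
    """
    if min(ref_distance_m, ref_avg_split_s, target_avg_split_s) <= 0:
        raise ValueError("distances and times must be positive")
    ref_speed = ref_distance_m / ref_avg_split_s  # metres per second
    return ref_speed * target_avg_split_s


def split_speed(split_time_s, split_distance_m):
    """Normalize a raw split time to an average speed in metres per second."""
    if split_time_s <= 0 or split_distance_m <= 0:
        raise ValueError("time and distance must be positive")
    return split_distance_m / split_time_s


# HYPOTHETICAL value: the real Bendigo average split time is not stated above.
BENDIGO_AVG_SPLIT_S = 7.0

# Bendigo's split is measured at 100m; Horsham 410m averages 10.67 s.
horsham_estimate = estimate_split_distance(100.0, BENDIGO_AVG_SPLIT_S, 10.67)
print(f"Estimated Horsham split distance: {horsham_estimate:.1f} m")
```

Once each track has a measured or estimated split distance, `split_speed` turns a raw split time into a speed that can be compared across tracks, which may be a more useful feature than the raw time.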
New given split distances:
Cranbourne:
Sandown: http://www.sandowngreyhounds.com.au/track-details-records/
Traralgon: https://www.grv.org.au/news/2021/08/30/traralgon-j-curve-fast-facts/
Using R, create a model that determines which features in the basic form are useful: