k-j-m / CourseAnalysis

Analysis of fell race routes based on the data available through fellrunner.org.uk

Improve race name normalisation and/or partial matching #2

Closed k-j-m closed 7 years ago

k-j-m commented 7 years ago

Race results should be matched to the corresponding race information even when the two datasets use slightly different race names, so that every result can be linked to a race. When building an index from race results to race information, I've found that ~50% of race results can't be matched.
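As a rough illustration of what normalisation plus partial matching could look like, here is a minimal Python sketch. It is not the project's code: the function names `normalise` and `match_race`, the 0.85 cutoff, and the sample race names are all assumptions made for the example. It tries an exact match on a cleaned-up name first and falls back to the standard library's `difflib` for near misses.

```python
import difflib
import re
import string


def normalise(name):
    """Lower-case, spell out '&', strip punctuation, collapse whitespace."""
    name = name.lower().replace("&", " and ")
    name = name.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", name).strip()


def match_race(result_name, race_names, cutoff=0.85):
    """Return the entry of race_names that best matches result_name, or None.

    Hypothetical helper: exact match on the normalised name first,
    then a fuzzy match using difflib's similarity ratio.
    """
    by_key = {normalise(n): n for n in race_names}
    key = normalise(result_name)
    if key in by_key:
        return by_key[key]
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None


# Made-up names, for illustration only.
races = ["Shutlingsloe Fell Race", "Wincle Trout Run"]
exact = match_race("shutlingsloe  fell race!", races)
fuzzy = match_race("Shutlingslow Fell Race", races)
missing = match_race("Ben Nevis", races)
```

A lower cutoff matches more names but risks pairing results with the wrong race, so the threshold would need checking against real data.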

k-j-m commented 7 years ago

We can attack this in a number of ways:

k-j-m commented 7 years ago

For the 'Shutlingsloe & junior champs' example, something smarter would be needed. If there are sufficiently few of these leftovers, we can just manually add the race id + result id pairs to the index file.
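One way the manual fallback might be wired in, sketched under the assumption that the index can be treated as a mapping from result id to race id (the real index file layout isn't described here, and the ids below are invented):

```python
def apply_manual_overrides(index, overrides):
    """Return a copy of the index with hand-picked pairs added.

    Both arguments are assumed to map result id -> race id.
    Manual entries win over automatic matches for the same result id.
    """
    merged = dict(index)
    merged.update(overrides)
    return merged


# Hypothetical ids, for illustration only.
auto_index = {"result-101": "race-7", "result-102": "race-9"}
manual = {"result-103": "race-12", "result-102": "race-8"}
combined = apply_manual_overrides(auto_index, manual)
```

Keeping the manual pairs in their own mapping means they survive a rebuild of the automatic index.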

k-j-m commented 7 years ago

Crumbs, I've only managed to improve the hit rate from 58% to 60%. Time to look at some more specific examples to see what the most common problems are.
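A small sketch of how the hit rate and the most frequent unmatched names could be tallied, to help spot the common problems. The function name `hit_rate_report` and the plain lists of names are assumptions for the example, not the project's actual data structures.

```python
from collections import Counter


def hit_rate_report(result_names, race_names, top=10):
    """Return (hit rate, most common unmatched result names).

    Hypothetical helper: a result counts as a hit when its name
    appears verbatim in race_names.
    """
    known = set(race_names)
    unmatched = Counter(n for n in result_names if n not in known)
    total = len(result_names)
    matched = total - sum(unmatched.values())
    rate = matched / total if total else 0.0
    return rate, unmatched.most_common(top)


# Placeholder names, for illustration only.
rate, worst = hit_rate_report(["a", "b", "b", "c"], ["a"])
```

Running the same report on normalised names shows how much each normalisation step actually gains.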

k-j-m commented 7 years ago

Got it up to 65% - most of the remaining mismatches seem to be either because the race info doesn't exist, or because we have results for a different classification (not worth trying to use).