k-j-m / CourseAnalysis

Analysis of fell race routes based on the data available through fellrunner.org.uk

Use club or age to distinguish between ambiguous names #6

Closed: k-j-m closed this issue 7 years ago

k-j-m commented 7 years ago

Some names are common. This will hurt the learning algorithm, which fits a single parameter to each name even when that name is shared by several different runners.

k-j-m commented 7 years ago

How can we tell whether this is actually a problem? Write a script that searches through all sets of results and builds a table mapping each runner name to the set of clubs seen with that name. Sort by number of clubs and print the top few names to get a feel for the problem.
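A minimal sketch of that script, assuming each race's results have already been parsed into a list of `(runner_name, club)` pairs. The issue does not describe the actual data format, so that input shape, the function names `clubs_by_name` and `most_ambiguous`, and the sample data are all hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: all_results is a list of races, each a list of (runner_name, club)
# pairs. The real fellrunner.org.uk data would need parsing into this shape first.
def clubs_by_name(all_results):
    """Build a table mapping each runner name to the set of clubs seen with it."""
    table = defaultdict(set)
    for race_results in all_results:
        for name, club in race_results:
            table[name].add(club)
    return table

def most_ambiguous(table, top_n=5):
    """Return the top_n (name, clubs) entries with the most distinct clubs."""
    return sorted(table.items(), key=lambda item: len(item[1]), reverse=True)[:top_n]

if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    races = [
        [("John Smith", "Club A"), ("Jane Doe", "Club B")],
        [("John Smith", "Club C"), ("Jane Doe", "Club B")],
        [("John Smith", "Club D"), ("Ann Lee", "Club A")],
    ]
    for name, clubs in most_ambiguous(clubs_by_name(races), top_n=3):
        print(f"{len(clubs):3d}  {name}: {sorted(clubs)}")
```

A high club count for a name hints at several runners sharing it, but it is only a hint: one runner who changes clubs over the years will also show up with more than one club.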

k-j-m commented 7 years ago

Interesting: we can't go the club route at the moment, because the club names would need normalising first. Without normalising the club names we would do more harm through false positives, mistakenly splitting one person into many, than we currently do by combining many people into one.
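To illustrate why normalising matters, here is a deliberately naive sketch. The function `normalise_club`, its suffix list, and the example club spellings are hypothetical assumptions, not a tested mapping of real club names. Without normalisation, five spellings of two clubs look like five different clubs, which is exactly the "one person split into many" error described above.

```python
import re

# ASSUMPTION: hypothetical, illustrative suffix list; real club names would
# need a curated mapping rather than these generic rules.
SUFFIXES = ("fell runners", "fellrunners", "harriers", "ac", "fr", "rc")

def normalise_club(club):
    """Lower-case, drop punctuation, collapse spaces, strip one generic suffix."""
    s = club.lower()
    s = re.sub(r"[^a-z0-9 ]", "", s)   # e.g. "a.c." -> "ac"
    s = re.sub(r"\s+", " ", s).strip()
    for suffix in SUFFIXES:
        if s.endswith(" " + suffix):
            s = s[: -len(suffix) - 1]
            break
    return s

if __name__ == "__main__":
    variants = ["Keswick AC", "Keswick A.C.", "keswick",
                "Dark Peak Fell Runners", "Dark Peak F.R."]
    print(len(set(variants)))                          # 5 distinct raw strings
    print(len({normalise_club(c) for c in variants}))  # 2 after normalising
```

Rules this blunt carry the opposite risk: stripping suffixes could merge two genuinely different clubs that share a place name, so any real normalisation would need checking against the actual club list.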

This could be an interesting learning exercise, but I'm not sure how much benefit it would give.