w3c / opentrack-cg

Repository for OpenTrack Community Group
https://w3c.github.io/opentrack-cg/spec/competition/

How to remove/merge duplicates in the database (athletes/clubs) #32


espinr commented 3 years ago

During the monthly call on Feb 15, Andy R raised an open question about the best way to clean the database and to avoid duplicates when inserting new entries, specifically for competitors, athletes, and clubs.

So, please list best practices and ways to detect exact and near duplicates.

For instance: swapped date components (month and day), swapped given name and surname, etc.
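As a starting point for discussion, here is a minimal sketch of one way to flag candidate duplicates for the cases above. It assumes each record is a dict with hypothetical fields `id`, `given_name`, `family_name` and `birth_date` (a `datetime.date`); these names are illustrative, not part of the OpenTrack model. The idea is to build a normalized key that ignores accents, case, name order and month/day order, and to add a fuzzy string comparison (Python's `difflib`) for typos. Matches are only candidates for manual review, not automatic merges.

```python
import unicodedata
from collections import defaultdict
from datetime import date
from difflib import SequenceMatcher


def normalize_name(text: str) -> str:
    # Strip accents ("José" -> "Jose"), lowercase, and collapse whitespace
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def dedup_key(given_name: str, family_name: str, birth: date) -> tuple:
    # Sort name tokens so "Jose Garcia" and "Garcia Jose" produce the same key
    full = normalize_name(given_name) + " " + normalize_name(family_name)
    tokens = tuple(sorted(full.split()))
    # Sort month and day so 2001-03-04 and 2001-04-03 produce the same key
    month_day = tuple(sorted((birth.month, birth.day)))
    return (tokens, birth.year, month_day)


def find_candidate_duplicates(records):
    # Group record ids by key; any group with more than one id needs review
    groups = defaultdict(list)
    for rec in records:
        key = dedup_key(rec["given_name"], rec["family_name"], rec["birth_date"])
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def names_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    # Near-duplicate check for typos; the threshold is an arbitrary example value
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio() >= threshold
```

For example, `José García` born 2001-03-04 and `Garcia Jose` born 2001-04-03 end up in the same group, while `names_similar("Jon Smith", "John Smith")` catches a spelling variant that the exact key would miss.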

dbonacci commented 3 years ago

Maybe it would be useful to use the World Athletics (WA) athlete codes for all athletes who have one? I know that coverage is quite high: I have a code myself, even though at my best I was far from an elite-level athlete, and coverage has grown further in the last couple of years since the WA athlete database went into production.
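To illustrate the suggestion, here is a rough sketch of merging athlete records that share a WA athlete code while keeping an alias map, so that results stored under a merged id can be redirected to the surviving record. The `Athlete` class, its `wa_code` field and the code values below are hypothetical placeholders, not the actual WA code format or any WA API. Records without a code are left untouched and could then go through a fuzzy matching step instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Athlete:
    id: int
    name: str
    wa_code: Optional[str] = None  # hypothetical field holding the WA athlete code


def merge_by_wa_code(athletes: List[Athlete]) -> Tuple[List[Athlete], Dict[int, int]]:
    """Keep the first record seen for each WA code; map later duplicates to it."""
    canonical: Dict[str, Athlete] = {}
    aliases: Dict[int, int] = {}  # merged id -> surviving id
    without_code: List[Athlete] = []
    for athlete in athletes:
        if not athlete.wa_code or not athlete.wa_code.strip():
            # No code available: leave for another matching strategy
            without_code.append(athlete)
            continue
        code = athlete.wa_code.strip()  # tolerate stray whitespace from data entry
        if code in canonical:
            aliases[athlete.id] = canonical[code].id
        else:
            canonical[code] = athlete
    return list(canonical.values()) + without_code, aliases
```

Keeping the alias map rather than deleting merged ids outright means existing references (e.g. old results) can still be resolved after the cleanup.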