idigbio-api-hackathon / dedup

Specimen dedup code
MIT License

Prior art #10

Open mjcollin opened 9 years ago

mjcollin commented 9 years ago

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=looking+for+duplicates+in+gbif&start=10

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3677402/
http://update.specifysoftware.org/6400/relnotes.html (SGR)
http://wiki.filteredpush.org/wiki/2013Oct09
http://dev.gbif.org/issues/browse/DM-231

Bouteloua commented 9 years ago

Just talked to the Filtered Push people. They're clustering on collector names and collector numbers, and then letting the user clean up the clusters after that point. They know that if one of the fields is NaN this will create issues, but this is a quick and dirty "fix". Bob Morris is currently working on the data mining aspect...
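For discussion, here is a rough sketch of what clustering on those two fields could look like. It is not Filtered Push's code; the field names (`collector`, `collector_number`, `id`) and the normalization rule are assumptions made up for illustration. Records with a missing or NaN value in either field are set aside rather than clustered, which is one possible way to sidestep the NaN problem mentioned above.

```python
import math
from collections import defaultdict


def is_missing(value):
    """Treat None, float NaN, and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def normalize(value):
    """Lowercase and keep only letters and digits, so 'J. Smith' and 'j smith' match."""
    return "".join(ch for ch in str(value).lower() if ch.isalnum())


def cluster_by_collector(records):
    """Group record ids by a normalized (collector, collector_number) key.

    Returns (clusters, unkeyed):
      clusters: key -> list of ids, only for keys shared by 2+ records
      unkeyed:  ids of records where either field is missing or NaN
    """
    groups = defaultdict(list)
    unkeyed = []
    for rec in records:
        name = rec.get("collector")            # hypothetical field name
        number = rec.get("collector_number")   # hypothetical field name
        if is_missing(name) or is_missing(number):
            unkeyed.append(rec["id"])
            continue
        groups[(normalize(name), normalize(number))].append(rec["id"])
    clusters = {key: ids for key, ids in groups.items() if len(ids) > 1}
    return clusters, unkeyed


# Made-up sample records, only to show the shape of the input.
records = [
    {"id": "a", "collector": "J. Smith", "collector_number": "1234"},
    {"id": "b", "collector": "j smith", "collector_number": " 1234 "},
    {"id": "c", "collector": "J. Smith", "collector_number": float("nan")},
    {"id": "d", "collector": None, "collector_number": "1234"},
    {"id": "e", "collector": "A. Gray", "collector_number": "77"},
]
clusters, unkeyed = cluster_by_collector(records)
```

The candidate clusters would still need a human pass, as described above; exact matching on normalized strings will miss variants such as "Smith, J." versus "J. Smith".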