usgin / metadata-repository

Django application providing a user-interface for building a file and metadata management system. Operates in conjunction with https://github.com/usgin/metadata-server
BSD 3-Clause "New" or "Revised" License

Duplicate contacts for people in harvest from repository.usgin #33

Open smrgeoinfo opened 12 years ago

smrgeoinfo commented 12 years ago

There are multiple contact entries for various people that came in with the repository.usgin harvest. For instance, D.S. Love appears as "DS Love", "Love, D", "Love, D.S.", and "Love, D.S. Arizona Geological Survey".

How can we deduplicate these so there's only one contact 'object' for each individual?

rclark commented 12 years ago

We have to ferret them out of the metadata. The contact list is built from whatever contacts are in use in the records, so misspellings, different abbreviations, etc. will always cause duplication.
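Because every distinct spelling becomes its own contact, one way to start ferreting them out is to reduce each name to a crude matching key and flag keys that collect more than one spelling. The sketch below is hypothetical and not part of the repository: `name_key` assumes names are written either "Surname, Initials ..." or "Initials Surname", and keeps only the lowercase surname plus the first initial.

```python
import re
from collections import defaultdict


def name_key(raw: str) -> str:
    """Crude matching key: lowercase surname plus first initial.

    Hypothetical heuristic: assumes "Surname, Initials ..." or
    "Initials Surname" ordering. Any organization text after the
    initials is effectively ignored because only one letter is kept.
    """
    raw = raw.strip()
    if not raw:
        return ""
    if "," in raw:
        # "Love, D.S. Arizona Geological Survey" -> surname "Love"
        surname, _, rest = raw.partition(",")
    else:
        # "DS Love" -> the last token is taken as the surname
        parts = raw.split()
        surname, rest = parts[-1], " ".join(parts[:-1])
    letters = re.sub(r"[^A-Za-z]", "", rest)
    return f"{surname.strip().lower()}|{letters[:1].lower()}"


def candidate_duplicates(names):
    """Group contact names by key; keep only keys with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}
```

The output is only a list of candidates for a person to review: "Love, Dan" and "Love, Diane" would share a key, and a misspelled surname would not match at all, which is why this cannot replace checking by hand.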

Deduplication is hard to automate and will probably require compiling a list of contacts that we know are duplicated, then finding all the records that use each one and fixing them before moving on to the next contact.
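The contact-by-contact workflow could look roughly like the following sketch. The record shape (a dict with an `"id"` and an `"Authors"` list of contacts with a `"Name"` key), the `CANONICAL` mapping, and all function names are hypothetical stand-ins, not the metadata-server's actual schema or API.

```python
# Hypothetical record shape: each metadata record is a dict with an "id"
# and an "Authors" list of contact dicts carrying a "Name" key.

# Hand-reviewed list of known duplicates: variant spelling -> canonical name.
CANONICAL = {
    "DS Love": "Love, D.S.",
    "Love, D": "Love, D.S.",
}


def records_using(records, name):
    """Return the records that list `name` among their contacts."""
    return [r for r in records if any(c.get("Name") == name for c in r.get("Authors", []))]


def merge_contact(records, variant, canonical):
    """Rewrite one variant to its canonical form; return ids of records changed."""
    fixed = []
    for rec in records_using(records, variant):
        for contact in rec["Authors"]:
            if contact.get("Name") == variant:
                contact["Name"] = canonical
        fixed.append(rec["id"])
    return fixed


def merge_all(records, mapping):
    """Work through the duplicate list one contact at a time."""
    return {variant: merge_contact(records, variant, canonical) for variant, canonical in mapping.items()}
```

Each changed record would still have to be saved back to the repository; that step depends on the server and is not shown here.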

(Emailed reply to Stephen Richard's post of Oct 5, 2012, at 10:29 AM.)