ropensci / refsplitr

R package for processing, organizing, and visualizing reference records downloaded from the Web of Science.
https://docs.ropensci.org/refsplitr

Hierarchy of disambiguation information #56

Open tilltnet opened 5 years ago

tilltnet commented 5 years ago

Hi, I've noticed that the pruning step of the authors_match function separates entries that were previously matched by the same ORCID, Researcher ID (RID), or email address. In my case this leads to "unnecessary" under-matching. My quick fix was to set the similarity to 1 for entries that were matched by ORCID or RID, which excludes them from the pruning. For entries matched by email address, the pruning seemed to do a good job though!
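To illustrate the idea behind that quick fix, here is a minimal sketch in base R. It is not the actual refsplitr code: the column names `matched_by` and `similarity`, the helper functions, the toy data, and the 0.8 threshold are all assumptions made up for this example.

```r
# Toy data: each row is one author record in a candidate match group.
# All column names here are hypothetical, not refsplitr's real schema.
records <- data.frame(
  author     = c("Smith, J", "Smith, John", "Miller, J", "Smith, J A"),
  matched_by = c("orcid", "orcid", "rid", "email"),
  similarity = c(0.95, 0.70, 0.60, 0.65),
  stringsAsFactors = FALSE
)

# Force similarity to 1 for ORCID/RID matches so they survive pruning.
protect_id_matches <- function(df, trusted = c("orcid", "rid")) {
  df$similarity[df$matched_by %in% trusted] <- 1
  df
}

# Stand-in for the pruning step: keep only records at or above a threshold.
prune_matches <- function(df, threshold = 0.8) {
  df[df$similarity >= threshold, , drop = FALSE]
}

# Without protection, ORCID/RID matches with low name similarity are dropped.
pruned_default <- prune_matches(records)

# With protection, all ORCID/RID matches are kept; the email match is still pruned.
pruned_protected <- prune_matches(protect_id_matches(records))
```

In this toy example the email-matched record is still subject to the name-based pruning, which mirrors the observation that pruning worked well for email matches.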

I don't know if giving ORCID and RID a higher priority is a universally better solution to the problem, but the way I understand it, ORCID and RID are quite reliable and might also identify a person whose name has changed, for example due to marriage. Therefore, pruning those matches by name initials might not be the best solution.

If there are good reasons to overrule ORCID/RID matches based on differences in name initials, it might be worthwhile to consider letting the user decide the hierarchy of ORCID/RID, email, and names.
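One way such a user-defined hierarchy could look is sketched below in base R. This is purely hypothetical and not part of refsplitr: the function `prune_with_hierarchy`, its arguments `hierarchy`, `trust_top`, and `threshold`, and the column names are all assumptions for illustration. The first `trust_top` entries of the hierarchy are treated as reliable enough to be exempt from name-based pruning.

```r
# Hypothetical helper: match types in the first `trust_top` positions of the
# user-supplied hierarchy are exempt from pruning; all other records are
# kept only if their name similarity reaches the threshold.
prune_with_hierarchy <- function(df,
                                 hierarchy = c("orcid", "rid", "email", "name"),
                                 trust_top = 2,
                                 threshold = 0.8) {
  stopifnot(trust_top >= 0, trust_top <= length(hierarchy))
  trusted <- head(hierarchy, trust_top)
  keep <- df$matched_by %in% trusted | df$similarity >= threshold
  df[keep, , drop = FALSE]
}

# Toy data with made-up values; column names are assumptions.
records <- data.frame(
  author     = c("Smith, J", "Miller, J", "Smith, J A", "Smith, John"),
  matched_by = c("orcid", "rid", "email", "name"),
  similarity = c(0.70, 0.60, 0.65, 0.90),
  stringsAsFactors = FALSE
)

# Default: ORCID and RID matches are trusted, email and name are pruned by similarity.
res_default <- prune_with_hierarchy(records)

# A user who trusts email addresses most and nothing else unconditionally.
res_email <- prune_with_hierarchy(
  records,
  hierarchy = c("email", "orcid", "rid", "name"),
  trust_top = 1
)

# Setting trust_top = 0 prunes every record by name similarity alone.
res_strict <- prune_with_hierarchy(records, trust_top = 0)
```

Exposing the ordering as an argument would let users keep pruning purely by name similarity (`trust_top = 0`) or opt in to trusting identifiers, depending on how reliable those fields are in their data.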

Best, Till