ropensci / refsplitr

R package for processing, organizing, and visualizing reference records downloaded from the Web of Science.
https://docs.ropensci.org/refsplitr

Improved matching by address #58

Open tilltnet opened 5 years ago

tilltnet commented 5 years ago

The first part of the address-matching test should be !is.na(name.df$address); the ! is missing. Also, instead of requiring an exact match in the second part of the test, I suggest using the Jaro-Winkler measure with a high similarity threshold, so that addresses that differ only in minor details are still matched. A threshold above 0.9 may be advisable.
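Below is a minimal sketch of the suggested test in base R. It assumes each address is a single character string. The helpers jaro, jaro_winkler and address_match are hypothetical names, not part of refsplitr. The default threshold of 0.95 is only an example of a value above 0.9. If adding a dependency is acceptable, the stringdist package computes the corresponding distance with stringdist(a, b, method = "jw", p = 0.1).

```r
# Jaro similarity between two strings (base R, no packages).
jaro <- function(s1, s2) {
  n1 <- nchar(s1); n2 <- nchar(s2)
  if (n1 == 0 && n2 == 0) return(1)
  if (n1 == 0 || n2 == 0) return(0)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # Characters only count as matching within this distance of each other.
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  match_a <- logical(n1); match_b <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window); hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!match_b[j] && a[i] == b[j]) {
        match_a[i] <- TRUE; match_b[j] <- TRUE
        break
      }
    }
  }
  m <- sum(match_a)
  if (m == 0) return(0)
  # Transpositions: half the matched characters that appear in a different order.
  t <- sum(a[match_a] != b[match_b]) / 2
  (m / n1 + m / n2 + (m - t) / m) / 3
}

# Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
jaro_winkler <- function(s1, s2, p = 0.1, max_prefix = 4) {
  sim <- jaro(s1, s2)
  l <- 0
  for (k in seq_len(min(nchar(s1), nchar(s2), max_prefix))) {
    if (substr(s1, k, k) == substr(s2, k, k)) l <- l + 1 else break
  }
  sim + l * p * (1 - sim)
}

# Hypothetical address test: both addresses must be present (the missing `!`),
# and they must be similar rather than identical (case-insensitive).
address_match <- function(addr, candidate, threshold = 0.95) {
  !is.na(addr) && !is.na(candidate) &&
    jaro_winkler(tolower(addr), tolower(candidate)) >= threshold
}
```

With these definitions, address_match("Univ Florida", "Univ. Florida") returns TRUE (similarity about 0.985), while an NA address never matches.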