traitecoevo / APCalign

R package for accessing, matching and updating species names of Australian flora
https://traitecoevo.github.io/APCalign/

Switch adist to stringdist #216

Closed ehwenk closed 6 months ago

ehwenk commented 6 months ago

Switching from utils::adist to stringdist::stringdist for matching. This is much faster, and it lets us use a more nuanced matching algorithm: the Damerau–Levenshtein distance method, which counts a swap of two adjacent characters as a single edit and prioritises types of string changes (based on their algorithm).
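
To make the difference concrete, here is a minimal sketch in Python (for illustration only; it is not the package's R code and not the stringdist implementation). It compares plain Levenshtein distance, the kind of generalised edit distance adist computes, with optimal string alignment, the restricted form of Damerau–Levenshtein that also allows swapping two adjacent characters. The function names and the example strings are assumptions chosen for the illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using insertions, deletions and substitutions only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped: count it as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# A typo that swaps two adjacent letters ("ci" typed as "ic").
print(levenshtein("Acacia", "Acaica"))   # 2
print(osa_distance("Acacia", "Acaica"))  # 1
```

Under a fixed distance threshold, a name with a swapped-letter typo is therefore closer to its intended match under the transposition-aware metric than under plain Levenshtein.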

ehwenk commented 6 months ago

@dfalster I've run all 47,000 AusTraits names through this and 33 came out differently. They all seem to be names that were previously passed over during fuzzy matching (match 5's) and are now being caught. So there is some additional matching power, but nothing is being misaligned.
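
A before/after check of this kind could be sketched as follows, again in Python for illustration rather than the R code actually used. It assumes each run is summarised as a mapping from input name to aligned name, with None for no match; the function name and the toy values are hypothetical placeholders, not AusTraits data.

```python
from typing import Dict, Optional, Tuple

Alignment = Dict[str, Optional[str]]


def classify_changes(old: Alignment, new: Alignment) -> Tuple[dict, dict]:
    """Split names whose result changed into newly matched vs. realigned."""
    newly_matched, realigned = {}, {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue
        if before is None:
            # Previously unmatched, now caught: extra matching power.
            newly_matched[name] = after
        else:
            # Previously matched to something else (or now lost): needs review.
            realigned[name] = (before, after)
    return newly_matched, realigned


# Toy placeholder values, purely illustrative.
old_run = {"name_a": "taxon_1", "name_b": None, "name_c": "taxon_3"}
new_run = {"name_a": "taxon_1", "name_b": "taxon_2", "name_c": "taxon_3"}

gained, moved = classify_changes(old_run, new_run)
print(gained)  # {'name_b': 'taxon_2'}
print(moved)   # {}
```

An empty second dictionary corresponds to the outcome described above: some names gained a match and none changed to a different one.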