ehwenk closed 6 months ago
@dfalster I've run all 47,000 AusTraits names through this and 33 came out different. They all appear to be names that were previously passed over during fuzzy matching (match 5's) and are now being caught. So we gain some additional matching power, and nothing is being misaligned.
Switching from `utils::adist` to `stringdist::stringdist` for matching. This is both much faster and lets us use a more nuanced matching algorithm, the Damerau–Levenshtein distance method, with the option of prioritising particular types of string changes (based on their algorithm).
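For anyone reviewing, here is a minimal sketch of how the two functions differ on a transposition-style typo. The names below are made up for illustration and are not taken from the AusTraits run. The `method = "dl"` setting and the `weight` values are assumptions about how this could be configured, not necessarily what the PR uses.

```r
library(stringdist)  # CRAN package, assumed to be installed

# Hypothetical submitted and accepted names (illustration only)
submitted <- "Eucalyptus salinga"
accepted  <- "Eucalyptus saligna"

# Base R: generalized Levenshtein distance.
# Swapping two adjacent characters costs two edits (two substitutions).
d_adist <- utils::adist(submitted, accepted)[1, 1]

# stringdist with full Damerau-Levenshtein ("dl"):
# a swap of two adjacent characters counts as a single transposition.
d_dl <- stringdist::stringdist(submitted, accepted, method = "dl")

# Assumed weights: penalties for deletion, insertion, substitution and
# transposition, in that order. Lowering `t` makes transpositions cheaper
# than other edit types. The value 0.5 is arbitrary.
d_weighted <- stringdist::stringdist(
  submitted, accepted,
  method = "dl",
  weight = c(d = 1, i = 1, s = 1, t = 0.5)
)

print(c(adist = d_adist, dl = d_dl, dl_weighted = d_weighted))
```

With a fixed distance cutoff, a name like this sits closer to its accepted form under Damerau–Levenshtein than under `adist`. That is one plausible reason the extra 33 names are now being caught, though I haven't checked each case against this explanation.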