traitecoevo / APCalign

R package for accessing, matching and updating species names of Australian flora
https://traitecoevo.github.io/APCalign/

Swap adist for stringdist #215

Closed dfalster closed 4 months ago

dfalster commented 4 months ago

As noted by @wcornwell in https://github.com/traitecoevo/APCalign/issues/180#issuecomment-2087805540, we can swap out adist for a faster alternative, stringdist.

wcornwell commented 4 months ago

@ehwenk Interesting bit from the stringdist help files:

The metric you need to choose for an application strongly depends on both the nature of the string (what does the string represent?) and the cause of dissimilarities between the strings you are measuring. For example, if you are comparing human-typed names that may contain typo's, the Jaro-Winkler distance may be of use. If you are comparing names that were written down after hearing them, a phonetic distance may be a better choice.

ehwenk commented 4 months ago

@wcornwell @dfalster

So much for my hour of coding - I immediately get the same output with stringdist, method = "dl". At least now I know my logic has an official name.

I'll remove my "wordy" code and run all tests.
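
For anyone following along, method = "dl" in stringdist is the Damerau-Levenshtein distance: the minimum number of insertions, deletions, substitutions and adjacent-character transpositions needed to turn one string into another. Below is a minimal sketch of that logic in plain Python. It is only an illustration, not the stringdist implementation (which is written in C), and the function name `damerau_levenshtein` is made up for this example.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance between strings a and b.

    Illustrative sketch only; the function name is hypothetical and this
    is not the code used by the stringdist R package.
    """
    len_a, len_b = len(a), len(b)
    max_dist = len_a + len_b

    # d is offset by one in both directions: row 0 and column 0 hold a
    # sentinel (max_dist) so the transposition lookup never goes out of range.
    d = [[max_dist] * (len_b + 2) for _ in range(len_a + 2)]
    for i in range(len_a + 1):
        d[i + 1][1] = i  # distance from a[:i] to the empty string
    for j in range(len_b + 1):
        d[1][j + 1] = j  # distance from the empty string to b[:j]

    last_row = {}  # last (1-based) position in a where each character occurred
    for i in range(1, len_a + 1):
        last_match_col = 0  # last (1-based) column in this row with a match
        for j in range(1, len_b + 1):
            row = last_row.get(b[j - 1], 0)
            col = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,   # match or substitution
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transposition, plus any characters skipped around it
                d[row][col] + (i - row - 1) + 1 + (j - col - 1),
            )
        last_row[a[i - 1]] = i

    return d[len_a + 1][len_b + 1]


# A swapped "yp" / "py" counts as a single edit.
print(damerau_levenshtein("Eucalyptus", "Eucalpytus"))  # 1
```

A plain Levenshtein distance would score the same pair as 2, because it has to treat the swap as two substitutions. That is why a transposition-aware metric suits typed names where neighbouring letters get swapped.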

ehwenk commented 4 months ago

[image]

wcornwell commented 4 months ago

Interestingly, this probably makes #162 obsolete. It looks like part of stringdist's C++ magic is low-level parallelization.

See: https://www.rdocumentation.org/packages/stringdist/versions/0.9.12/topics/stringdist-parallelization

ehwenk commented 4 months ago

Closed with https://github.com/traitecoevo/APCalign/commit/a8c763231c46172c265cd66c7d22a614143e989a