matildabrown / rWCVP

Generating Summaries, Reports and Plots from the World Checklist of Vascular Plants
https://matildabrown.github.io/rWCVP/
GNU General Public License v3.0

edit distance matching is very slow #21

Open barnabywalker opened 2 years ago

barnabywalker commented 2 years ago

It took about an hour to match 608 names. Is there some way we could speed this up?

🤷

matildabrown commented 2 years ago

It's the pairwise distance matching that gets out of hand. It's already optimised to filter by genus first, so some attempt has been made (it was way slower before that). Vectorising the whole thing blows up the memory, but perhaps it could be chunked and vectorised? At least the progress bar gives an ETA...
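
For illustration, the "filter by genus, then chunk" idea might look something like the sketch below. It is written in Python rather than R, and it is not rWCVP's actual code: `levenshtein`, `chunks`, `fuzzy_match` and `chunk_size` are hypothetical names made up for this example. It assumes every name is a non-empty string whose first word is a correctly spelled genus. Only one block of the distance matrix exists at a time, so memory is bounded by `chunk_size` times the number of candidates in that genus.

```python
from collections import defaultdict
from itertools import islice


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def fuzzy_match(queries, checklist, chunk_size=100):
    """Map each query to (closest checklist name, distance).

    Queries whose genus is absent from the checklist map to (None, None).
    """
    # Index the checklist by genus so each query is only compared
    # against names in its own genus.
    by_genus = defaultdict(list)
    for name in checklist:
        by_genus[name.split()[0]].append(name)

    # Group the queries the same way.
    query_groups = defaultdict(list)
    for query in queries:
        query_groups[query.split()[0]].append(query)

    result = {}
    for genus, group in query_groups.items():
        candidates = by_genus.get(genus, [])
        for block in chunks(group, chunk_size):
            if not candidates:
                for query in block:
                    result[query] = (None, None)
                continue
            # Distance matrix for this block only: len(block) x len(candidates).
            matrix = [[levenshtein(q, c) for c in candidates] for q in block]
            for query, row in zip(block, matrix):
                best = min(range(len(row)), key=row.__getitem__)
                result[query] = (candidates[best], row[best])
    return result


if __name__ == "__main__":
    checklist = ["Quercus robur", "Quercus alba", "Quercus rubra", "Fagus sylvatica"]
    queries = ["Quercus robar", "Fagus sylvatca"]
    for query, (match, dist) in fuzzy_match(queries, checklist).items():
        print(query, "->", match, dist)
```

In a real implementation the inner list comprehension would be replaced by a vectorised distance routine operating on the whole block at once, which is where the speed-up would come from; the pure-Python loop here only shows how chunking keeps the matrix small.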