matildabrown / rWCVP

Generating Summaries, Reports and Plots from the World Checklist of Vascular Plants
https://matildabrown.github.io/rWCVP/
GNU General Public License v3.0

Fuzzy matching 20k names blows up memory #27

Closed matildabrown closed 2 years ago

matildabrown commented 2 years ago

Not sure whether we need to chunk these automatically, or whether there's another way around it.

── Fuzzy matching 20037 names ──

Error in `mutate()`:
! Problem while computing `match_edit_distance = diag(adist(.data$sanitised_,
  .data$taxon_name))`.
Caused by error:
! cannot allocate vector of size 117.4 Gb
Backtrace:
  1. rWCVP::match_names(...)
 13. utils::adist(.data$sanitised_, .data$taxon_name)
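
One possible way around it, as a rough sketch rather than the fix that was actually adopted: the failing expression only keeps `diag()` of the `adist()` result, i.e. the distance between each sanitised name and the candidate on the same row, so the full all-against-all matrix never needs to exist at once. The helper below is hypothetical (`pairwise_adist` and `chunk_size` are illustrative names, not part of rWCVP) and assumes both vectors have the same length, as they do inside the `mutate()` call.

```r
# Hypothetical helper, not part of rWCVP: edit distance between x[i] and y[i]
# only. Works through the rows in chunks so that at most chunk_size^2
# distances are held in memory at any one time.
pairwise_adist <- function(x, y, chunk_size = 1000L) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  if (n == 0L) return(numeric(0))

  out <- numeric(n)
  for (start in seq(1L, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, n)
    # adist() still builds a matrix here, but only for this chunk;
    # the diagonal holds the row-wise distances we actually want.
    out[idx] <- diag(utils::adist(x[idx], y[idx]))
  }
  out
}

# Small check with made-up misspellings
x <- c("Poa anua", "Quercus robur", "Belis perennis")
y <- c("Poa annua", "Quercus robur", "Bellis perennis")
pairwise_adist(x, y, chunk_size = 2L)
```

With something like this, the column could be computed as `match_edit_distance = pairwise_adist(.data$sanitised_, .data$taxon_name)`. Chunking keeps the vectorised `adist()` call while capping peak memory at `chunk_size^2` values instead of the square of the number of rows.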