ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML
Other
13 stars 9 forks source link

Replace taxize with taxald #226

Closed cboettig closed 5 years ago

cboettig commented 5 years ago

@hlapp can you review this?

This implements the changes proposed in #224. Unlike the original taxize version, this approach vectorizes the call for id lookup, and also supports multiple authorities instead of only NCBI. The advantages of this are particularly evident in the primates example in the metadata vignette -- which previously ran very slowly and resolved no names, no almost all names are resolved (against ITIS taxonomy in this case), and which runs quite quickly.

taxald pulls a local copy of the complete id table for the authority on the fly if one has not been installed locally. Technically this can be a bit slow, but as these are compressed tables (coming from AWS S3 at the moment) they download more quickly than some single API calls. A user can install a local copy, (e.g. by running taxald::td_create(authority = "all")), but for simplicity and consistency with the old version I haven't documented that here.

taxald isn't on CRAN yet and may change a bit still, though I think this basic behavior should not be impacted. So merging this would mean we would ideally want taxald on CRAN before the next RNeXML release; though as this has always been an opt-in or soft dependency, it's not essential.

This also includes and updated version of the pkgdown build based on taxald, so deprecates PR #223 (and resolves the travis-build error associated with taxize network errors).

hlapp commented 5 years ago

Very nice speedup! Shaved about 40% off of the CI test time. 👍

cboettig commented 5 years ago

Thanks for the nice review. I've fixed the vignette (dropping taxize mention), append instead of overwriting meta, and added an argument to opt out of warnings. Also rebuilt docs. Merge when ready.