ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

Leipzig Catalogue of Vascular Plants as additional reference for plant synonymy. #815

Open azizka opened 4 years ago

azizka commented 4 years ago

Hi Scott,

There is a competing reference list for plant synonymy, which I helped some colleagues put together as an R package (https://github.com/idiv-biodiversity/LCVP). It updates The Plant List and expands it by c. 180,000 taxon names.

It would be great if the list could be included in taxize as a data source. Do you see any chance of that? I am happy to help with the implementation.

Cheers,

Alex

sckott commented 4 years ago

hi - thanks for your message @azizka

Can you be more specific? Do you envision taxize importing LCVP or lcvplants, or do you intend for the LCVP data itself to live in this package? And what do you see as the intended use? Do you want to do here what your https://github.com/idiv-biodiversity/lcvplants package does? If you envision LCVP or lcvplants as an Import of taxize (i.e., listed under Imports in its DESCRIPTION file), they would need to be on CRAN first. Another note: I think the data in LCVP is too large to include in a package on CRAN; I'd imagine you'd need to host the data somewhere (e.g., a GitHub repo) and have users download it before using it.

azizka commented 4 years ago

Hi,

Thanks for the quick reply!

Yes, we were thinking of taxize importing LCVP and using it as a data source for plant synonym resolution (so basically replacing what lcvplants does now).

We have struggled with the package size for quite a while but have not found a good solution. When you say hosting it in a GitHub repo, do you mean that people would need to download it manually and save it locally?

sckott commented 4 years ago

> we are thinking taxize to import LCVP and use it as datasource to do the synonym resolution for plants (so basically replace what lcvplants is doing now).

That would work. Ideally, we could integrate LCVP into the high-level functions: https://docs.ropensci.org/taxize/articles/infotable.html

For the data: GitHub was just an example. I'd make sure the data is hosted somewhere reliable and always available, such as GitHub, Zenodo, or GitLab, and avoid university websites, personal websites, etc. If the data is hosted online, you could have a function that the user runs first to download the data and cache it on disk, so that subsequent uses don't require the download step.
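A minimal sketch of that download-once-then-cache pattern in base R, assuming the data are published as a single `.rds` file at a stable URL. The function name `cached_dataset()` and its arguments are hypothetical, not part of taxize, LCVP, or lcvplants. It relies on `tools::R_user_dir()` (available since R 4.0.0) for a per-user cache directory.

```r
# Hypothetical helper (not a taxize/LCVP function): download a dataset once,
# cache it on disk, and read the cached copy on later calls.
# Assumes the data are published as a single .rds file at `url`.
cached_dataset <- function(url,
                           cache_dir = tools::R_user_dir("taxize", which = "cache"),
                           force = FALSE) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))

  if (force || !file.exists(dest)) {
    # Download to a temporary file first so an interrupted download
    # never leaves a truncated file in the cache.
    tmp <- tempfile(fileext = ".rds")
    on.exit(unlink(tmp), add = TRUE)
    utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
    if (!file.copy(tmp, dest, overwrite = TRUE)) {
      stop("could not write to cache: ", dest)
    }
  }

  # Later calls skip the download and read straight from disk.
  readRDS(dest)
}
```

The first call downloads the file; every later call reads the cached copy. Passing `force = TRUE` would refresh the cache after a new data release.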