ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

taxize::classification() output list of possible IDs as an object #906

Open EmilyMarkowitz-NOAA opened 1 year ago

EmilyMarkowitz-NOAA commented 1 year ago

Feature request: I have a list of scientific names that I need to translate into ITIS (and other database) codes. Most of the time, taxize::classification() returns only one match, which is great for quickly and efficiently identifying the appropriate ITIS code for a scientific name. However, when taxize::classification() returns more than one option (as in the examples below), the user must choose from a list that is printed in the console. While this is a great feature, it could be improved by adding an argument such as object = TRUE/FALSE (or a better name) that returns the list of possible IDs as an object the user can work with outside the function. With that list in hand, the user could apply their own rules to identify the correct ID automatically.

Example 1: In this case, instead of manually selecting 1 (ITIS ID = 51938), the user would filter the returned list of possible IDs for 1) the valid entry with 2) a common name that my data recognizes, like "coral", thereby further automating how the ID is identified.

This code (taxize::classification(sci_id = "Anthozoa", db = "itis")) would return this output:

image
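To make the Example 1 rule concrete, here is a minimal sketch of the filter, assuming the candidate list were available as a data frame. The table is made up for illustration: only the ID 51938 comes from this issue, the other IDs are placeholders, and the column names (tsn, scientificName, commonNames, nameUsage) are an assumption about what such a list would contain. pick_by_common_name() is a hypothetical helper, not part of taxize.

```r
# Illustrative candidate table. Only tsn 51938 comes from the issue text;
# the other rows and IDs are placeholders, and the column names are assumed.
candidates <- data.frame(
  tsn            = c("51938", "000001", "000002"),
  scientificName = c("Anthozoa", "Anthozoa", "Anthozoa"),
  commonNames    = c("corals, sea anemones", "coral", NA),
  nameUsage      = c("valid", "invalid", "valid"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows that are valid/accepted AND whose
# common names match the given pattern (case-insensitive).
pick_by_common_name <- function(candidates, pattern) {
  is_valid <- candidates$nameUsage %in% c("valid", "accepted")
  # grepl() returns FALSE for NA common names, so those rows are dropped
  has_name <- grepl(pattern, candidates$commonNames, ignore.case = TRUE)
  candidates[is_valid & has_name, , drop = FALSE]
}

hit <- pick_by_common_name(candidates, "coral")
print(hit$tsn)
```

If the filter leaves exactly one row, its ID can be used directly; if it leaves zero or several rows, the name still needs a manual decision.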

Example 2: Here, instead of manually selecting 2 (ITIS ID = 159038), the user would filter the returned list of possible IDs for 1) the valid entry and 2) the entry whose name is only one word long (i.e., a genus). It is useful that species within the genus are listed in the output, but in my example, I specifically want the ID for just the genus.

This code (taxize::classification(sci_id = "Aplidium", db = "itis")) would return this output:

image
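The Example 2 rule (valid entry, single-word name) could be sketched the same way. Again, the table below is illustrative only: 159038 is the ID mentioned in this issue, the species rows carry placeholder IDs, and the column names are assumed. pick_genus_only() is a hypothetical helper.

```r
# Illustrative candidate table. Only tsn 159038 comes from the issue text;
# the species rows use placeholder IDs, and the column names are assumed.
aplidium <- data.frame(
  tsn            = c("000001", "159038", "000002"),
  scientificName = c("Aplidium californicum", "Aplidium", "Aplidium stellatum"),
  nameUsage      = c("valid", "valid", "valid"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep valid/accepted rows whose scientific name
# contains no whitespace, i.e. a single word such as a genus name.
pick_genus_only <- function(candidates) {
  is_valid <- candidates$nameUsage %in% c("valid", "accepted")
  one_word <- !grepl("\\s", trimws(candidates$scientificName))
  candidates[is_valid & one_word, , drop = FALSE]
}

genus <- pick_genus_only(aplidium)
print(genus$tsn)
```

A single-word test is only a heuristic: it also matches higher ranks such as families or classes, so an exact comparison against the queried name may be safer in practice.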

Thanks for taking this suggestion under advisement! I am hoping a possible solution is as simple as adding a function argument that skips the interactive selection and lets the user return the list as an object.

System information:
R.version.string: "R version 4.2.1 (2022-06-23 ucrt)"
taxize version: 0.9.100
OS: Windows

salix-d commented 1 year ago

The classification function uses taxize::get_tsn, which in turn uses ritis::terms. ritis::terms("Anthozoa") will return the output you want. For GBIF, it's taxize:::gbif_name_backbone("Aplidium"). For NCBI, the taxize::get_uid function does the API call directly, so the option would need to be added there. I didn't check the other databases.
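As a rough sketch of how this workaround could be scripted for a whole vector of names: the helper below takes any lookup function that returns a candidate table and resolves each name to a single ID, or NA when the choice is still ambiguous. resolve_tsn() and stub_lookup() are hypothetical names introduced here. The stub stands in for a real call such as ritis::terms() so the example runs offline, and the column names (tsn, scientificName, nameUsage) are an assumption about that function's output that should be checked against the real result.

```r
# Hypothetical helper: resolve one name to a single TSN, or NA if the
# candidates do not narrow down to exactly one valid, exact-name match.
# `lookup` is any function(name) returning a candidate data frame.
resolve_tsn <- function(name, lookup) {
  candidates <- lookup(name)
  keep <- candidates$nameUsage %in% c("valid", "accepted") &
    tolower(candidates$scientificName) == tolower(name)
  hits <- unique(candidates$tsn[keep])
  if (length(hits) == 1) hits else NA_character_
}

# Offline stub standing in for a real lookup (assumption: in real use this
# could be ritis::terms). Only 51938 and 159038 come from the issue text;
# "000001" is a placeholder ID.
stub_lookup <- function(name) {
  all_rows <- data.frame(
    tsn            = c("51938", "159038", "000001"),
    scientificName = c("Anthozoa", "Aplidium", "Aplidium californicum"),
    nameUsage      = c("valid", "valid", "valid"),
    stringsAsFactors = FALSE
  )
  all_rows[grepl(name, all_rows$scientificName, fixed = TRUE), , drop = FALSE]
}

sci_names <- c("Anthozoa", "Aplidium", "Notataxon")
ids <- vapply(sci_names, resolve_tsn, character(1), lookup = stub_lookup)
print(ids)

# With network access, the resolved IDs could then be passed on, e.g.
# taxize::classification(ids[!is.na(ids)], db = "itis")
```

Names that come back as NA are the ones that still need a manual choice, which keeps the interactive prompt for the hard cases only.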

zachary-foster commented 1 year ago

Thanks for the idea, @EmilyMarkowitz-NOAA, and thanks for pointing that out, @salix-d. Yes, some other taxize functions and functions from other packages output all the information without asking for user input. I would like to make a generic function that does this as well, but I haven't gotten around to it. I agree that asking the user for input is not ideal in many contexts.