ropensci / taxa

taxonomic classes for R
https://docs.ropensci.org/taxa

lookup_tax_data: When looking up taxonomy of binomials, only look up genus #100

Open zachary-foster opened 7 years ago

zachary-foster commented 7 years ago

Often, many species in a data set share a genus. When looking up taxonomy from taxon names, using the full name is often inefficient, since the taxonomy of many species can be inferred from a single query of their genus. Also, a genus is sometimes present in a database while a species is not. However, this could cause problems when one genus name is used for multiple unrelated genera.

> species_names
  [1] "Achlya hypogyna"                  "Albugo liabachii"                 "Aphanomyces astaci"               "Aphanomyces cladogamus"          
  [5] "Aphanomyces cochliodies"          "Aphanomyces euteiches"            "Aphanomyces frigidophilus"        "Aphanomyces invadans"            
  [9] "Aphanomyces laevis"               "Aphanomyces stellatus"            "Bremia"                           "Hyaloperonospora parasitica"     
 [13] "Peronosclerospora maydis"         "Peronosclerospora philippinensis" "Peronosclerospora sacchari"       "Peronosclerospora sorghi"        
 [17] "Peronospora belbahrii"            "Peronospora tabacina"             "Peronosppora destructor"          "Phytophthora alni"               
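A minimal sketch of the genus-only lookup proposed here, in base R. `lookup_by_genus` and the `lookup` function passed to it are hypothetical; `lookup` stands in for a real database query (for example one made through taxize). The sketch assumes the genus is the first word of each name and that each genus name identifies a single genus, which is exactly the assumption that homonyms break.

```r
# Minimal sketch: query each distinct genus once, then reuse the result
# for every name in that genus. `lookup` is a hypothetical function that
# takes a genus name and returns its lineage as a single string.
lookup_by_genus <- function(names, lookup) {
  names <- trimws(names)
  # Genus = first whitespace-delimited word; single-word names such as
  # "Bremia" are treated as genus names already.
  genera <- sub("[[:space:]].*$", "", names)
  # One query per distinct genus (assumes genus names are not homonyms).
  lineages <- vapply(unique(genera), lookup, character(1))
  lineage <- unname(lineages[genera])
  # Append the species name for inputs that had more than one word.
  has_species <- names != genera
  lineage[has_species] <- paste(lineage[has_species], names[has_species], sep = ";")
  data.frame(input = names, genus = genera, lineage = lineage,
             stringsAsFactors = FALSE)
}

# Usage with a fake lookup that counts how many queries it receives.
query_count <- 0
fake_lookup <- function(genus) {
  query_count <<- query_count + 1
  paste("Oomycota", genus, sep = ";")
}

species_names <- c("Aphanomyces astaci", "Aphanomyces euteiches",
                   "Peronospora tabacina", "Bremia")
result <- lookup_by_genus(species_names, fake_lookup)
query_count  # 3 queries for 4 names
```

The saving grows with the number of species per genus, but a homonymous genus such as "Achlya" would still get a single lineage that may be the wrong one.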
sckott commented 7 years ago

> this might cause a problem when one genus name is used for multiple genera.

Does this mean the same genus name being used for genera in different higher taxonomic groups, e.g., in both plants and animals?

zachary-foster commented 7 years ago

Yeah, like "Achlya", which is the name of both a moth genus and an oomycete genus.

sckott commented 7 years ago

Would it be possible to let the user supply a higher taxonomic group?

zachary-foster commented 7 years ago

That would work well when the user is only looking at one kingdom, which I expect is usually the case. I think it would have to be handled by taxize, though, since taxize::classification can only return one result when there are multiple matches. If there were a variant of taxize::classification that returned the taxonomy of all matches, perhaps with an additional column of numbers indicating which match each rank belongs to, then this could be implemented in taxa.
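A minimal sketch of the higher-taxon filter discussed above, assuming the classifications of all matches for a homonym like "Achlya" are already in hand as a list of data frames with `name`, `rank`, and `id` columns (the shape taxize::classification typically returns). `filter_by_higher_taxon` is a hypothetical helper, not part of taxa or taxize, and the lineages and IDs are simplified placeholders rather than real database records.

```r
# Minimal sketch: keep only the candidate classifications whose lineage
# contains a user-supplied higher taxon. Each candidate is a data frame
# with name/rank/id columns; the rows and IDs below are simplified,
# made-up placeholders, not real database records.
filter_by_higher_taxon <- function(classifications, higher_taxon) {
  keep <- vapply(classifications,
                 function(cl) higher_taxon %in% cl$name,
                 logical(1))
  classifications[keep]
}

# Two homonymous "Achlya" matches: an oomycete genus and a moth genus.
candidates <- list(
  data.frame(name = c("Eukaryota", "Oomycota", "Achlya"),
             rank = c("superkingdom", "phylum", "genus"),
             id   = c("x1", "x2", "x3"),
             stringsAsFactors = FALSE),
  data.frame(name = c("Eukaryota", "Arthropoda", "Lepidoptera", "Achlya"),
             rank = c("superkingdom", "phylum", "order", "genus"),
             id   = c("x1", "x4", "x5", "x6"),
             stringsAsFactors = FALSE)
)

oomycete_only <- filter_by_higher_taxon(candidates, "Oomycota")
length(oomycete_only)  # 1
```

If the filter still leaves several matches, or none, the caller has to decide what to do; the sketch simply returns whatever survives, which is why it depends on getting all matches back in the first place.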

sckott commented 7 years ago

> If there was a variant on taxize::classification that returned the taxonomy of all matches, perhaps with an additional column of numbers to specify which match each rank belonged to

The taxize::get_* functions can already return all results when there is more than one match, without going through the interactive prompt, but taxize::classification has no equivalent.

Though perhaps this is close enough: use get_* to get any number of taxon IDs, pass them to classification, and optionally bind the classifications together.

Or does that not do what's needed here?

zachary-foster commented 7 years ago

I think that would work. An option such as all_matches = TRUE could be added to taxize::classification to return that output format. When it is TRUE, the extra column should probably always be present, even if only one match is returned, so that the format stays consistent.

sckott commented 7 years ago

Which thing would work?

zachary-foster commented 7 years ago

> there is the concept in taxize::get_* functions to get all results when > 1 result - and not go through the prompt - but taxize::classification has no equivalent.

Sorry for being vague. Looking at the code, it seems the taxize::get_* functions are called when a taxon name is supplied, and they handle prompting the user. An argument could be added to classification that accepts multiple results from taxize::get_* without a prompt, looks up the classification of each, and then rbinds them together with an extra column holding the matching taxon ID.
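A minimal sketch of that output shape in base R, assuming each classification is a data frame with `name`, `rank`, and `id` columns. `bind_all_matches` and `classify_id` are hypothetical names used only for illustration; `classify_id` stands in for whatever real lookup would produce one classification per taxon ID. Following the earlier suggestion, the extra `query_id` column is added even when there is only one match.

```r
# Minimal sketch of the "all matches" output: classify every matching
# taxon ID and stack the results, adding a `query_id` column that records
# which match each row came from. The column is always added, even for a
# single match, so the output format does not vary.
# `classify_id` is a hypothetical stand-in for a real classification lookup.
bind_all_matches <- function(ids, classify_id) {
  pieces <- lapply(ids, function(id) {
    cl <- classify_id(id)
    data.frame(query_id = rep(id, nrow(cl)), cl, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Usage with made-up IDs and simplified lineages for two "Achlya" matches.
fake_classify <- function(id) {
  switch(id,
    A1 = data.frame(name = c("Oomycota", "Achlya"),
                    rank = c("phylum", "genus"),
                    id = c("P1", "A1"), stringsAsFactors = FALSE),
    A2 = data.frame(name = c("Lepidoptera", "Achlya"),
                    rank = c("order", "genus"),
                    id = c("L1", "A2"), stringsAsFactors = FALSE))
}

all_matches <- bind_all_matches(c("A1", "A2"), fake_classify)
nrow(all_matches)  # 4
```

The extra column is also what makes this result differ from the usual one-data-frame-per-name classification output, which is the concern raised in reply.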

sckott commented 7 years ago

Thanks for the clarification.

Sounds good up to the point of rbinding them together: that would depart from the output format used in other cases, and I'd rather not have variable output formats.

I opened an issue in taxize to explore this: https://github.com/ropensci/taxize/issues/628

zachary-foster commented 7 years ago

> and I'd rather not have variable output formats

Yeah, variable output formats are not great.