cboettig closed this issue 5 years ago
That file has mostly specific epithets within Geospiza, plus two genera outside of Geospiza (AFAIK): Pinaroloxias and Platyspiza.
Ideally, before searching, we'd have the fullest name possible given the data, e.g., Geospiza magnirostris instead of just magnirostris. As far as I can see, though, the name Geospiza doesn't appear anywhere in that file other than in the file name.
Ranks would be nice to have to make searches faster, but then we'd need the user to specify them in their nex file.
We can search for higher taxonomic names, but epithets on their own usually don't work out too well. Not sure what to do in that case.
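One possible workaround, sketched here under the assumption that the user can supply the genus themselves (since it isn't in the file): prefix the bare epithets with the genus before searching. `expand_epithets()` is a hypothetical helper, not part of taxize or RNeXML, and the lowercase-first-letter test for "this is an epithet" is only a heuristic.

```r
# Hypothetical helper: prefix bare epithets with a user-supplied genus.
# Labels starting with a lowercase letter are treated as epithets;
# anything else (e.g. a genus name) is left untouched.
expand_epithets <- function(labels, genus) {
  is_epithet <- grepl("^[a-z]", labels)
  ifelse(is_epithet, paste(genus, labels), labels)
}

# Illustrative labels only, mixing epithets and genus names
labels <- c("magnirostris", "Pinaroloxias", "Platyspiza")
expand_epithets(labels, genus = "Geospiza")
```

The expanded names could then be passed on to the usual name lookup; labels that are already genus-level go through unchanged.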
Thanks! Right, it looks like the data just isn't precise enough in this case, then.
More generally, do higher taxonomic names work? In a similar vein, I'm wondering if we should modify the function to return the two additional meta blocks that @rvosa's tool does here, one specifying whether the name is a species or some other rank, and one specifying what the species is an rdfs:subClassOf.
@rvosa the value of knowing taxonRank is pretty intuitive, but what's a use case where you would also want the subClassOf? I suppose a user could always determine both of these pieces of information from the taxon identifier directly, though I could see that it would often be more convenient to avoid having to make another query.
Higher taxonomic names should work, yes.
Searching for a higher taxonomic name should return rank as well. Getting parent might be another request though.
library(taxize)
(res <- get_uid("Platyspiza"))
#> [1] "48887"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/48887"
classification(res)
#> $`48887`
#> name rank id
#> 1 cellular organisms no rank 131567
...
#> 28 Passeriformes order 9126
#> 29 Passeroidea superfamily 175121
#> 30 Fringillidae family 9133
#> 31 Emberizinae subfamily 62155
#> 32 Platyspiza genus 48887
#>
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"
# for the family Fringillidae, assuming that's what you want
as.uid(9133)
#> [1] "9133"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"uri")
#> [1] "http://www.ncbi.nlm.nih.gov/taxonomy/9133"
So we can get the info that way, though presumably we'd want some smarter version of it.
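As a rough sketch of what a "smarter version" might look like: the taxon's own rank is the last row of its classification, and its parent is the row before it. `rank_and_parent()` is a hypothetical helper, not a taxize function; it assumes a data frame with name, rank and id columns ordered root to tip, as in the output above. A small hand-built data frame stands in for `classification(res)[[1]]` so the example runs without a network call.

```r
# Hypothetical helper, not part of taxize: given one lineage (a data frame
# with name, rank and id columns, ordered root to tip), return the taxon's
# own rank plus the name and id of its immediate parent.
rank_and_parent <- function(lineage) {
  n <- nrow(lineage)
  taxon <- lineage[n, ]
  has_parent <- n > 1
  list(
    name        = taxon$name,
    rank        = taxon$rank,
    parent_name = if (has_parent) lineage$name[n - 1] else NA_character_,
    parent_id   = if (has_parent) lineage$id[n - 1] else NA_character_
  )
}

# Stand-in for classification(res)[[1]], copied from the last rows shown above
lineage <- data.frame(
  name = c("Fringillidae", "Emberizinae", "Platyspiza"),
  rank = c("family", "subfamily", "genus"),
  id   = c("9133", "62155", "48887"),
  stringsAsFactors = FALSE
)
rank_and_parent(lineage)
```

The rank would correspond to the taxonRank block and the parent id to the subClassOf target, so both could come from the single classification lookup.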
I realized just now that get_uid() and related functions could return rank (if available), meaning one more piece of data available (and one less API call for users who want rank info).
Hey @sckott, taxize_nexml() does a nice job of getting metadata when the labels are good species names. Would it be possible to extend this to handle names that are higher-order taxonomy? E.g., this nexml file gives otu labels as families, I think: https://github.com/ropensci/RNeXML/blob/master/inst/examples/geospiza.xml e.g.