ropensci / taxadb

:package: Taxonomic Database
https://docs.ropensci.org/taxadb

filter_name for higher ranks doesn't work with ITIS #114

Closed atn38 closed 1 year ago

atn38 commented 1 year ago

Hi Carl,

For example, this works

filter_name("Quercus alba", provider = "itis")

and this works, albeit slowly

filter_name("Quercus", provider = "ncbi")

but this doesn't

filter_name("Quercus", provider = "itis")

The command runs, but the result data.frame does not have kingdom to species names:

[image: screenshot of the resulting data.frame]

I've tried itis, ncbi, col, and gbif. The other three work for higher ranks as far as I can tell; I've tried a couple of genera and some other higher ranks. I didn't expect ITIS not to work, especially since it's the default provider. If this is an ITIS issue and not taxadb's, perhaps you could switch the default, or document that higher ranks do not work with the ITIS default. Btw, this has some weird consequences down the line for EML::set_taxonomicCoverage, which is where I first ran into the issue and started digging. Thanks!

cboettig commented 1 year ago

Thanks.

Meanwhile, I generally find it easiest to work with the tables directly rather than through the helper functions. This should be faster and should also make it clearer what is going on, e.g.

taxa_tbl("itis") |> filter(genus == "Quercus")

I'm not at my machine right now, but I think the ITIS snapshot may only have taxon IDs at the species level. (I don't think all providers actually define taxon IDs for all higher taxa, but we should be including them when they do, so this is a bug.) I'm on the road but will try to investigate soon.

atn38 commented 1 year ago

Thanks Carl for the suggestion. I came across this from noticing issues with EML::set_taxonomicCoverage which uses taxadb::filter_name under the hood. I'm actually not too familiar with taxadb itself. The query gives me:


> taxa_tbl("itis") |> filter(genus == "Quercus")
Error in storage.mode(x) <- "double" :
  'list' object cannot be coerced to type 'double'

cboettig commented 1 year ago

@atn38 sorry, somehow I missed your follow-up. I should have mentioned that you'd need to call library(dplyr) first, or use dplyr::filter() explicitly.
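
For reference, a minimal sketch of the direct table query with dplyr in place. The `storage.mode(x) <- "double"` error reported earlier in the thread is consistent with a bare `filter()` resolving to base R's `stats::filter()` when dplyr is not attached. The object name `quercus` and the final `collect()` call are illustrative assumptions, not something prescribed in this thread; the sketch also assumes the ITIS snapshot can be downloaded on first use.

```r
library(taxadb)
library(dplyr)  # attaches dplyr::filter(), which masks stats::filter()

# Query the ITIS table directly for all records in the genus Quercus.
# `quercus` is a hypothetical name chosen for this example.
quercus <- taxa_tbl("itis") |>
  dplyr::filter(genus == "Quercus") |>  # explicit namespace rules out stats::filter()
  collect()                             # bring the query result into memory as a data frame

head(quercus)
```

Writing `dplyr::filter()` with the namespace prefix works whether or not `library(dplyr)` has been called, so it is the safer form to paste into a script.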

cboettig commented 1 year ago

btw, this should also now be working in ITIS.

taxadb::get_ids("Quercus", "itis")
#> Joining with `by = join_by(scientificName)`
#> [1] "ITIS:19276"

Created on 2023-03-08 with reprex v2.0.2