ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

taxize::downstream() result is length 0 error using WoRMS #847

Closed oharac closed 4 years ago

oharac commented 4 years ago

Hi,

I'm finding this package to be really useful, but I'm running into a bug. I am using taxize::downstream to access the WoRMS database to get all families related to a set of specific orders. For nearly everything, it works fine, but for decapoda (1130) and amphipoda (1135) it returns this error:

Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1) : 
  values must be length 1,
 but FUN(X[[53]]) result is length 0

EDIT: I see that this is similar to #821 and #824 - those were related to a problem with rank name - perhaps something similar happening here?

Reproducible example:

library(taxize)
x <- downstream(sci_id = 'decapoda', db = 'worms', downto = 'family', intermediate = FALSE)
x <- downstream(sci_id = 1130, db = 'worms', downto = 'family', intermediate = FALSE)

Session Info

```r
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] taxize_0.9.98.91

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5 ape_5.4-1 lattice_0.20-41 prettyunits_1.1.1 ps_1.3.3
 [6] zoo_1.8-8 assertthat_0.2.1 rprojroot_1.3-2 digest_0.6.25 foreach_1.5.0
[11] R6_2.4.1 plyr_1.8.6 backports_1.1.8 RSQLite_2.2.0 pillar_1.4.6
[16] rlang_0.4.7 curl_4.3 uuid_0.1-4 rstudioapi_0.11 data.table_1.13.0
[21] callr_3.4.3 blob_1.2.1 worrms_0.4.2 desc_1.2.0 urltools_1.7.3
[26] devtools_2.3.0 stringr_1.4.0 bit_4.0.4 triebeard_0.3.0 compiler_3.6.3
[31] xfun_0.14 pkgconfig_2.0.3 pkgbuild_1.0.8 conditionz_0.1.0 tidyselect_1.1.0
[36] tibble_3.0.3 httpcode_0.3.0 codetools_0.2-16 reshape_0.8.8 fansi_0.4.1
[41] crayon_1.3.4 dplyr_1.0.2 hoardr_0.5.2 dbplyr_1.4.4 withr_2.2.0
[46] rappdirs_0.3.1 crul_1.0.0 grid_3.6.3 nlme_3.1-148 jsonlite_1.7.1
[51] lifecycle_0.2.0 DBI_1.1.0 magrittr_1.5 taxizedb_0.2.2.93 cli_2.0.2
[56] stringi_1.5.3 fs_1.4.1 remotes_2.1.1 testthat_2.3.2 xml2_1.3.2
[61] ellipsis_0.3.1 generics_0.0.2 vctrs_0.3.4 iterators_1.0.12 tools_3.6.3
[66] bold_1.1.0 bit64_4.0.5 glue_1.4.2 purrr_0.3.4 processx_3.4.2
[71] pkgload_1.1.0 parallel_3.6.3 sessioninfo_1.1.1 memoise_1.1.0 knitr_1.28
[76] usethis_1.6.1
```

sckott commented 4 years ago

Thanks - taxonomy is a deep dark hole from which many weird taxonomic ranks emerge from time to time. one of the taxa had the rank "epifamily" https://www.marinespecies.org/aphia.php?p=taxdetails&id=1459303

fixed, if you reinstall it should work
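
For anyone landing here with the same message, a minimal sketch of how an unrecognised rank can produce it. This is not taxize's actual code: `known_ranks` and `rank_position()` are hypothetical stand-ins for the internal rank table and lookup, and the table is heavily abbreviated.

```r
# Hypothetical, abbreviated rank table (an assumption for illustration;
# the table taxize uses internally is much longer).
known_ranks <- c("order", "suborder", "superfamily", "family", "genus", "species")

# Hypothetical stand-in for a rank lookup: position of a rank in the table.
rank_position <- function(rank) which(known_ranks == tolower(rank))

rank_position("Family")     # 4
rank_position("Epifamily")  # integer(0), because nothing matches

ranks <- c("Superfamily", "Epifamily", "Family")

# vapply() requires exactly one value per element, so the empty match for
# "Epifamily" raises an error instead of returning a shorter vector.
res <- tryCatch(
  vapply(ranks, rank_position, integer(1)),
  error = function(e) conditionMessage(e)
)
res
```

Under these assumptions `res` holds the error text rather than a vector of positions, which is why a single odd rank anywhere in the downstream results is enough to stop the whole call.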

oharac commented 4 years ago

awesome - thanks so much! If I run into another odd rank in the next round of searching I will post it here.

sckott commented 4 years ago

thanks

oharac commented 4 years ago

I encountered this same error with the WoRMS database again, but seems to be for a different reason. taxize::downstream for family Polynoidae (id 939) returns:

downstream(939, db = 'worms', downto = 'species')[[1]]
# Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1) : 
#   values must be length 1,
#  but FUN(X[[376]]) result is length 0

Knowing that prior issues were due to oddball ranks, I checked the downstream ranks. Here the problem is an NA rank, caused by a null rank listed in the output from the AphiaChildrenByAphiaID API endpoint. The ones I've found so far are children of ID 129496, though I have not done an exhaustive search, so there may be others as well. Here is part of the record for one example, ID 333822, as retrieved from https://www.marinespecies.org/rest/AphiaChildrenByAphiaID/129496?marine_only=true&offset=95:

    "AphiaID": 333822,
    "url": "https://www.marinespecies.org/aphia.php?p=taxdetails&id=333822",
    "scientificname": "Lepidonotus pellucidus",
    "authority": "Dyster in Johnston, 1865",
    "status": "accepted",
    "unacceptreason": null,
    "taxonRankID": 220,
    "rank": null,
    "valid_AphiaID": 333822,
    "valid_name": "Lepidonotus pellucidus",

However, when accessing this species in the other direction, using the AphiaClassificationByAphiaID endpoint (https://www.marinespecies.org/rest/AphiaClassificationByAphiaID/333822), the API returns the rank as "Species" as expected. This seems to be an issue on the WoRMS end (and I emailed them to point it out), but in the meantime perhaps there's a graceful way to handle the NA rank value in taxize::downstream() without throwing an error. Thanks!

sckott commented 4 years ago

Thanks for the report. Unfortunately, there's no real way to handle missing ranks, other than perhaps making an additional HTTP request for every single name that does not have a rank, which seems like a mess and I'd rather avoid doing that. For now, I'm changing the code (reinstall to get the change) so that missing ranks from WoRMS become "no rank" (which NCBI has a lot of); the existing code already handles "no rank", and "no rank" taxa are dropped in most cases. The errors are coming from the prune_too_low function https://github.com/ropensci/taxize/blob/master/R/downstream-utils.R#L9 where we drop any taxa that have ranks lower than the target rank.
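
A rough sketch of the idea described in the comment above, for readers who want to see it in code. It is not the real prune_too_low: `rank_order`, `fill_missing_rank()` and `prune_below()` are hypothetical names, the rank table is abbreviated, and "Genus A" and "Subspecies B" are made-up rows. Only Lepidonotus pellucidus and its missing rank come from the report.

```r
# Hypothetical, abbreviated rank order from high to low (not taxize's table).
rank_order <- c("family", "subfamily", "genus", "species", "subspecies")

# Replace missing ranks with "no rank" so later code sees a known label.
fill_missing_rank <- function(rank) {
  rank <- tolower(rank)
  rank[is.na(rank)] <- "no rank"
  rank
}

# Keep rows at or above the target rank. Rows ranked below the target are
# dropped, and so are "no rank" rows (an assumption of this sketch).
prune_below <- function(df, target) {
  pos <- match(fill_missing_rank(df$rank), rank_order)  # NA for "no rank"
  df[!is.na(pos) & pos <= match(target, rank_order), , drop = FALSE]
}

# "Genus A" and "Subspecies B" are invented for illustration.
children <- data.frame(
  name = c("Genus A", "Lepidonotus pellucidus", "Subspecies B"),
  rank = c("Genus", NA, "Subspecies"),
  stringsAsFactors = FALSE
)

kept <- prune_below(children, "species")
kept$name
```

The trade-off is the one noted above: the call no longer errors, but a taxon whose rank is missing is silently dropped instead of being looked up with an extra request.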

oharac commented 4 years ago

I emailed the WoRMS folks and their response was that they couldn't replicate the null rank thing - so checking today, I can't replicate it either - I guess it was an intermittent problem (though I could replicate it on the day I posted the issue).

sckott commented 4 years ago

Thanks for the follow up. Well glad it was an intermittent thing; hopefully it doesn't come back.

oharac commented 4 years ago

A new instance of the zero-length error in WoRMS downstream:

downstream(345465, db = 'worms', downto = 'class', marine_only = FALSE)[[1]]

In case this is a similar problem to those noted before, where odd taxonomic ranks would create this error, I checked the children of this sequentially to identify any unusual ranks.

sckott commented 4 years ago

thanks! will have a look

sckott commented 4 years ago

@oharac should be fixed now. the missing rank was infraphylum

oharac commented 3 years ago

EDITED...

getting back into this project, ran across this error again

Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1) : 
  values must be length 1,
 but FUN(X[[53]]) result is length 0

Reprex:

library(taxize)
downstream(sci_id = 1821, db = 'worms', downto = 'class')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[3]]) result is length 0

Created on 2021-08-20 by the reprex package (v1.0.0)

Sequential calls to children showed where the code seemed to be choking. I wonder if these ranks need to be added to the rank_ref_zoo?

parvphylum, megaclass, gigaclass

More reprex:

library(taxize)
### chokes on 1821:
downstream(1821, db = 'worms', downto = 'class')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[3]]) result is length 0
children(sci_id = 1821, db = 'worms')
#> $`1821`
#> # A tibble: 4 x 3
#>   childtaxa_id childtaxa_name  childtaxa_rank
#>          <int> <chr>           <chr>         
#> 1         1824 Cephalochordata Subphylum     
#> 2       146420 Tunicata        Subphylum     
#> 3         1822 Urochordata     Subphylum     
#> 4       146419 Vertebrata      Subphylum     
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### chokes on subphylum Vertebrata:
downstream(146419, downto = 'class', db = 'worms')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[3]]) result is length 0
children(146419, db = 'worms')
#> $`146419`
#> # A tibble: 2 x 3
#>   childtaxa_id childtaxa_name childtaxa_rank
#>          <int> <chr>          <chr>         
#> 1         1829 Agnatha        Infraphylum   
#> 2         1828 Gnathostomata  Infraphylum   
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### chokes on infraphylum Gnathostomata:
downstream(1828, downto = 'class', db = 'worms')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[1]]) result is length 0
children(1828, db = 'worms')
#> $`1828`
#> # A tibble: 4 x 3
#>   childtaxa_id childtaxa_name childtaxa_rank
#>          <int> <chr>          <chr>         
#> 1      1517375 Chondrichthyes Parvphylum    
#> 2       152352 Osteichthyes   Parvphylum    
#> 3        11676 Pisces         Superclass    
#> 4         1831 Tetrapoda      Megaclass     
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### chokes on parvphylum Osteichthyes
downstream(152352, downto = 'class', db = 'worms')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[1]]) result is length 0
children(152352, db = 'worms')
#> $`152352`
#> # A tibble: 2 x 3
#>   childtaxa_id childtaxa_name childtaxa_rank
#>          <int> <chr>          <chr>         
#> 1        10194 Actinopterygii Gigaclass     
#> 2       163509 Sarcopterygii  Gigaclass     
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### finally is OK at this stage
downstream(10194, downto = 'class', db = 'worms')
#> $`10194`
#>       id        name  rank
#> 1 843664 Actinopteri class
#> 
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "worms"
downstream(163509, downto = 'class', db = 'worms')
#> $`163509`
#>       id        name  rank
#> 1 843665 Coelacanthi class
#> 
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "worms"
# both OK

Created on 2021-08-21 by the reprex package (v1.0.0)
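
The manual walk above could be automated along these lines. This is only a sketch: `find_unknown_ranks()` and `known_ranks` are hypothetical, the known-rank list is abbreviated, and the stub below replays the child tables from the reprex so that it runs offline.

```r
# Ranks assumed to be recognised already (hypothetical and abbreviated).
known_ranks <- c("phylum", "subphylum", "infraphylum", "superclass", "class")

# Walk down from `id`, collecting child ranks that are not in `known`.
# `get_children(id)` must return a data frame with the columns printed by
# children() above (childtaxa_id, childtaxa_name, childtaxa_rank), or NULL.
find_unknown_ranks <- function(id, get_children, known, stop_at = "class") {
  kids <- get_children(id)
  if (is.null(kids) || nrow(kids) == 0) return(character(0))
  ranks <- tolower(kids$childtaxa_rank)
  unknown <- setdiff(ranks, known)
  # Descend into every child that has not reached the target rank yet.
  for (child in kids$childtaxa_id[!(ranks %in% stop_at)]) {
    unknown <- union(unknown,
                     find_unknown_ranks(child, get_children, known, stop_at))
  }
  unknown
}

# Offline stub: the child tables copied from the reprex output above.
tables <- list(
  "146419" = data.frame(
    childtaxa_id   = c(1829L, 1828L),
    childtaxa_name = c("Agnatha", "Gnathostomata"),
    childtaxa_rank = c("Infraphylum", "Infraphylum"),
    stringsAsFactors = FALSE
  ),
  "1828" = data.frame(
    childtaxa_id   = c(1517375L, 152352L, 11676L, 1831L),
    childtaxa_name = c("Chondrichthyes", "Osteichthyes", "Pisces", "Tetrapoda"),
    childtaxa_rank = c("Parvphylum", "Parvphylum", "Superclass", "Megaclass"),
    stringsAsFactors = FALSE
  ),
  "152352" = data.frame(
    childtaxa_id   = c(10194L, 163509L),
    childtaxa_name = c("Actinopterygii", "Sarcopterygii"),
    childtaxa_rank = c("Gigaclass", "Gigaclass"),
    stringsAsFactors = FALSE
  )
)
stub_children <- function(id) tables[[as.character(id)]]

unknown <- find_unknown_ranks(146419, stub_children, known_ranks)
unknown
```

Against the live database, the stub could presumably be swapped for something like `function(id) taxize::children(id, db = "worms")[[1]]`, based on the list-of-tibbles output shown above. That makes one request per taxon visited, so it is best limited to small subtrees.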

zachary-foster commented 3 years ago

Thanks for the info! I will look into this and see about adding those ranks.

zachary-foster commented 3 years ago

Sorry for the delay. I have added the ranks and made the error message better.

You can try out the change by installing the development version, which will hopefully be pushed to CRAN soon. Note that this version has many other changes and might break other code.

install.packages("remotes")
remotes::install_github("ropensci/taxize")