ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

bold_tax_id returns timeout error after hundreds to thousands of calls #68

Closed cbird808 closed 4 years ago

cbird808 commented 4 years ago
Session Info

```r
> devtools::session_info()
- Session info ---------------------------------------------------------------
 setting  value
 version  R version 3.6.0 (2019-04-26)
 os       Windows >= 8 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/Chicago
 date     2019-08-14

- Packages -------------------------------------------------------------------
 package     * version date       lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
 backports     1.1.4   2019-04-10 [1] CRAN (R 3.6.0)
 bold        * 0.9.0   2019-06-27 [1] CRAN (R 3.6.1)
 callr         3.2.0   2019-03-15 [1] CRAN (R 3.6.0)
 CHNOSZ      * 1.3.3   2019-08-02 [1] CRAN (R 3.6.1)
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
 crul          0.8.4   2019-08-02 [1] CRAN (R 3.6.1)
 curl          3.3     2019-01-10 [1] CRAN (R 3.6.0)
 data.table    1.12.2  2019-04-07 [1] CRAN (R 3.6.1)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
 devtools      2.0.2   2019-04-08 [1] CRAN (R 3.6.0)
 digest        0.6.19  2019-05-20 [1] CRAN (R 3.6.0)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
 httpcode      0.2.0   2016-11-14 [1] CRAN (R 3.6.0)
 jsonlite      1.6     2018-12-07 [1] CRAN (R 3.6.0)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
 pkgbuild      1.0.3   2019-03-20 [1] CRAN (R 3.6.0)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
 plyr          1.8.4   2016-06-08 [1] CRAN (R 3.6.0)
 pracma      * 2.2.5   2019-04-09 [1] CRAN (R 3.6.1)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
 processx      3.3.1   2019-05-08 [1] CRAN (R 3.6.0)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.0)
 Rcpp          1.0.1   2019-03-17 [1] CRAN (R 3.6.0)
 remotes       2.0.4   2019-04-10 [1] CRAN (R 3.6.0)
 reshape       0.8.8   2018-10-23 [1] CRAN (R 3.6.0)
 rlang         0.3.4   2019-04-07 [1] CRAN (R 3.6.0)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
 stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
 testthat      2.1.1   2019-04-23 [1] CRAN (R 3.6.0)
 triebeard     0.3.0   2016-08-04 [1] CRAN (R 3.6.1)
 urltools      1.7.3   2019-04-14 [1] CRAN (R 3.6.1)
 usethis       1.5.0   2019-04-07 [1] CRAN (R 3.6.0)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
 xml2          1.2.0   2018-01-24 [1] CRAN (R 3.6.0)
 yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)

[1] C:/Users/cbird/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.0/library
```

After using bold_tax_id to look up hundreds to thousands of IDs, the requests start timing out. I'm guessing that the BOLD server is locking me out, yes?


> res[1,]$taxid
[1] 99814
> boldTaxid = res[1,]$taxid
> bold_tax_id (boldTaxid, includeTree = TRUE)
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : 
  Timeout was reached: Connection timed out after 10000 milliseconds

sckott commented 4 years ago

Thanks, see related issue #52 regarding timeouts.

It's most likely BOLD's servers timing out, but I'll have a look.

sckott commented 4 years ago

bold_tax_id calls lapply on each id anyway, so iterating over the ids yourself has the same effect. Doing so lets you add error catching with tryCatch or similar, and handle errors either by repeating the request or by adding a Sys.sleep(...) when a timeout occurs, backing off the requests for a bit so that the BOLD servers hopefully stop erroring. For example:

library(bold)
x <- bold_tax_name(name = "Osm", fuzzy = TRUE)
tax_ids <- x$taxid[1:10]
res <- list()
for (i in seq_along(tax_ids)) {
  # catch the error instead of stopping, so one timeout doesn't end the loop
  tmp <- tryCatch(bold_tax_id(tax_ids[i]), error = function(e) e)
  res[[as.character(tax_ids[i])]] <- tmp
  # or do some kind of if statement, handling errors differently,
  # e.g., trying the request again
}
# any errors not handled within the for loop can be handled here;
# drop them before binding, since error objects can't be bound as rows
failed <- vapply(res, inherits, logical(1), what = "error")
# bind together with e.g., bind_rows
dplyr::bind_rows(res[!failed])
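
To make the "try the request again" idea more concrete, here is a minimal sketch of retrying with exponential backoff. `fetch_with_retry`, its arguments, and the `flaky` stub are hypothetical names made up for illustration, not part of bold, and the wait times are guesses rather than anything known about BOLD's limits.

```r
# Hypothetical helper (not part of bold): call `fetch(id)` and, if it errors,
# wait and try again, doubling the wait after every failed attempt.
fetch_with_retry <- function(id, fetch, max_tries = 4, base_wait = 1,
                             sleep = Sys.sleep) {
  out <- NULL
  for (attempt in seq_len(max_tries)) {
    out <- tryCatch(fetch(id), error = function(e) e)
    if (!inherits(out, "error")) {
      return(out)
    }
    if (attempt < max_tries) {
      sleep(base_wait * 2^(attempt - 1))  # 1, 2, 4, ... seconds by default
    }
  }
  out  # still an error object after max_tries attempts
}

# Stand-in for a request that times out twice and then succeeds,
# so the sketch runs without touching the BOLD servers.
n_calls <- 0
flaky <- function(id) {
  n_calls <<- n_calls + 1
  if (n_calls < 3) stop("Timeout was reached")
  data.frame(taxid = id)
}

demo <- fetch_with_retry(99814, flaky, base_wait = 0)
```

Against the real service this would presumably be used as `res <- lapply(tax_ids, fetch_with_retry, fetch = bold_tax_id)`, with `max_tries` and `base_wait` tuned to however long the server actually needs to recover.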
cbird808 commented 4 years ago

Thanks for responding!

We are using blast_tax_name() in our metabarcoding pipeline to fill in the missing taxonomic names for each OTU (phylum, class, order, family, genus, species) resulting from queries of the NCBI taxonomic database.

So, it sounds like I wouldn't have any better luck by passing a vector to blast_tax_name.

I'll add the tryCatch to prevent the pipeline from breaking, thanks for the suggestion. In terms of Sys.sleep, we get locked out for a fair amount of time (more than 15 minutes).

The ultimate solution is probably downloading the database.

sckott commented 4 years ago

That is a long time to be locked out. You could try a Sys.sleep between each request, which should make server errors less likely. E.g., in that for loop, put a Sys.sleep before or after each bold_tax_id request; either works.
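
A minimal sketch of that pacing, assuming a fixed pause between requests is enough (it may not be). `fetch_throttled` and `fake_fetch` are hypothetical names, not bold functions; the stub stands in for bold_tax_id so the example runs offline.

```r
# Hypothetical helper (not part of bold): request ids one at a time,
# pausing `delay` seconds between consecutive requests.
fetch_throttled <- function(ids, fetch, delay = 1, sleep = Sys.sleep) {
  res <- vector("list", length(ids))
  names(res) <- as.character(ids)
  for (i in seq_along(ids)) {
    if (i > 1) sleep(delay)  # no pause needed before the first request
    # keep errors as objects so one failure doesn't stop the loop;
    # `res[i] <- list(...)` also keeps the slot if the result is NULL
    res[i] <- list(tryCatch(fetch(ids[i]), error = function(e) e))
  }
  res
}

# Offline stand-in for bold_tax_id
fake_fetch <- function(id) data.frame(taxid = id)

paced <- fetch_throttled(c(99814, 99815, 99816), fake_fetch, delay = 0)
```

With the real function the call would presumably look like `fetch_throttled(tax_ids, bold_tax_id, delay = 1)`; the right delay is something to find by experiment.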

For downloading the database, are you referring to http://v4.boldsystems.org/index.php/datarelease ?