ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

TNRS reaching timeout #438

Closed bw4sz closed 9 years ago

bw4sz commented 9 years ago

Hey Scott,

I've been using taxize in one of my scripts for about a year now. The script gets periodically rerun as new data is fed into the database, but no code has changed. Today I got a flag that the tnrs() query is reaching a timeout. I reinstalled taxize from CRAN but still got the issue.

tax<-tnrs(query = Species[-1], source = "iPlant_TNRS",splitby=30)

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached

This may be a temporary error, but just alerting you that the example on the tnrs help page gives the same error:

mynames <- c("Panthera tigris", "Eutamias minimus", "Magnifera indica",
             "Humbert humbert", "Helianthus annuus", "Pinus contorta", "Poa annua",
             "Abies magnifica", "Rosa california", "Festuca arundinace",
             "Mimulus bicolor", "Sorbus occidentalis","Madia sativa", "Thymopsis thymodes",
             "Bartlettia scaposa")
tnrs(mynames, source = "NCBI")
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached

I'll try to play around with this more later.
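
For a script that is rerun on a schedule, one option is to catch the timeout instead of letting it stop the whole run. A minimal sketch, assuming a hypothetical helper `with_retry()` (not part of taxize) that retries a call a few times and returns NULL if every attempt fails:

```r
# Hypothetical helper, not part of taxize: call f() up to `tries` times,
# pausing `wait` seconds between attempts. Returns NULL if every attempt
# errors (so a call that legitimately returns NULL also counts as a failure).
with_retry <- function(f, tries = 3, wait = 1) {
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) {
      message("Attempt ", i, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(result)
    if (i < tries) Sys.sleep(wait)
  }
  NULL
}

# Usage with the call from this report (commented out: needs network access)
# tax <- with_retry(function() tnrs(query = Species[-1], source = "iPlant_TNRS", splitby = 30))
# if (is.null(tax)) message("Name resolution unavailable; skipping this run")
```

This only helps with temporary outages; it does nothing if the remote service stays down.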

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] taxize_0.6.0    chron_2.3-45    reshape_0.8.5  
[4] plotKML_0.4-8   dplyr_0.3.0.2   maptools_0.8-30
[7] sp_1.0-16       reshape2_1.4.1  ggplot2_1.0.0  

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3     ape_3.2             aqp_1.7-7          
 [4] assertthat_0.1      bitops_1.0-6        bold_0.2.0         
 [7] class_7.3-11        classInt_0.1-22     cluster_1.15.3     
[10] codetools_0.2-9     colorRamps_2.3      colorspace_1.2-4   
[13] curl_0.9            data.table_1.9.4    DBI_0.3.1          
[16] digest_0.6.6        dismo_1.0-5         e1071_1.6-4        
[19] FNN_1.1             foreach_1.4.2       foreign_0.8-61     
[22] Formula_1.2-0       grid_3.1.2          gstat_1.0-21       
[25] gtable_0.1.2        Hmisc_3.14-6        httr_1.0.0         
[28] intervals_0.15.0    iterators_1.0.7     jsonlite_0.9.14    
[31] lattice_0.20-29     latticeExtra_0.6-26 magrittr_1.5       
[34] MASS_7.3-35         mime_0.2            munsell_0.4.2      
[37] nlme_3.1-118        nnet_7.3-8          parallel_3.1.2     
[40] pixmap_0.4-11       plotrix_3.5-10      plyr_1.8.1         
[43] proto_0.3-10        R6_2.0.1            raster_2.3-12      
[46] RColorBrewer_1.1-2  Rcpp_0.11.3         RCurl_1.95-4.5     
[49] rgdal_0.9-1         rpart_4.1-8         RSAGA_0.93-6       
[52] scales_0.2.4        spacetime_1.1-2     stringr_0.6.2      
[55] survival_2.37-7     Taxonstand_1.7      tools_3.1.2        
[58] XML_3.98-1.1        xts_0.9-7           zoo_1.7-11  

sckott commented 9 years ago

hi @bw4sz - thanks for the report!

I noticed that too. The taxosaurus service (http://taxosaurus.org/) behind that function and tnrs_sources() is unreliable (their server is often completely down, and when it's up it's very slow), so I've decided to make those functions defunct, that is, no longer available in the package. See this commit: https://github.com/ropensci/taxize/commit/1cb8e6473af67adc7f4f71dfbadebf2f9519f99d?w=1

So you can of course still use taxosaurus yourself, but it seems unwise to support it in taxize if it's not reliable. If they fix the issues in the future, it would be easy to put the functions back in.

Sound reasonable?

bw4sz commented 9 years ago

Thanks @sckott,

Yes - sounds like time to move to a new source. Which function would you suggest I use to replace it? I really liked the fuzzy name matching (people never spell check). The species are fairly obscure tropical plants, and I collect local sightings from people in the field.

I'll switch over to gnr_resolve()?

At first glance, it looks like I'm still having some issues.

> tax<-gnr_resolve(names = Families,best_match_only = F)
> head(tax$results)
  submitted_name matched_name data_source_title score
1            ???                                  NaN
2    Acanthaceae  Acanthaceae Catalogue of Life  0.75
3    Acanthaceae  Acanthaceae              ITIS  0.75
4    Acanthaceae  Acanthaceae              NCBI  0.75
5    Acanthaceae  Acanthaceae             WoRMS  0.75
6    Acanthaceae  Acanthaceae          Freebase  0.75
> tax<-gnr_resolve(names = Families,best_match_only = T)
Error in data.frame(x[[1]], ldply(if (length(x[[2]]) == 0) { : 
  arguments imply differing number of rows: 1, 0

Perhaps I should prescreen for people entering special characters?

Families
 [1] ""                 "???"              "Acanthaceae"     
 [4] "Alstroemeriaceae" "Asteraceae"       "Begoniaceae"     
 [7] "brm"              "Bromeliaceae"     "Campanulaceae"   
[10] "Cannaceae"        "Capparidaceae"    "Cleomaceae"      
[13] "Clusiaceae"       "Costaceae"        "Cucurbitaceae"   
[16] "Ericaceae"        "Fabaceae"         "Gentianaceae"    
[19] "Gesneriaceae"     "Gunneraceae"      "Heliconiaceae"   
[22] "Hydranganceae"    "Lamiaceae"        "Malvaceae"       
[25] "Maranthaceae"     "Marcgraviaceae"   "Melastomataceae" 
[28] "Myrtaceae"        "Onagraceae"       "Orchidaceae"     
[31] "Rubiaceae"        "Solanaceae"       "Tropaeolaceae"   
[34] "Zingiberaceae"    "alstroemeriaceae" "begoniacea"      
[37] "Begoniacea"       "bromeliaceae"     "campanulaceae"   
[40] "caparaceae"       "clusiaceae"       "costaceae"       
[43] "ericaceae"        "fabaceae"         "gesneriaceae"    
[46] "heliconiaceae"    "lamiacea"         "Lamiacea"        
[49] "lamiaceae"        "malvaceae"        "ms4"             
[52] "onagraceae"       "rubiaceae"        "solanaceae"      
[55] "zingiberaceae"   
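
A prescreening step along those lines could look like the sketch below. It uses base R only; `prescreen_names()` is a hypothetical helper, not a taxize function, and the letters-only rule is an assumption about what counts as a plausible family name.

```r
# Hypothetical helper, not part of taxize: tidy free-text family names
# before sending them to a name resolver.
prescreen_names <- function(x) {
  # strip leading and trailing whitespace
  x <- gsub("^\\s+|\\s+$", "", x)
  # keep only entries made up entirely of letters (drops "", "???", "ms4")
  x <- x[grepl("^[A-Za-z]+$", x)]
  # capitalise the first letter and lower-case the rest
  x <- paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
  # collapse duplicates such as "begoniacea" / "Begoniacea"
  unique(x)
}

prescreen_names(c("", "???", "Acanthaceae", "ms4", "begoniacea", "Begoniacea"))

# Then, with network access:
# tax <- gnr_resolve(prescreen_names(Families))
```

Short codes made only of letters, such as "brm", still pass this filter and are left for the resolver to report as unmatched.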

Just a follow-up: I found this bug as well.

> gnr_datasources()
Error in curl::handle_setopt(handle, .list = req$options) : 
  Option httpheader (10023) not supported.

Thanks for all your work getting into the gritty details.

sckott commented 9 years ago

Hmm, I'm not getting that error with gnr_datasources(). Can you install the latest taxize from GitHub with devtools::install_github("ropensci/taxize") to see if that fixes it? There haven't been any recent changes to that function (https://github.com/ropensci/taxize/commits/master/R/gnr_datasources.R), so it must be the dependencies, though I can't replicate it with the versions of curl or httr you have.

bw4sz commented 9 years ago

Nope, sorry. Clean install and reboot, fresh R session.

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> devtools::install_github("ropensci/taxize")
Downloading github repo ropensci/taxize@master
Installing taxize
"C:/PROGRA~1/R/R-31~1.2/bin/x64/R" --vanilla CMD INSTALL  \
  "C:\Users\Ben\AppData\Local\Temp\RtmpErfOI9\devtools1a104cb95e38\ropensci-taxize-1cb8e64"  \
  --library="C:/Users/Ben/Documents/R/win-library/3.1" --install-tests 

* installing *source* package 'taxize' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (taxize)
> library(taxize)
> gnr_datasources()
Error in curl::handle_setopt(handle, .list = req$options) : 
  Option httpheader (10023) not supported.

bw4sz commented 9 years ago

Sorry, should have said that I reinstalled curl and httr as well. This is not critical; there were some name changes ("acceptedname" to "matched_name") in the gnr_resolve() output that I need to go through and fix.
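
Until downstream code is updated, one stopgap is to copy the new column back under the old name. A rough sketch, assuming the results are a plain data frame; `add_legacy_name_column()` is a hypothetical helper and the two column names are the ones quoted above:

```r
# Hypothetical compatibility shim, not part of taxize: expose the new
# "matched_name" column under the old "acceptedname" name as well.
add_legacy_name_column <- function(res) {
  if ("matched_name" %in% names(res) && !("acceptedname" %in% names(res))) {
    res$acceptedname <- res$matched_name
  }
  res
}

# Stand-in for a results data frame, using a row shown earlier in the thread
res <- data.frame(submitted_name = "Acanthaceae",
                  matched_name = "Acanthaceae",
                  stringsAsFactors = FALSE)
res <- add_legacy_name_column(res)
```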

sckott commented 9 years ago

Weird on gnr_datasources() - will see if I can find that problem.

Turns out I did make changes in gnr_resolve() to try to fix something, and broke other things. Fixing that now...

sckott commented 9 years ago

@bw4sz okay, reinstall from GitHub; hopefully gnr_resolve() is all working now

sckott commented 9 years ago

taxa not found are spit out into a 3rd list element
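
The unmatched names can also be listed without relying on the layout of the returned list, by comparing the submitted vector with the `submitted_name` column. A minimal sketch using a hand-built stand-in for `tax$results` (rows copied from the output earlier in the thread; `submitted` is an illustrative subset of `Families`):

```r
# Illustrative subset of the names that were submitted
submitted <- c("Acanthaceae", "brm", "ms4")

# Hand-built stand-in for tax$results, with the columns shown in the thread
results <- data.frame(
  submitted_name    = c("Acanthaceae", "Acanthaceae"),
  matched_name      = c("Acanthaceae", "Acanthaceae"),
  data_source_title = c("Catalogue of Life", "ITIS"),
  score             = c(0.75, 0.75),
  stringsAsFactors  = FALSE
)

# Names that were submitted but never appear in the results
not_found <- setdiff(submitted, results$submitted_name)
not_found
```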

bw4sz commented 9 years ago

Sorry, nope, maybe it's me. Happy to do some legwork if you give me an inkling of where to go. Reinstall R?


> install.packages(c("httr","curl"))
Installing packages into ‘C:/Users/Ben/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.1/httr_1.0.0.zip'
Content type 'application/zip' length 376160 bytes (367 Kb)
opened URL
downloaded 367 Kb

trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.1/curl_0.9.zip'
Content type 'application/zip' length 2240141 bytes (2.1 Mb)
opened URL
downloaded 2.1 Mb

package ‘httr’ successfully unpacked and MD5 sums checked
package ‘curl’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\Ben\AppData\Local\Temp\Rtmpaqjs2q\downloaded_packages
> devtools::install_github("ropensci/taxize")
Downloading github repo ropensci/taxize@master
Installing taxize
"C:/PROGRA~1/R/R-31~1.2/bin/x64/R" --vanilla CMD INSTALL  \
  "C:\Users\Ben\AppData\Local\Temp\Rtmpaqjs2q\devtools1e5432a410ab\ropensci-taxize-fee1aa9"  \
  --library="C:/Users/Ben/Documents/R/win-library/3.1" --install-tests 

* installing *source* package 'taxize' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (taxize)
> library(taxize)
> gnr_datasources()
Error in curl::handle_setopt(handle, .list = req$options) : 
  Option httpheader (10023) not supported.
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] taxize_0.6.0.9961

loaded via a namespace (and not attached):
 [1] ape_3.2          assertthat_0.1   bitops_1.0-6     bold_0.2.0       chron_2.3-45    
 [6] codetools_0.2-9  curl_0.9         data.table_1.9.4 devtools_1.6.1   evaluate_0.5.5  
[11] foreach_1.4.2    formatR_1.0      grid_3.1.2       httr_1.0.0       iterators_1.0.7 
[16] jsonlite_0.9.14  knitr_1.9        lattice_0.20-29  nlme_3.1-118     plyr_1.8.1      
[21] R6_2.0.1         Rcpp_0.11.3      RCurl_1.95-4.5   reshape_0.8.5    reshape2_1.4.1  
[26] stringr_0.6.2    Taxonstand_1.7   tools_3.1.2      XML_3.98-1.1   

I have access to a couple of servers, so I can check whether this is just my laptop.

sckott commented 9 years ago

I'll take a peek at this on my Windows VirtualBox

sckott commented 9 years ago

ah, @bw4sz looks like you're on R v3.1.2. Try updating to v3.2.1

Tested on my Windows machine and everything worked fine
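
A script can check the running R version up front so this kind of mismatch is flagged early. A minimal sketch; `min_r` is an illustrative name, and 3.2.1 is simply the version that worked here, not a documented requirement of taxize:

```r
# Version that worked in this thread (an assumption, not an official minimum)
min_r <- "3.2.1"

# getRversion() returns a version object that compares against a string
r_ok <- getRversion() >= min_r
if (!r_ok) {
  warning("R ", format(getRversion()), " is older than ", min_r,
          "; consider upgrading before debugging package errors")
}

# Versions of the HTTP dependencies involved in the error
# (commented out in case they are not installed)
# packageVersion("curl")
# packageVersion("httr")
```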

sckott commented 9 years ago

@bw4sz did you try upgrading your R version?

bw4sz commented 9 years ago

Just now. Looking good. Thanks!

sckott commented 9 years ago

:+1: