ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

The bold_seqspec function is returning a weird data frame with only 1 variable and 7 occurrences #66

Closed tadeu95 closed 4 years ago

tadeu95 commented 5 years ago

I'm unable to run the function properly because it returns a data frame with only 7 observations and 1 variable. The thing is, I've run the function time and time again and it worked fine. The function I used was this:

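library(bold)     # bold_seqspec()
library(stringr)  # str_count()

list_species<-function(groups){
  # note: this assignment overrides whatever is passed in as `groups`
  groups=c("Actinopterygii","Sarcopterygii","Elasmobranchii","Holocephali","Cyclostomata")
  fish<-bold_seqspec(taxon=groups, format = "tsv", marker="COI-5P")
  # keep records that have a latitude or a country, and a species name
  fish2<-fish[(!(is.na(fish$lat)) | fish$country!="") & (fish$species_name!=""),]
  # count the characters matching [A-Z] in each sequence, then keep sequences with 500 or more
  fish2$number<-str_count(fish2$nucleotides, pattern="[A-Z]")
  fish3<-fish2[(fish2$number>499),]
  assign('fish3',fish3,envir=.GlobalEnv)
}
list_species(groups)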

My session info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252  LC_CTYPE=Portuguese_Portugal.1252    LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C                        
[5] LC_TIME=Portuguese_Portugal.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_3.2.0     dplyr_0.8.1       fingerprint_3.5.7 readr_1.3.1       stringr_1.4.0     worms_0.2.2       plyr_1.8.4        httr_1.4.0       
 [9] rentrez_1.2.2     data.table_1.12.2 bold_0.8.6        seqRFLP_1.0.1    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       pillar_1.4.1     compiler_3.5.1   tools_3.5.1      jsonlite_1.6     tibble_2.1.3     gtable_0.3.0     pkgconfig_2.0.2 
 [9] rlang_0.3.4      rstudioapi_0.10  crul_0.7.4       curl_3.3         yaml_2.2.0       withr_2.1.2      xml2_1.2.0       hms_0.4.2       
[17] triebeard_0.3.0  grid_3.5.1       tidyselect_0.2.5 reshape_0.8.8    glue_1.3.1       httpcode_0.2.0   R6_2.4.0         XML_3.98-1.20   
[25] purrr_0.3.2      magrittr_1.5     urltools_1.7.3   scales_1.0.0     assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3    lazyeval_0.2.2  
[33] munsell_0.5.0    crayon_1.3.4  

Thank you so much in advance for any response

sckott commented 5 years ago

Thanks for your question, @tadeu95!

First, what you're doing is not what you think you're doing.

There is no parameter fish in the function bold_seqspec(). So if we do

bold_seqspec(fish=groups, format = "tsv", marker="COI-5P", verbose = TRUE)

We can see that the actual request is

http://v4.boldsystems.org/index.php/API_Public/combined?marker=COI-5P&combined_download=tsv

Without any of your taxon names.
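To illustrate why a misnamed argument can vanish without an error, here is a toy sketch in plain R. `toy_query` is a hypothetical stand-in, not bold's actual code; it only assumes that the function builds its query from named parameters and accepts extra arguments through `...`.

```r
# Toy stand-in (hypothetical, not bold's implementation): only the named
# parameters `taxon` and `marker` are used to build the query string.
toy_query <- function(taxon = NULL, marker = NULL, ...) {
  params <- list(
    taxon  = if (!is.null(taxon)) paste(taxon, collapse = "|"),
    marker = marker
  )
  # drop parameters that were not supplied
  params <- Filter(Negate(is.null), params)
  paste(names(params), unlist(params), sep = "=", collapse = "&")
}

# `fish` is not a parameter, so it lands in `...` and never reaches the query
toy_query(fish = c("Actinopterygii", "Holocephali"), marker = "COI-5P")
# "marker=COI-5P"

# with the correct name, the taxa are included
toy_query(taxon = c("Actinopterygii", "Holocephali"), marker = "COI-5P")
# "taxon=Actinopterygii|Holocephali&marker=COI-5P"
```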

Second, if you do use the taxon = groups parameter, the query is too large and the BOLD server returns an error, but the server does not report it properly, so the function can't pass that information on to the user. I know it's not ideal, but if you can use bold_specimens and/or bold_seq, those seem to work fine on most of your taxa. For the biggest one, Actinopterygii, you can break it up into smaller taxonomic groups that will work. To get smaller groups you can try, e.g., downstream("Actinopterygii", db = "ncbi", downto = "class"), where downto is the rank you want to get.
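A minimal sketch of the "smaller groups" idea: query one taxon at a time and stack whatever comes back. `fetch_by_taxon` and its `fetch_one` argument are hypothetical names introduced here for illustration, and the sketch assumes every successful result is a data frame with the same columns.

```r
# Query taxa one at a time; keep only results that are data frames.
# `fetch_one` is any function that takes a single taxon name.
fetch_by_taxon <- function(taxa, fetch_one) {
  results <- lapply(taxa, function(taxon) {
    # treat an error for one taxon as "no result" instead of aborting the run
    tryCatch(fetch_one(taxon), error = function(e) NULL)
  })
  ok <- vapply(results, is.data.frame, logical(1))
  if (any(!ok)) {
    message("No usable result for: ", paste(taxa[!ok], collapse = ", "))
  }
  if (!any(ok)) return(data.frame())
  # assumes all data frames share the same columns
  do.call(rbind, results[ok])
}

# Possible use with bold (needs the package and a working BOLD server):
# fish <- fetch_by_taxon(
#   c("Sarcopterygii", "Elasmobranchii", "Holocephali", "Cyclostomata"),
#   function(x) bold::bold_seqspec(taxon = x, marker = "COI-5P")
# )
```

The names of the smaller groups could come from the downstream() call mentioned above, which appears to be the one in the taxize package.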

tadeu95 commented 5 years ago

So I corrected the script. It was a mistake; it really is "taxon=groups". I had automatically replaced the word fish and forgot it was there. I didn't have the "verbose" parameter before and it worked well. What does verbose do? The thing is, I've run this function many times with those taxonomic groups and it worked perfectly, until yesterday, when it started giving me these problems. I've also tried it with smaller groups and it's still not working. I discovered that the BOLD API on the BOLD website isn't working well; do you think that could be the reason? Thank you for the answer.

sckott commented 5 years ago

verbose is one of many curl options you can pass in to the HTTP request call. See ?curl::curl_options

> I've run this function many times with those taxonomic groups and it worked perfectly, until yesterday when it started giving me these problems.

That probably means the BOLD website is having problems, and at some point it will be fine again. Of course, this is out of our control, unfortunately.
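While the service is misbehaving, a quick shape check before any filtering makes the failure obvious instead of producing confusing downstream errors. This is only a sketch: `looks_like_bold_records` is a hypothetical helper, and the required columns are simply the ones used in the script from the original post.

```r
# TRUE only if `x` is a data frame containing every column the script relies on
looks_like_bold_records <- function(x,
                                    required = c("lat", "country",
                                                 "species_name", "nucleotides")) {
  is.data.frame(x) && all(required %in% names(x))
}

# A one-column, seven-row result like the one reported above fails the check
looks_like_bold_records(data.frame(V1 = 1:7))
# FALSE
```

Calling this right after the download, and stopping with a clear message when it returns FALSE, keeps the later filtering steps from running on a malformed result.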

tadeu95 commented 5 years ago

Ok, thank you for the answers, I'm sure it will be up again soon