ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

BOLD servers returned an error #72

Closed: Anto007 closed this issue 4 years ago

Anto007 commented 4 years ago

I get the error below when I try to download the full COI database (I have tried multiple times). The R script I used is attached. Any help would be appreciated.

```
Error in bold_seqspec(taxon = "Acoelomorpha", marker = "COI-5P", format = "tsv", :
  BOLD servers returned an error - we're not sure what happened
 try a smaller query - or open an issue and we'll try to help
Execution halted
```

BOLD_database_download.txt

sckott commented 4 years ago

Thanks for the issue @Anto007

The BOLD service sometimes fails without giving a good reason. Queries that return a large number of results often lead to errors. If you can, I would split your query into a set of smaller queries: that is, split your taxonomic query into many queries, one for each child of that taxon.

There are no taxonomic children listed on the BOLD taxon page for Acoelomorpha (http://boldsystems.org/index.php/Taxbrowser_Taxonpage?taxid=296140#). You should explore BOLD's taxa and see if there's another name, or set of names, you could use. Once you have those names, you can do something like the following to get downstream taxa, which you can then use to query BOLD for data:

```r
library(taxize)
# Look up the BOLD ID of the parent taxon
id <- get_boldid("Gadus")
# List its descendants down to species rank
bold_downstream(id, downto = "species")
#>                  name     id    rank
#> 1 Gadus chalcogrammus 360473 species
#> 2 Gadus macrocephalus  19837 species
#> 3        Gadus morhua  26136 species
#> 4          Gadus ogac 747382 species
#> 5           Gadus sp. 674263 species
#> 6  Gadus sp. OPC-2017 794750 species
```

Anto007 commented 4 years ago

Many thanks, Scott, for your prompt response. Finding the children of each taxon and then querying BOLD separately for each sounds to me like a cumbersome task. It seems I might need to sit down and do this if there's really no other easy way to download the COI dataset in its entirety (sigh!). I've got in touch with the BOLD support desk and I'm really hoping there's a convenient solution out there. Thanks again for your support, though.

sckott commented 4 years ago

> sounds to me like a cumbersome task.

It's out of my control; the problem is on BOLD's side. Unfortunately, there's nothing I can do in this package to make it easier.

Anto007 commented 4 years ago

Many thanks; I completely understand! One last thing: do you think child taxa really need to be queried separately? I can understand doing this for Arthropoda, since it contains a very large number of records. If I set up individual R scripts for the higher taxonomic levels in the list below, do you think (from your own experience) that this will result in incomplete or erroneous records being retrieved? Thanks again!

- Hemichordata
- Mollusca
- Nematoda
- Nemertea
- Onychophora
- Platyhelminthes
- Porifera
- Priapulida
- Rotifera
- Sipuncula
- Tardigrada

Anto007 commented 4 years ago

Also, do you know whether querying those higher taxonomic levels through the BOLD web interface would result in incomplete or erroneous records? My feeling is that the web interface may be a relatively safer option here, but I might be wrong.

Anto007 commented 4 years ago

In case someone else runs into a similar issue, the wget solution below worked for me. Closing this ticket. Thanks.

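```shell
# Download COI sequences (FASTA) and specimen data (TSV) from the BOLD public API,
# one query per taxon listed (one name per line) in listoftaxa.txt.
# Reading line by line and quoting "${taxon}" keeps names containing spaces intact.
while IFS= read -r taxon; do
  echo "Searching for $taxon..."
  wget "http://www.boldsystems.org/index.php/API_Public/sequence?marker=COI-3P|COI-5P&taxon=${taxon}" -O "${taxon}.fasta"
  wget "http://www.boldsystems.org/index.php/API_Public/specimen?marker=COI-3P|COI-5P&taxon=${taxon}&format=tsv" -O "${taxon}.txt"
done < listoftaxa.txt
```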
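Following the earlier advice to split large queries, a more defensive variant of this loop might retry transient failures and record the taxa whose downloads fail or come back empty, so those taxa can later be split into their children and re-queried. This is a minimal sketch, not part of the bold package: the function names `fetch_taxon` and `fetch_all`, the retry settings, the `failed_taxa.txt` file, and the assumption that an empty output file signals a failed query are all hypothetical.

```shell
# Hypothetical helper: download COI sequences (FASTA) and specimen data (TSV)
# for one taxon from the BOLD public API, asking wget to retry a few times.
# Returns non-zero if wget reports an error or either output file is empty
# (assumption: an empty file means the query did not return usable data).
fetch_taxon() {
  local taxon="$1"
  local base="http://www.boldsystems.org/index.php/API_Public"
  wget -q --tries=3 --waitretry=10 \
    "${base}/sequence?marker=COI-3P|COI-5P&taxon=${taxon}" -O "${taxon}.fasta" || return 1
  wget -q --tries=3 --waitretry=10 \
    "${base}/specimen?marker=COI-3P|COI-5P&taxon=${taxon}&format=tsv" -O "${taxon}.txt" || return 1
  # -s is true only for files that exist and are non-empty.
  [ -s "${taxon}.fasta" ] && [ -s "${taxon}.txt" ]
}

# Hypothetical driver: process a list of taxa (one per line) and write the
# names of any that failed to a separate file for later splitting/retrying.
fetch_all() {
  local list="$1"
  local failed="${2:-failed_taxa.txt}"
  local taxon
  : > "$failed"   # start with an empty failure log
  while IFS= read -r taxon || [ -n "$taxon" ]; do
    [ -z "$taxon" ] && continue   # skip blank lines
    echo "Searching for $taxon..."
    if ! fetch_taxon "$taxon"; then
      echo "$taxon" >> "$failed"
    fi
  done < "$list"
}
```

For example, `fetch_all listoftaxa.txt` would process every taxon in the file; any names left in `failed_taxa.txt` could then be expanded into child taxa (for instance with `taxize::bold_downstream()`) and run through the loop again.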

sckott commented 4 years ago

Sorry for the delay in answering.

> Do you think children of taxa are necessary and required to be separately queried?

It probably depends on how many taxa, and how much data, are in the taxonomic group.
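To get a rough sense of how much data each taxon returned, and so which groups might be worth splitting into child taxa, one could count the records in each downloaded specimen TSV. This is a minimal sketch assuming each `*.txt` file from the wget loop above has a single header row and one record per line; the helpers `record_count` and `large_taxa` and the threshold argument are hypothetical.

```shell
# Hypothetical helper: print the number of data rows in a specimen TSV,
# assuming one header line and one record per line. awk's NR also counts a
# final line that lacks a trailing newline; empty or header-only files give 0.
record_count() {
  awk 'END { print (NR > 1 ? NR - 1 : 0) }' "$1"
}

# Hypothetical helper: print "<taxon> <count>" for each file whose record
# count exceeds the threshold; these are candidates for splitting.
large_taxa() {
  local threshold="$1"
  shift
  local f n
  for f in "$@"; do
    n=$(record_count "$f")
    if [ "$n" -gt "$threshold" ]; then
      echo "${f%.txt} $n"
    fi
  done
}
```

Run from the download directory, `large_taxa 50000 *.txt` would list each taxon with more than 50,000 records alongside its count (the threshold here is only an illustrative value).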

You can of course use the web interface, but code-based solutions are preferable because they make your work more reproducible.

Glad that wget solution worked for you!