Closed: jphill01 closed this issue 1 year ago.
Hi,
The reason is that BOLD holds both private and public records. Its API returns all records with the stats function, but you can check specifically for public COI-5P records with `bold_tax_id("12439", dataTypes = "stats")` (you can get the taxon ID with `bold_tax_name("Homo sapiens")`), and it will show that there are 48411 public COI-5P records.
> bold::bold_tax_name("Homo sapiens")
taxid taxon tax_rank tax_division parentid parentname taxonrep specimenrecords representitive_image.image
1 12439 Homo sapiens species Animalia 4523 Homo Homo sapiens 48704 BIOP/hebert+1354300515.jpg
representitive_image.apectratio input
1 1.502 Homo sapiens
> bold::bold_tax_id("12439", dataTypes = "stats")
input publicmarkersequences.COI.5P publicmarkersequences.atp6 publicmarkersequences.COI.3P publicmarkersequences.COII
1 12439 48411 2095 4 2095
publicmarkersequences.COI.PSEUDO publicmarkersequences.COXIII publicmarkersequences.CYTB publicmarkersequences.D.loop publicmarkersequences.ND1
1 1 2096 2096 2095 1
publicmarkersequences.ND2 publicmarkersequences.ND3 publicmarkersequences.ND4 publicmarkersequences.ND4L publicmarkersequences.ND5.0
1 1 1 1 1 1
publicmarkersequences.ND6 publicrecords publicspecies publicsubspecies publicbins specimenrecords sequencedspecimens barcodespecimens species
1 1 48417 1 1 1 48704 59069 47876 1
barcodespecies
1 1
I could add a note to the docs that the `bold_stats` function includes private records.
I think that would help dispel any confusion.
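The taxon-ID lookup followed by a stats query, as described above, can be wrapped in a small function. This is a minimal sketch assuming the `bold` package is installed and the BOLD API is reachable; `public_coi5p_count()` is a hypothetical helper, not part of `bold`. It only chains `bold_tax_name()` and `bold_tax_id(..., dataTypes = "stats")` and reads the `publicmarkersequences.COI.5P` column that appears in the output above.

```r
# Hypothetical helper (not part of bold): public COI-5P count for a taxon name.
# Assumes the bold package is installed and the BOLD API is reachable.
public_coi5p_count <- function(name) {
  # Resolve the taxon name to a BOLD taxon ID (12439 for Homo sapiens above)
  taxid <- bold::bold_tax_name(name)$taxid[1]
  # The "stats" data type reports public counts broken down per marker
  stats <- bold::bold_tax_id(as.character(taxid), dataTypes = "stats")
  stats[["publicmarkersequences.COI.5P"]]
}
```

Calling `public_coi5p_count("Homo sapiens")` should give the same figure as the `publicmarkersequences.COI.5P` column printed above, though live counts change as BOLD is updated.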
Revisiting this issue, I just realised my explanation was wrong. As the output in my previous reply shows, there are indeed 48417 public records; it's just that not all of them are COI-5P records (only 48411 are).
The same applies to the genus Homo:
> bold_tax_id2("4523", dataTypes = "stats")
input publicrecords publicspecies publicsubspecies publicbins specimenrecords sequencedspecimens
1 4523 48455 4 1 1 48743 59141
barcodespecimens species barcodespecies publicmarkersequences.COI.5P publicmarkersequences.atp6
1 47912 4 4 48449 2096
publicmarkersequences.COI.3P publicmarkersequences.COII publicmarkersequences.COI.PSEUDO
1 4 2099 1
publicmarkersequences.COXIII publicmarkersequences.CYTB publicmarkersequences.D.loop
1 2100 2100 2095
publicmarkersequences.ND1 publicmarkersequences.ND2 publicmarkersequences.ND3
1 4 4 4
publicmarkersequences.ND4 publicmarkersequences.ND4L publicmarkersequences.ND5.0
1 4 4 4
publicmarkersequences.ND6
1 4
That is 48455 public records, but only 48449 COI-5P records.
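To make the comparison explicit, the snippet below tabulates the counts from the two `stats` outputs above (copied by hand, so it runs offline without the `bold` package) and computes how many public records are not counted as COI-5P for each taxon.

```r
# Counts copied from the bold_tax_id(..., dataTypes = "stats") outputs above
counts <- data.frame(
  taxon  = c("Homo sapiens", "Homo"),
  taxid  = c(12439, 4523),
  public = c(48417, 48455),  # publicrecords
  coi5p  = c(48411, 48449)   # publicmarkersequences.COI.5P
)

# Public records that are not counted among the COI-5P marker sequences
counts$non_coi5p <- counts$public - counts$coi5p
print(counts)
```

In both cases the gap is 6 records, which is the difference between the total public record count and the COI-5P count.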
I want to download all Homo sapiens COI-5P data from BOLD.
`bold_stats("Homo sapiens")` indicates there are 48417 such records in BOLD. However, only 48411 are returned by `bold_seq("Homo sapiens", "COI-5P")`. Is this related to the note on large data requests in the function documentation?
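A minimal sketch of the download-and-count check, assuming the `bold` package is installed and BOLD is reachable. `count_coi5p_seqs()` is a hypothetical helper, not part of `bold`. It passes the marker by name (`marker = "COI-5P"`) so it does not rely on argument order, and it counts rows or list elements because the exact return shape of `bold_seq()` is treated here as an assumption.

```r
# Hypothetical helper (not part of bold): number of COI-5P sequences that
# bold_seq() returns for a taxon. Requires network access to BOLD.
count_coi5p_seqs <- function(taxon) {
  # Name the marker argument explicitly rather than relying on position
  seqs <- bold::bold_seq(taxon = taxon, marker = "COI-5P")
  # Count rows if a data frame comes back, otherwise count list elements
  if (is.data.frame(seqs)) nrow(seqs) else length(seqs)
}
```

The result of `count_coi5p_seqs("Homo sapiens")` is best compared against the `publicmarkersequences.COI.5P` count from `bold_tax_id(..., dataTypes = "stats")` rather than the total from `bold_stats()`.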
I also tried the genus Homo, which comprises 48449 records, but only 48443 are retrieved using `bold_seq()`.

Session Info
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bold_1.2.0

loaded via a namespace (and not attached):
[1] compiler_4.2.1 magrittr_2.0.3 plyr_1.8.7 R6_2.5.1
[5] tools_4.2.1 httpcode_0.3.0 curl_4.3.2 urltools_1.7.3
[9] Rcpp_1.0.9 triebeard_0.3.0 xml2_1.3.3 stringi_1.7.8
[13] reshape_0.8.9 crul_1.3 stringr_1.4.0 jsonlite_1.8.0