Closed LunaSare closed 5 years ago
Thanks @LunaSare, will have a look in the morning.
I've asked about this
@LunaSare, here is the response from BOLD:
I have looked into this on our API and it appears to be acting as it should. To explain, the search parameters for this API are record based. These are specimen records with multiple markers sequenced, and thus the API returns all sequences for the records where at least one of the markers matches the search criteria. The Process ID is displayed in the first field in the FASTA header which indicates which sequences are related to each other as they are associated with the same specimen.
Does that make sense?
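Given that explanation, one possible workaround is to filter the returned records on the client side. The following is only a minimal sketch, not part of the bold package: it uses an invented toy data frame in place of a real `bold_seqspec()` result, the helper `filter_marker()` is hypothetical, and it assumes the result has `processid` and `markercode` columns.

```r
# Toy stand-in for a bold_seqspec() result. The rows are invented for
# illustration; a real result has many more columns.
res <- data.frame(
  processid   = c("SPEC001-17", "SPEC001-17", "SPEC002-17",
                  "SPEC003-17", "SPEC003-17"),
  markercode  = c("rbcL", "matK", "rbcL", "rbcL", "ITS2"),
  nucleotides = c("ATGTCA", "ATGGAG", "ATGTCC", "ATGTCT", "TCGAAA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows whose marker code is one of the
# requested markers, dropping rows with a missing marker code.
filter_marker <- function(x, marker) {
  x[!is.na(x$markercode) & x$markercode %in% marker, , drop = FALSE]
}

rbcl_only <- filter_marker(res, "rbcL")

# Sequences that share a Process ID come from the same specimen record,
# so splitting by processid shows which markers were returned together.
markers_by_specimen <- split(res$markercode, res$processid)
```

Filtering after the query keeps the API call unchanged and only discards the extra markers that came along with each matching specimen record.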
Session Info
```r
Session info ---------------------------------------------------------------
 setting  value
 version  R version 3.4.1 (2017-06-30)
 system   x86_64, darwin15.6.0
 ui       AQUA
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2017-10-16

Packages -------------------------------------------------------------------
 package       * version  date       source
 ade4            1.7-6    2017-03-23 CRAN (R 3.4.0)
 ape           * 4.1      2017-02-14 CRAN (R 3.4.0)
 base          * 3.4.1    2017-07-07 local
 BiocGenerics  * 0.22.1   2017-10-07 Bioconductor
 BiocInstaller * 1.26.1   2017-09-01 Bioconductor
 bold          * 0.5.0    2017-07-21 CRAN (R 3.4.1)
 colorspace      1.3-2    2016-12-14 CRAN (R 3.4.0)
 compiler        3.4.1    2017-07-07 local
 crayon          1.3.2    2016-06-28 CRAN (R 3.4.0)
 crul            0.3.8    2017-06-15 CRAN (R 3.4.0)
 curl            2.8.1    2017-07-21 CRAN (R 3.4.1)
 datasets      * 3.4.1    2017-07-07 local
 datelife      * 0.2.13   2017-10-13 Github (phylotastic/datelife@ae43b8f)
 devtools        1.13.3   2017-08-02 CRAN (R 3.4.1)
 digest          0.6.12   2017-01-27 CRAN (R 3.4.0)
 fastmatch       1.1-0    2017-01-28 CRAN (R 3.4.0)
 git2r           0.19.0   2017-07-19 CRAN (R 3.4.1)
 graphics      * 3.4.1    2017-07-07 local
 grDevices     * 3.4.1    2017-07-07 local
 grid            3.4.1    2017-07-07 local
 httr            1.3.1    2017-08-20 cran (@1.3.1)
 igraph          1.1.2    2017-07-21 CRAN (R 3.4.1)
 ips             0.0-7    2014-11-10 CRAN (R 3.4.0)
 IRanges       * 2.10.5   2017-10-08 Bioconductor
 jsonlite        1.5      2017-06-01 CRAN (R 3.4.0)
 lattice         0.20-35  2017-03-25 CRAN (R 3.4.1)
 magrittr        1.5      2014-11-22 CRAN (R 3.4.0)
 Matrix          1.2-10   2017-05-03 CRAN (R 3.4.1)
 memoise         1.1.0    2017-04-21 CRAN (R 3.4.0)
 methods       * 3.4.1    2017-07-07 local
 nlme            3.1-131  2017-02-06 CRAN (R 3.4.1)
 parallel      * 3.4.1    2017-07-07 local
 phangorn        2.2.0    2017-04-03 CRAN (R 3.4.0)
 pkgconfig       2.0.1    2017-03-21 CRAN (R 3.4.0)
 plyr            1.8.4    2016-06-08 CRAN (R 3.4.0)
 quadprog        1.5-5    2013-04-17 CRAN (R 3.4.0)
 R6              2.2.2    2017-06-17 CRAN (R 3.4.0)
 Rcpp            0.12.12  2017-07-15 CRAN (R 3.4.1)
 reshape         0.8.6    2016-10-21 CRAN (R 3.4.0)
 S4Vectors     * 0.14.7   2017-10-08 Bioconductor
 seqinr        * 3.4-5    2017-08-01 CRAN (R 3.4.1)
 stats         * 3.4.1    2017-07-07 local
 stats4        * 3.4.1    2017-07-07 local
 stringi         1.1.5    2017-04-07 CRAN (R 3.4.0)
 stringr         1.2.0    2017-02-18 CRAN (R 3.4.0)
 testthat      * 1.0.2    2016-04-23 CRAN (R 3.4.0)
 tools           3.4.1    2017-07-07 local
 triebeard       0.3.0    2016-08-04 CRAN (R 3.4.0)
 urltools        1.6.0    2016-10-17 CRAN (R 3.4.0)
 utils         * 3.4.1    2017-07-07 local
 withr           2.0.0    2017-07-28 CRAN (R 3.4.1)
 XML           * 3.98-1.9 2017-06-19 CRAN (R 3.4.1)
 xml2            1.1.1    2017-01-24 CRAN (R 3.4.0)
```

Hi! I've been using
`bold::bold_seqspec()` to search for plant and fungi markers. There appears to be an error with the `marker` argument, since it will output different types of markers for a single marker query:

```r
library(bold)
res <- bold_seqspec(taxon="Arabidopsis", marker="rbcL")
res$markercode
```

```
 [1] "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL" "rbcL"
[27] "rbcL" "rbcL" "matK" "rbcL" "matK" "rbcL" "rbcL" "rbcL" "matK" "rbcL" "rbcL" "matK" "rbcL" "matK" "matK" "rbcL"
```
And BLAST searches of these sequences show that they correspond to the gene specified in `$markercode`: the sequences at `which(res$markercode=="rbcL")` are rbcL in BLAST, and those at `which(res$markercode=="matK")` are matK in BLAST.

Running the same query for ITS2:

```r
res2 <- bold_seqspec(taxon="Arabidopsis", marker=c("ITS2"))
res2$markercode
```

we get a wide mixture of different markers:

```
 [1] "ITS2" "rbcLa" "rbcLa" "ITS2" "ITS2" "rbcLa" "rbcLa" "COI-5P" "ITS2" "rbcLa" "ITS2" "rbcLa" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2"
[21] "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "ITS2" "rbcLa" "matK" "ITS2" "rbcLa" "ITS2" "matK"
[41] "rbcLa" "ITS2" "matK" "ITS2" "rbcLa" "ITS2" "matK" "rbcLa" "rbcLa" "ITS2"
```
```r
res3 <- bold_seqspec(taxon="Arabidopsis", marker=c("matK")) # the same problem
res3$markercode
```

```
 [1] "rbcLa" "matK" "matK" "rbcLa" "rbcLa" "matK" "matK" "rbcLa" "matK" "rbcLa" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK" "matK"
[24] "matK" "matK" "matK" "matK" "matK" "matK" "rbcL" "matK" "matK" "matK" "rbcL" "rbcLa" "ITS2" "matK" "ITS2" "rbcLa" "matK" "matK" "rbcLa" "ITS2" "matK" "rbcLa" "ITS2"
[47] "matK" "rbcL" "rbcL" "matK" "matK" "rbcL" "rbcL" "matK" "rbcLa" "matK"
```
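A quick way to summarise how mixed such a result is would be to tabulate the marker codes with base R's `table()`. This is just a sketch: the vector below copies only the first six values of the matK output above rather than calling the API, and the variable names are made up for illustration.

```r
# First six marker codes from the matK query output, hard-coded so the
# example runs without a network call.
markercodes <- c("rbcLa", "matK", "matK", "rbcLa", "rbcLa", "matK")

# Count how many sequences were returned per marker code.
counts <- table(markercodes)

# Counts for every marker other than the one that was requested.
off_target <- counts[names(counts) != "matK"]

print(counts)
```

On a real result, the same call would be `table(res3$markercode)`.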