Jotanator closed this issue 2 weeks ago
That is what it does. Well, actually, it's the information of the second entry, on line 5 of the TSV returned by the BOLD API. The first entry has return characters in the 'copyright_licenses' field that mess up the format.
> records_bold_error <- bold_seqspec(taxon = "Molgula manhattensis", response = TRUE)
> tmp <- records_bold_error$content |> rawToChar() |> stringi::stri_split_lines1()
> stringi::stri_count_regex(tmp, "\t")
[1] 79 65 0 14 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79
[32] 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79
> tmp[1:4]
[1] "processid\tsampleid\trecordID\tcatalognum\tfieldnum\tinstitution_storing\tcollection_code\tbin_uri\tphylum_taxID\tphylum_name\tclass_taxID\tclass_name\torder_taxID\torder_name\tfamily_taxID\tfamily_name\tsubfamily_taxID\tsubfamily_name\tgenus_taxID\tgenus_name\tspecies_taxID\tspecies_name\tsubspecies_taxID\tsubspecies_name\tidentification_provided_by\tidentification_method\tidentification_reference\ttax_note\tvoucher_status\ttissue_type\tcollection_event_id\tcollectors\tcollectiondate_start\tcollectiondate_end\tcollectiontime\tcollection_note\tsite_code\tsampling_protocol\tlifestage\tsex\treproduction\thabitat\tassociated_specimens\tassociated_taxa\textrainfo\tnotes\tlat\tlon\tcoord_source\tcoord_accuracy\telev\tdepth\telev_accuracy\tdepth_accuracy\tcountry\tprovince_state\tregion\tsector\texactsite\timage_ids\timage_urls\tmedia_descriptors\tcaptions\tcopyright_holders\tcopyright_years\tcopyright_licenses\tcopyright_institutions\tphotographers\tsequenceID\tmarkercode\tgenbank_accession\tnucleotides\ttrace_ids\ttrace_names\ttrace_links\trun_dates\tsequencing_centers\tdirections\tseq_primers\tmarker_codes"
[2] "BNSB097-21\tBNSB0097\t14077558\t\tW121_CU\tDeutsches Zentrum fuer Marine Biodiversitaetsforschung\t\tBOLD:ACB4470\t18\tChordata\t61\tAscidiacea\t232\tStolidobranchia\t101156\tMolgulidae\t\t\t210801\tMolgula\t505893\tMolgula manhattensis\t\t\tWiebke Stamerjohanns\tMorphology, Barcoding\tDeKay, 1843\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t54.125\t8.855\t\t\t\t\t\t\tGermany\t\tBuesum\t\tTaoro boat\t7333897|7333898\thttp://www.boldsystems.org/pics/BNSB/W121-12.01.21+1610474376.jpg|http://www.boldsystems.org/pics/BNSB/W121-12.01.21-1+1610474270.jpg\tOverview|Overview\t|\tWiebke Stamerjohanns|Wiebke Stamerjohanns\t2022|2022\tCreativeCommons \x96 Attribution"
[3] "Non-Commercial Share-Alike|CreativeCommons \x96 Attribution"
[4] "Non-Commercial Share-Alike\tGerman Centre for Marine Biodiversity Research, Senckenberg am Meer|German Centre for Marine Biodiversity Research, Senckenberg am Meer\tWiebke Stamerjohanns|Wiebke Stamerjohanns\t15078645\tCOI-5P\t\tTACTTTATATTTTATTTTTGGTACATTCGCTGCATTAATTGGTTCCGCTTTGAGTGGAGTTTTGCGGTTAGAATTATCCCAAACAGGAGTTGTTATAATAAATAGCAATATGTATAATATAGTTATTACCTCTCATGCTTTAGTTATAATTTTTTTTTTTGTAATACCTATTACAATAAGGAGATTTGGGAATTGGCTAATTCCTCTTTTTATGAGATGTCCTGATATGGCTTTTCCTCGTATAAATAATTTTTCTTTTTGGTTACTTCCTTTTTCTTTTAGTTTATTATTACTTAGTGGTTTTATGAATATGAGAGTTGGGGCAGGGTGGACCATTTACCCTCCTCTATCTTCTATTTTGAGACATCCTAGAATTCAGATGGATTTTGCTATTTTTAGTCTACATTTGGCTAGAATTAGTAGTATTCTTTCTTCTATTAATTTTATAGTAACCATTTTAAATATATCTCCTAAAGGAATAAAAATTTTTCATTTATCTTTAATAATGTGAAGTATTTTTATTACAGCTGTTTTACTTTTATTATCATTACCAGTATTGGCTGGGGCCATTACTATGTTATTATTTGATCGTAATATTAATACTATGTTTTTTGATCCTGCAGGAGGGGGAGATCCAATCTTATTCCAACATCTCTTT\t\t\t\t\t\t\t\t"
I'll notify BOLD of this error. I know they are working on a new API, so I don't know if they'll fix it on this one.
I might be able to code a check to detect and fix those though.
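A rough sketch of what such a check could look like, not the fix that ended up in the package. It assumes every complete record carries at least as many tabs as the header line, and that a broken record can be glued back together with a single space. The helper name `repair_split_records` and the toy input are made up for illustration.

```r
# Hypothetical helper: merge physical lines until each record has at
# least as many tabs as the header line (the first element of `lines`).
repair_split_records <- function(lines) {
  n_tabs <- function(x) lengths(regmatches(x, gregexpr("\t", x, fixed = TRUE)))
  expected <- n_tabs(lines[1])
  out <- character(0)
  buffer <- NULL
  for (line in lines) {
    # Assumption: the stray line break is replaced by one space.
    buffer <- if (is.null(buffer)) line else paste(buffer, line)
    if (n_tabs(buffer) >= expected) {
      out <- c(out, buffer)
      buffer <- NULL
    }
  }
  # Keep a non-empty incomplete tail rather than silently dropping it.
  if (!is.null(buffer) && nzchar(buffer)) out <- c(out, buffer)
  out
}

# Toy input mimicking the problem: the second record is split in two.
raw_lines <- c(
  "id\tname\tlicense\tseq",
  "A1\tfoo\tCC Attribution",
  "Share-Alike\tACGT",
  "A2\tbar\tCC0\tTTGA"
)
fixed <- repair_split_records(raw_lines)
records <- read.delim(text = paste(fixed, collapse = "\n"),
                      quote = "", stringsAsFactors = FALSE)
```

On the real response, `lines` would be the `tmp` vector from the console output above. Counting tabs is only a heuristic: it cannot tell where inside a field the break happened, and it would misfire if the API ever returned a record with genuinely fewer fields.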
Do you have other species names that return this error?
I don't have any others yet but I will let you know if I find any. I have a huge list of species at hand and that was one of them.
Hi! I just ran into this same error with the names Jassa slatteryi and Molgula manhattensis, and came here to see if someone else had seen this issue.
Hi @paulapappalardo and @Jotanator
Could one of you (or both) install the '104-incorrect-data-frame-column-names-for-molgula-manhattensis' branch to test if the fix I tried works for you?
remotes::install_github("ropensci/bold@104-incorrect-data-frame-column-names-for-molgula-manhattensis")
If so, I'll push the change to master! BOLD didn't get back to me, so for now I'll have to work around their API issues.
Done, and it works! I tested it on the two species that tripped it for me, Jassa slatteryi and Molgula manhattensis (the one you wrote the fix for). Thank you for the quick reply and great job with the fix 🙂
Thanks for testing 🙂
While using the BOLD API (latest stable version) to search for different species and genera, we noticed that one of them was producing errors. At first it seemed like an issue with missing columns in the data frame returned by the BOLD API. However, upon closer inspection I noticed that it isn't an issue of missing columns or missing data; the problem lies in the naming of the columns of the data frame.
Normally, when requesting a species such as Gallus gallus with the bold_seqspec function, we get the following information:
records_bold <- bold_seqspec(taxon = "Gallus gallus")
However, when searching Molgula manhattensis we get the following:
records_bold_error <- bold_seqspec(taxon = "Molgula manhattensis")
Notice that all the columns are named incorrectly. For some reason, the column names seem to have been assigned the information of the first entry for Molgula manhattensis in BOLD.
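For anyone looping over a long species list, a quick guard can flag affected results before they reach downstream code. This is only a sketch: `has_bold_header` is a hypothetical helper, the required names are a handful taken from the normal BOLD header, and the two data frames are mock data rather than real API output.

```r
# Hypothetical helper: TRUE when the data frame carries the usual BOLD
# column names, FALSE when the header was replaced by record values.
has_bold_header <- function(df,
                            required = c("processid", "sampleid",
                                         "species_name", "nucleotides")) {
  all(required %in% names(df))
}

# Mock data standing in for a normal and an affected result.
good <- data.frame(processid = "X1", sampleid = "S1",
                   species_name = "Gallus gallus", nucleotides = "ACGT")
bad <- setNames(good, c("BNSB097-21", "BNSB0097",
                        "Molgula manhattensis", "TACTTT"))

has_bold_header(good)  # TRUE
has_bold_header(bad)   # FALSE
```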
Session Info
```r
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] zip_2.3.1 treemapify_2.5.5 plotly_4.10.2 bold_1.3.0 mpoly_1.1.1 ipc_0.1.4 promises_1.2.1
 [8] future_1.33.0 rlist_0.4.6.2 RSQLite_2.3.1 taxize_0.9.100 rentrez_1.2.3 shinyBS_0.61.1 modules_0.12.0
[15] shinyalert_3.0.0 shinydashboard_0.7.2 vembedr_0.1.5 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.3
[22] purrr_1.0.2 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.3 tidyverse_2.0.0 shinyWidgets_0.8.0
[29] shinycssloaders_1.0.0 shinyjs_2.1.0 shiny_1.7.5

loaded via a namespace (and not attached):
 [1] colorspace_2.1-0 ellipsis_0.3.2 httpcode_0.3.0 rstudioapi_0.15.0 listenv_0.9.0 urltools_1.7.3 ggfittext_0.10.1
 [8] DT_0.29 bit64_4.0.5 fansi_1.0.4 mathjaxr_1.6-0 xml2_1.3.5 codetools_0.2-19 partitions_1.10-7
[15] cachem_1.0.8 polynom_1.4-1 jsonlite_1.8.7 compiler_4.2.3 httr_1.4.7 backports_1.4.1 fastmap_1.1.1
[22] lazyeval_0.2.2 cli_3.6.1 later_1.3.1 htmltools_0.5.6 tools_4.2.3 gmp_0.7-2 gtable_0.3.4
[29] glue_1.6.2 Rcpp_1.0.11 jquerylib_0.1.4 vctrs_0.6.3 crul_1.4.0 ape_5.7-1 nlme_3.1-162
[36] conditionz_0.1.0 iterators_1.0.14 crosstalk_1.2.0 globals_0.16.2 rbibutils_2.2.15 timechange_0.2.0 mime_0.12
[43] lifecycle_1.0.3 XML_3.99-0.14 zoo_1.8-12 scales_1.2.1 hms_1.1.3 parallel_4.2.3 yaml_2.3.7
[50] curl_5.0.2 memoise_2.0.1 sass_0.4.7 triebeard_0.4.1 stringi_1.7.12 foreach_1.5.2 orthopolynom_1.0-6.1
[57] filelock_1.0.2 Rdpack_2.5 rlang_1.1.1 pkgconfig_2.0.3 lattice_0.20-45 fontawesome_0.5.2 htmlwidgets_1.6.2
[64] bit_4.0.5 tidyselect_1.2.0 parallelly_1.36.0 plyr_1.8.8 magrittr_2.0.3 R6_2.5.1 generics_0.1.3
[71] base64url_1.4 txtq_0.2.4 DBI_1.1.3 pillar_1.9.0 withr_2.5.0 crayon_1.5.2 uuid_1.1-1
[78] utf8_1.2.3 tzdb_0.4.0 grid_4.2.3 data.table_1.14.8 blob_1.2.4 digest_0.6.33 xtable_1.8-4
[85] httpuv_1.6.11 munsell_0.5.0 viridisLite_0.4.2 bslib_0.5.1
```