Incorrect Data Frame Column Names for Molgula manhattensis #104

Jotanator commented 3 months ago

While using the BOLD API (latest stable version) to search for several species, we ran into errors for one of them. At first it looked like columns were missing from the data frame returned by the BOLD API. On closer inspection, however, no columns or data are missing; the problem lies in how the columns of the data frame are named.

Normally, when requesting a species such as Gallus gallus with the bold_seqspec function, we get the following information:

records_bold <- bold_seqspec(taxon = "Gallus gallus")

However, when searching Molgula manhattensis we get the following:

records_bold_error <- bold_seqspec(taxon = "Molgula manhattensis")

Notice that all the columns are named incorrectly. For some reason, the column names seem to have been assigned the values of the first Molgula manhattensis entry in BOLD.
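A quick way to tell this failure apart from genuinely missing data is to check whether the usual header fields are present. The sketch below assumes that a healthy response includes the columns processid, sampleid and recordID; has_bold_header is a hypothetical helper (not a bold function) and both data frames are made up for illustration.

```r
# Hypothetical helper, not part of the bold package.
# Returns TRUE when the expected header fields are among the column names.
has_bold_header <- function(df,
                            expected = c("processid", "sampleid", "recordID")) {
  all(expected %in% names(df))
}

# Made-up miniature stand-ins for real query results.
good <- data.frame(processid = "X1", sampleid = "S1", recordID = 1L)
# Column names that look like record values, as in the failing query.
bad <- data.frame(`X2-21` = "X3", S0002 = "S3", check.names = FALSE)

has_bold_header(good)  # TRUE
has_bold_header(bad)   # FALSE
```

Running the same check on the result of each query in a long species list would flag the affected taxa without inspecting every data frame by hand.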

salix-d commented 3 months ago

That is what it does. Well, actually, it's the information from the second entry, on line 5 of the TSV returned by the BOLD API. The first entry has return characters in the 'copyright_licenses' field that mess up the format.

> records_bold_error <- bold_seqspec(taxon = "Molgula manhattensis", response = TRUE)
> tmp <- records_bold_error$content |> rawToChar() |> stringi::stri_split_lines1()
> stringi::stri_count_regex(tmp, "\t")
 [1] 79 65  0 14 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79
[32] 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79
> tmp[1:4]
[1] "processid\tsampleid\trecordID\tcatalognum\tfieldnum\tinstitution_storing\tcollection_code\tbin_uri\tphylum_taxID\tphylum_name\tclass_taxID\tclass_name\torder_taxID\torder_name\tfamily_taxID\tfamily_name\tsubfamily_taxID\tsubfamily_name\tgenus_taxID\tgenus_name\tspecies_taxID\tspecies_name\tsubspecies_taxID\tsubspecies_name\tidentification_provided_by\tidentification_method\tidentification_reference\ttax_note\tvoucher_status\ttissue_type\tcollection_event_id\tcollectors\tcollectiondate_start\tcollectiondate_end\tcollectiontime\tcollection_note\tsite_code\tsampling_protocol\tlifestage\tsex\treproduction\thabitat\tassociated_specimens\tassociated_taxa\textrainfo\tnotes\tlat\tlon\tcoord_source\tcoord_accuracy\telev\tdepth\telev_accuracy\tdepth_accuracy\tcountry\tprovince_state\tregion\tsector\texactsite\timage_ids\timage_urls\tmedia_descriptors\tcaptions\tcopyright_holders\tcopyright_years\tcopyright_licenses\tcopyright_institutions\tphotographers\tsequenceID\tmarkercode\tgenbank_accession\tnucleotides\ttrace_ids\ttrace_names\ttrace_links\trun_dates\tsequencing_centers\tdirections\tseq_primers\tmarker_codes"
[2] "BNSB097-21\tBNSB0097\t14077558\t\tW121_CU\tDeutsches Zentrum fuer Marine Biodiversitaetsforschung\t\tBOLD:ACB4470\t18\tChordata\t61\tAscidiacea\t232\tStolidobranchia\t101156\tMolgulidae\t\t\t210801\tMolgula\t505893\tMolgula manhattensis\t\t\tWiebke Stamerjohanns\tMorphology, Barcoding\tDeKay, 1843\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t54.125\t8.855\t\t\t\t\t\t\tGermany\t\tBuesum\t\tTaoro boat\t7333897|7333898\t|\tOverview|Overview\t|\tWiebke Stamerjohanns|Wiebke Stamerjohanns\t2022|2022\tCreativeCommons \x96 Attribution"                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
[3] "Non-Commercial Share-Alike|CreativeCommons \x96 Attribution"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     

I'll notify BOLD of this error. I know they are working on a new API, so I don't know if they'll fix it on this one.

I might be able to code a check to detect and fix those though.
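A rough sketch of what such a check could look like, not the fix that ended up in the package: take the tab count of the header line as the expected width, and glue consecutive lines back together until each record reaches that width. repair_split_records is a hypothetical name, the fragments are joined with a single space (an assumption about how the broken field should read), and the input below is made-up data rather than a real BOLD response.

```r
# Hypothetical helper (not part of the bold package): rejoin TSV records
# that were split by stray line breaks inside a field.
repair_split_records <- function(lines, sep = "\t") {
  # Number of separators in a single string.
  count_sep <- function(x) {
    lengths(regmatches(x, gregexpr(sep, x, fixed = TRUE)))
  }
  expected <- count_sep(lines[1])  # the header defines the column count
  out <- character(0)
  buffer <- NULL
  for (line in lines[-1]) {
    # Skip blank lines that are not in the middle of a broken record.
    if (is.null(buffer) && !nzchar(line)) next
    buffer <- if (is.null(buffer)) line else paste(buffer, line)
    if (count_sep(buffer) >= expected) {
      out <- c(out, buffer)
      buffer <- NULL
    }
  }
  # Keep a trailing incomplete fragment rather than silently dropping it.
  if (!is.null(buffer)) out <- c(out, buffer)
  c(lines[1], out)
}

# Made-up example: the first record is broken inside the 'license' field.
raw <- c(
  "id\tlicense\tmarker",
  "A1\tCC Attribution",
  "Non-Commercial\tCOI",
  "B2\tCC0\tCOI"
)
fixed <- repair_split_records(raw)
records <- read.delim(text = fixed, sep = "\t", quote = "")
```

This only works when the stray line breaks never produce a fragment that happens to have the full number of tabs, and it cannot tell a broken record from one that is truly missing fields, so it is a workaround rather than a substitute for a fix on the API side.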

salix-d commented 3 months ago

Do you have other species names that return this error?

Jotanator commented 3 months ago

I don't have any others yet, but I will let you know if I find any. I have a huge list of species at hand, and that was one of them.

paulapappalardo commented 2 weeks ago

Hi! I just ran into this same error with the names Jassa slatteryi and Molgula manhattensis, and came here to see if someone else had seen this issue.

salix-d commented 2 weeks ago

Hi @paulapappalardo and @Jotanator

Could one of you (or both) install the '104-incorrect-data-frame-column-names-for-molgula-manhattensis' branch to test if the fix I tried works for you?


If so, I'll push the change to master! BOLD didn't get back to me, so for now I'll have to work around their API issues.

paulapappalardo commented 2 weeks ago

Done, and it works! I tested it on the two species that tripped it for me, Jassa slatteryi and Molgula manhattensis (the one you wrote the fix for). Thank you for the quick reply and great job with the fix 🙂

salix-d commented 2 weeks ago

Thanks for testing 🙂