ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

Fixes to output of bold_seq() and fasta_split() #80

Closed (salix-d closed this 2 years ago)

salix-d commented 2 years ago

Description

Fixed the output of bold_seq() so that it returns the gene marker instead of the ID and keeps the accession number when one is present. The output is now a data.frame, and the column names match the BOLD API documentation. The @return section of the bold_seq() documentation was updated accordingly. Some code in split_fasta() appeared to be redundant and was removed or replaced. I also tried to optimize both functions for time and memory usage. The test files test-bold_seq.R and test-bold_identify_parent.R, as well as README.Rmd, were updated to match the new output.
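To illustrate the kind of transformation described above (FASTA records turned into a data.frame with one column per header field), here is a minimal base R sketch. It is not the package's split_fasta() or fasta_split() code: parse_bold_fasta() is a hypothetical helper, and it assumes FASTA headers of the form ">processid|identification|marker|accession", where the accession field may be missing. The input uses truncated fragments of two sequences from the example below.

```r
# Hypothetical helper (not part of the bold package): parse FASTA text into a
# data.frame. ASSUMPTION: headers look like
# ">processid|identification|marker|accession"; accession may be absent.
parse_bold_fasta <- function(txt) {
  # Split into lines (tolerating Windows line endings) and drop empty lines
  lines <- strsplit(txt, "\r?\n")[[1]]
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  headers <- substring(lines[is_header], 2)
  # Every line belongs to the record opened by the most recent header
  record <- cumsum(is_header)
  # Join multi-line sequences; records without sequence lines get ""
  by_record <- split(lines[!is_header],
                     factor(record[!is_header], levels = seq_along(headers)))
  sequences <- unname(vapply(by_record, paste, character(1), collapse = ""))
  # Split headers on "|" and pad missing fields (e.g. accession) with NA
  fields <- strsplit(headers, "|", fixed = TRUE)
  field <- function(i) {
    vapply(fields, function(f) if (length(f) >= i) f[i] else NA_character_,
           character(1))
  }
  data.frame(processid = field(1), identification = field(2),
             marker = field(3), accession = field(4),
             sequence = sequences, stringsAsFactors = FALSE)
}

# Toy input: truncated fragments of two records, the second without accession
fasta <- paste0(">ACUFI1126-13|Coelioxys rufescens|COI-5P|MZ607001\n",
                "TATTATATAT\nATAATTTTTG\n",
                ">BBHYA529-12|Coelioxys|COI-5P\n",
                "AATAATATAT\n")
res <- parse_bold_fasta(fasta)
res
```

Padding the accession with NA rather than dropping the field keeps every row the same width, so records with and without a GenBank accession fit into one data.frame.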

Related Issue

#79

Example

library(bold)
res <- bold_seq(taxon='Coelioxys')
head(res)
     processid      identification        marker   accession
[1,] "ACUFI1126-13" "Coelioxys rufescens" "COI-5P" "MZ607001"
[2,] "ACUFI1497-14" "Coelioxys elongata"  "COI-5P" "MZ627630"
[3,] "BBHYA529-12"  "Coelioxys"           "COI-5P" NA
[4,] "BEECA511-06"  "Coelioxys modesta"   "COI-5P" NA
[5,] "BEECA530-06"  "Coelioxys alternata" "COI-5P" NA
[6,] "BEECA776-07"  "Coelioxys vigilans"  "COI-5P" NA
     sequence
[1,] "TATTATATATATAATTTTTGCAATTTGATCAGGAATAATTGGATCTTCACTAAGAATAATTATTCGAATAGAATTAAGAATCCCAGGATCATGAATTAATAATGATATAATTTATAACTCTTTTATTACAGCTCATGCATTTTTAATAATTTTTTTTTTAGTTATACCTTTTTTAATTGGAGGATTTGGAAATTGATTAGCTCCTTTAATATTAGGAGCCCCAGATATAGCATTCCCTCGAATAAATAATATTAGATTTTGATTATTACCTCCTTCTTTATTAATATTATTAACTAGTAATTTAATTAATCCTAGACCAGGAACAGGATGAACAATTTATCCTCCTTTATCTTTATACAATTATCATCCTTCACCATCAGTAGATTTAGCAATTTTTTCTTTACATTTATCAGGAATATCATCTATTATTGGTTCAATAAATTTTATTGTAACAATTTTATTAATAAAAAATTATTCAATAAATTATAATCAAATACCTTTATTCCCATGATCAATTTTAATCACTACAATTTTATTATTATTATCTTTACCTGTTTTAGCAGGAGCAATTACAATATTATTATTTGATCGAAATTTAAATTCATCCTTTTTTGGCCCCTTAGGAGGAGGAGNTCCAATTTTATATCAACATTTATTT"   
[2,] "-------------------------------------ATTGGATCCTCATTAAGAATAATTATTCGAATAGAATTAAGAATTCCAGGATCTTGGATTAATAACGATCAAATTTATAACTCTTTTATTACAGCTCATGCATTTTTAATAATTTTTTTTTTAGTAATACCATTTTTAATTGGAGGATTTGGTAATTGATTAGCACCATTAATATTAGGAGCTCCTGATATAGCTTTCCCACGAATAAATAATATCAGATTTTGATTATTACCTCCTTCATTATTAATATTATTATCTAGTAATTTAATTTCACCTANACCAGGAACAGGATGAACAGTTTATCCACCATTATCATTATATACATATCATCCTTCCCCATCAGTTGATTTAGCAATTTTTTCTTTACATTTATCAGGAATTTCTTCTATTATCGGATCAATAAATTTTATTGTAACAATTTTAATAATA---AAAAATTATTCAATAAATTATAATCAAATACCTTTATTTCCATGATCAATTTTAATTACTACAATTTTATTATTATTATCATTACCTGTATTAGCAGGAGCTATTACAATATTATTATTTGATCGTAATTTAAATTCATCATTTTTTGACCCAATAGGAGGAGGAGATCCTATTTTATATCAACATTTATTT"
[3,] "AATAATATATATAATTTTTGCTATATGATCAGGAATGATTGGATCATCATTAAGAATAATTATTCGTATAGAATTAAGAACACCAGGTTCTTGAATTAATAATGATCAAATTTATAATTCATTTATTACAGCTCATGCATTTTTAATAATTTTTTTCCTAGTAATACCATTTTTAATTGGTGGATTTGGAAATTGATTAGTTCCATTAATAATTGGAGCTCCTGATATAGCATTCCCACGAATAAATAATATTAGATTTTGATTGTTACCTCCTTCACTATTAATATTACTTATAAGAAACTTCATTTCACCTAGACCAGGAACAGGATGAACTGTATATCCCCCATTATCATCATATAATTTTCATCCATCACCTTCAGTAGATATAGCTATTTTTTCCTTACATTTATCTGGTATTTCTTCAATTATTGGATCAATAAATTTTATTGTAACAATTTTAATAATAAAAAATTACTCATTAAATTATAGTAAAATATCTTTATTTCCTTGATCTATTTTAATTACAACAATTCTTTTATTATTATCTTTACCTGTTTTAGCAGGAGCAATTACAATATTACTTTTTGATCGTAATTTAAATACTTCATTTTTTGATCCAATAGGAGGAGGAGACCCAATTTTATACCAACATTTATTT"   
[4,] "-------------------------------GGTATAATTGGATCATCTTTAAGAATAATTATTCGCATAGAATTAAGAATCCCGGGTTCTTGAATTAACAATGATCAAATTTATAATTCTTTTATTACAGCTCATGCCTTTTTAATAATTTTTTTCCTAGTAATACCTTTTTTAATTGGTGGATTTGGTAATTGATTAGTACCTTTAATAATTGGAGCCCCAGATATAGCCTTCCCACGAATAAATAATATTAGATTTTGACTTTTACCCCCTTCTTTATTACTTTTATTATCAAGAAATTTAATTAATCCCAGACCTGGTACTGGATGAACAGTTTACCCACCTTTATCTTTATATAATTTTCATCCTTCTCCTTCAGTTGATTTAGCTATTTTTTCATTACATTTATCTGGAATCTCATCTATTATTGGATCAATAAATTTTATTGTTACTATTTTAATAATAAAAAATTTTTCATTAAATTATAGACAAATACCCTTATTCCCATGATCAGTTATAATTACTACAATCTTATTATTATTATCCTTACCAGTATTAGCAGGAGCAATTACAATATTATTATTTGATCGAAATTTTAATTCTTCATTTTTTGACCCAATAGGAGGAGGAGAC------------------------"   
[5,] "----------------------------------------GGATCATCATTAAGAATAATTATTCGTATAGAATTAAGAACTCCAGGTTCATGAATCAATAATGATCAAATTTATAATTCATTTATTACAGCCCATGCATTTTTAATAATCTTTTTCCTAGTAATACCATTTTTAATTGGTGGTTTTGGAAATTGATTAGTTCCTTTAATAATTGGAGCTCCTGATATAGCATTCCCACGAATAAATAATATTAGATTTTGATTATTACCTCCTTCACTGTTAATATTACTTATAAGAAATTTCATCTCACCTAGACCAGGAACAGGATGAACTGTATATCCTCCATTATCATTATACAATTTTCATCCTTCACCTTCAGTAGATATAGCTATTTTTTCCCTACATTTATCTGGAATTTCTTCAATTATTGGATCAATAAACTTTATTGTAACAATTTTAATAATAAAAAATTATTCATTAAATTATAGTAAAATATCTTTATTCCCATGATCTATTTTAATTACAACAATTCTTTTATTATTATCTTTACCTGTTTTAGCAGGAGCAATTACAATATTATTATTTGATCGTAATATAAATACTTCATTTTTTGACCCAATAGGAGGAGGAGAC------------------------"   
[6,] "CATCCTATATATAATTTTTGCCATATGGTCAGGAATAATTGGATCTTCATTAAGAATAATTATTCGTATAGAATTAAGAATCCCAGGCTCTTGGATTAGTAATGACCAAATTTATAATTCTTTTATTACTGCTCATGCATTTTTAATAATTTTTTTTTTAGTTATACCTTTCCTTATTGGAGGGTTTGGAAATTGATTAGTACCCTTAATAATTGGAGCTCCCGACATAGCATTCCCACGTATAAATAATGTTAGATTTTGATTATTACCACCATCTTTATTACTATTACTATCGAGAAATTTAATTAACCCAAGTCCTGGTACAGGATGAACAGTGTATCCCCCATTATCTTCTTATACATTTCATCCATCCCCATCAGTTGACTTAGCAATTTTTTCATTACATTTATCGGGTATTTCTTCTATTATTGGATCAATAAATTTTATTGTTACAATTTTAATAATAAAAAATTATTCTCTTAATTATAGACAAATACCATTATTCCCATGATCAGTTTTAGTCACTACAGTTTTATTACTTTTATCTTTACCAGTATTAGCTGGAGCAATCACAATATTATTATTTGATCGAAATTTAAATACATCATTTTTTGACCCAATAGGAGGAGGTGAC------------------------"

Tests

==> devtools::test()

i Loading bold
i Testing bold
v |  OK F W S | Context
v |  14       | bold_filter [1.6 s]                        
v |  13       | bold_identify [0.9 s]                      
v |  16       | bold_identify_parents [4.3 s]              
v |  18       | bold_seq [0.4 s]                           
v |  13       | bold_seqspec [2.6 s]                       
v |   7       | bold_specimens [1.6 s]                     
v |  13       | bold_stats [0.3 s]                         
v |  25       | bold_tax_id [0.8 s]                        
v |   9       | bold_tax_name [0.4 s]                      

== Results ================================================
Duration: 12.8 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 128 ]

sckott commented 2 years ago

Looks great, thanks!

By the way, I can no longer maintain this package. Are you interested?