ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

bold_specimens returning poorly formatted data.frame with option format="tsv" #46

Closed griffinp closed 7 years ago

griffinp commented 7 years ago

I am using bold::bold_specimens within a custom function to return specimen records for a given taxon in "tsv" format. I am pretty sure this worked fine for me in the past, but the resulting data.frame is now badly formatted: the header line partly wraps into the first row of the data.frame, which gives a malformed row 1, blank rows between the records, and extra 'NA' entries in other rows that shouldn't be there.

Example:

specimen_table <- bold::bold_specimens(taxon="Anaspididae", format="tsv")
specimen_table
    processid   sampleid           recordID     catalognum    fieldnum
1   image_ids image_urls copyright_licenses      trace_ids trace_links
2 GBCM0002-06   AF048821             468923                           
3                                                                     
4 GBCM0381-06   DQ310660             501348 HBLB 047 (BIO)            
5                                                                     
6  RBGC001-03   MaAna000               4901       MaAna000            
7                                                                     
                institution_storing            bin_uri phylum_taxID phylum_name  class_taxID
1                         run_dates sequencing_centers   directions seq_primers marker_codes
2          Mined from GenBank, NCBI       BOLD:AAF3961           20  Arthropoda           69
3                                                                                           
4          Mined from GenBank, NCBI       BOLD:AAF3962           20  Arthropoda           69
5                                                                                           
6 Biodiversity Institute of Ontario       BOLD:AAF3961           20  Arthropoda           69
7                                                                                           
    class_name order_taxID  order_name family_taxID family_name subfamily_taxID subfamily_name
1                       NA                       NA                          NA             NA
2 Malacostraca         352 Anaspidacea         1697 Anaspididae              NA             NA
3                       NA                       NA                          NA             NA
4 Malacostraca         352 Anaspidacea         1697 Anaspididae              NA             NA
5                       NA                       NA                          NA             NA
6 Malacostraca         352 Anaspidacea         1697 Anaspididae              NA             NA
7                       NA                       NA                          NA             NA
  genus_taxID genus_name species_taxID        species_name subspecies_taxID subspecies_name
1          NA                       NA                                   NA              NA
2        5694  Anaspides          8241 Anaspides tasmaniae               NA              NA
3          NA                       NA                                   NA              NA
4        5694  Anaspides          8241 Anaspides tasmaniae               NA              NA
5          NA                       NA                                   NA              NA
6        5694  Anaspides          8241 Anaspides tasmaniae               NA              NA
7          NA                       NA                                   NA              NA
  identification_provided_by voucher_type tissue_type collectors collectiondate lifestage sex
1                         NA           NA          NA         NA             NA        NA  NA
2                         NA           NA          NA         NA             NA        NA  NA
3                         NA           NA          NA         NA             NA        NA  NA
4                         NA           NA          NA         NA             NA        NA  NA
5                         NA           NA          NA         NA             NA        NA  NA
6                         NA           NA          NA         NA             NA        NA  NA
7                         NA           NA          NA         NA             NA        NA  NA
  reproduction           extrainfo notes lat lon coord_source coord_accuracy country province
1           NA                        NA  NA  NA           NA             NA      NA       NA
2           NA                        NA  NA  NA           NA             NA      NA       NA
3           NA                        NA  NA  NA           NA             NA      NA       NA
4           NA                        NA  NA  NA           NA             NA      NA       NA
5           NA                        NA  NA  NA           NA             NA      NA       NA
6           NA Anaspides tasmaniae    NA  NA  NA           NA             NA      NA       NA
7           NA                        NA  NA  NA           NA             NA      NA       NA
  region exactsite  X
1     NA        NA NA
2     NA        NA NA
3     NA        NA NA
4     NA        NA NA
5     NA        NA NA
6     NA        NA NA
7     NA        NA NA

> str(specimen_table)
'data.frame':   7 obs. of  42 variables:
 $ processid                 : chr  "image_ids" "GBCM0002-06" "" "GBCM0381-06" ...
 $ sampleid                  : chr  "image_urls" "AF048821" "" "DQ310660" ...
 $ recordID                  : chr  "copyright_licenses" "468923" "" "501348" ...
 $ catalognum                : chr  "trace_ids" " " "" "HBLB 047 (BIO)" ...
 $ fieldnum                  : chr  "trace_links" " " "" " " ...
 $ institution_storing       : chr  "run_dates" "Mined from GenBank, NCBI" "" "Mined from GenBank, NCBI" ...
 $ bin_uri                   : chr  "sequencing_centers" "BOLD:AAF3961" "" "BOLD:AAF3962" ...
 $ phylum_taxID              : chr  "directions" "20" "" "20" ...
 $ phylum_name               : chr  "seq_primers" "Arthropoda" "" "Arthropoda" ...
 $ class_taxID               : chr  "marker_codes" "69" "" "69" ...
 $ class_name                : chr  "" "Malacostraca" "" "Malacostraca" ...
 $ order_taxID               : int  NA 352 NA 352 NA 352 NA
 $ order_name                : chr  "" "Anaspidacea" "" "Anaspidacea" ...
 $ family_taxID              : int  NA 1697 NA 1697 NA 1697 NA
 $ family_name               : chr  "" "Anaspididae" "" "Anaspididae" ...
 $ subfamily_taxID           : logi  NA NA NA NA NA NA ...
 $ subfamily_name            : logi  NA NA NA NA NA NA ...
 $ genus_taxID               : int  NA 5694 NA 5694 NA 5694 NA
 $ genus_name                : chr  "" "Anaspides" "" "Anaspides" ...
 $ species_taxID             : int  NA 8241 NA 8241 NA 8241 NA
 $ species_name              : chr  "" "Anaspides tasmaniae" "" "Anaspides tasmaniae" ...
 $ subspecies_taxID          : logi  NA NA NA NA NA NA ...
 $ subspecies_name           : logi  NA NA NA NA NA NA ...
 $ identification_provided_by: logi  NA NA NA NA NA NA ...
 $ voucher_type              : logi  NA NA NA NA NA NA ...
 $ tissue_type               : logi  NA NA NA NA NA NA ...
 $ collectors                : logi  NA NA NA NA NA NA ...
 $ collectiondate            : logi  NA NA NA NA NA NA ...
 $ lifestage                 : logi  NA NA NA NA NA NA ...
 $ sex                       : logi  NA NA NA NA NA NA ...
 $ reproduction              : logi  NA NA NA NA NA NA ...
 $ extrainfo                 : chr  "" " " "" " " ...
 $ notes                     : logi  NA NA NA NA NA NA ...
 $ lat                       : logi  NA NA NA NA NA NA ...
 $ lon                       : logi  NA NA NA NA NA NA ...
 $ coord_source              : logi  NA NA NA NA NA NA ...
 $ coord_accuracy            : logi  NA NA NA NA NA NA ...
 $ country                   : logi  NA NA NA NA NA NA ...
 $ province                  : logi  NA NA NA NA NA NA ...
 $ region                    : logi  NA NA NA NA NA NA ...
 $ exactsite                 : logi  NA NA NA NA NA NA ...
 $ X                         : logi  NA NA NA NA NA NA ...
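
To spot this problem programmatically, for example inside a custom wrapper like the one described above, a check along these lines may help. This is a minimal sketch, not part of the bold package: `check_bold_tsv` is a hypothetical helper, and the list of leaked header names is copied from row 1 of the output above, so it assumes BOLD's TSV header still uses those names. It only flags the three symptoms visible here (header names in row 1, all-blank rows, and the trailing all-NA `X` column); it does not repair the data.

```r
# Hypothetical helper, not part of the bold package: inspects a data.frame
# returned by bold::bold_specimens(format = "tsv") for the symptoms above.
check_bold_tsv <- function(df) {
  # Header names that appeared as values in row 1 of the broken output
  # (assumption: copied verbatim from the report, may change with the API).
  leaked_headers <- c("image_ids", "image_urls", "copyright_licenses",
                      "trace_ids", "trace_links", "run_dates",
                      "sequencing_centers", "directions", "seq_primers",
                      "marker_codes")

  header_in_row1 <- FALSE
  if (nrow(df) > 0) {
    # Coerce each cell of row 1 to character so factors/logicals compare cleanly
    first_row <- vapply(df, function(col) as.character(col[1]), character(1))
    header_in_row1 <- any(first_row %in% leaked_headers)
  }

  # Mark rows in which every cell is NA or only whitespace
  blank <- rep(TRUE, nrow(df))
  for (col in df) {
    x <- as.character(col)
    blank <- blank & (is.na(x) | trimws(x) == "")
  }

  # Trailing all-NA column named "X", as seen in the str() output
  empty_x_col <- "X" %in% names(df) && all(is.na(df[["X"]]))

  list(header_in_row1 = header_in_row1,
       blank_rows     = which(blank),
       empty_x_col    = empty_x_col)
}

# Small mock of the malformed output (most columns dropped for brevity)
broken <- data.frame(
  processid   = c("image_ids", "GBCM0002-06", "", "RBGC001-03"),
  sampleid    = c("image_urls", "AF048821", "", "MaAna000"),
  order_taxID = c(NA, 352L, NA, 352L),
  X           = NA,
  stringsAsFactors = FALSE
)
str(check_bold_tsv(broken))
```

Applied to `specimen_table` from the report, this should flag row 1 and rows 3, 5 and 7; on a correctly parsed result all three checks should come back negative.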

Any help would be very welcome! Thanks in advance.

Session info here:

> devtools::session_info()
Session info --------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  en_AU.UTF-8                 
 tz       Australia/Melbourne         
 date     2017-07-19                  

Packages ------------------------------------------------------------------------------------------
 package       * version    date       source                                
 ape             4.1        2017-02-14 CRAN (R 3.4.0)                        
 assertthat      0.2.0      2017-04-11 CRAN (R 3.4.0)                        
 backports       1.0.5      2017-01-18 CRAN (R 3.4.0)                        
 base          * 3.4.0      2017-04-21 local                                 
 bindr           0.1        2016-11-13 cran (@0.1)                           
 bindrcpp        0.2        2017-06-17 cran (@0.2)                           
 bold            0.4.0      2017-01-06 CRAN (R 3.4.0)                        
 codetools       0.2-15     2016-10-05 CRAN (R 3.4.0)                        
 commonmark      1.2        2017-03-01 CRAN (R 3.4.0)                        
 compiler        3.4.0      2017-04-21 local                                 
 crayon          1.3.2      2016-06-28 CRAN (R 3.4.0)                        
 curl            2.7        2017-06-26 cran (@2.7)                           
 data.table      1.10.4     2017-02-01 CRAN (R 3.4.0)                        
 datasets      * 3.4.0      2017-04-21 local                                 
 desc            1.1.0      2017-01-27 CRAN (R 3.4.0)                        
 devtools      * 1.13.2     2017-06-02 CRAN (R 3.4.0)                        
 digest          0.6.12     2017-01-27 CRAN (R 3.4.0)                        
 dplyr           0.7.1      2017-06-22 cran (@0.7.1)                         
 foreach         1.4.3      2015-10-13 CRAN (R 3.4.0)                        
 git2r           0.18.0     2017-01-01 CRAN (R 3.4.0)                        
 glue            1.1.1      2017-06-21 cran (@1.1.1)                         
 graphics      * 3.4.0      2017-04-21 local                                 
 grDevices     * 3.4.0      2017-04-21 local                                 
 grid            3.4.0      2017-04-21 local                                 
 httr            1.2.1      2016-07-03 CRAN (R 3.4.0)                        
 iterators       1.0.8      2015-10-13 CRAN (R 3.4.0)                        
 jsonlite        1.5        2017-06-01 cran (@1.5)                           
 lattice         0.20-35    2017-03-25 CRAN (R 3.4.0)                        
 magrittr        1.5        2014-11-22 CRAN (R 3.4.0)                        
 memoise         1.1.0      2017-04-21 CRAN (R 3.4.0)                        
 metabarcodedb * 0.0.0.9000 2017-07-19 local (griffinp/metabarcodedb@b70ec4c)
 methods       * 3.4.0      2017-04-21 local                                 
 nlme            3.1-131    2017-02-06 CRAN (R 3.4.0)                        
 parallel        3.4.0      2017-04-21 local                                 
 pbapply         1.3-3      2017-07-04 CRAN (R 3.4.1)                        
 pkgconfig       2.0.1      2017-03-21 cran (@2.0.1)                         
 plyr            1.8.4      2016-06-08 CRAN (R 3.4.0)                        
 R6              2.2.2      2017-06-17 cran (@2.2.2)                         
 Rcpp            0.12.11    2017-05-22 cran (@0.12.11)                       
 rentrez         1.1.0      2017-06-01 cran (@1.1.0)                         
 reshape         0.8.6      2016-10-21 CRAN (R 3.4.0)                        
 reshape2        1.4.2      2016-10-22 CRAN (R 3.4.0)                        
 rlang           0.1.1      2017-05-18 cran (@0.1.1)                         
 roxygen2      * 6.0.1      2017-02-06 CRAN (R 3.4.0)                        
 rprojroot       1.2        2017-01-16 CRAN (R 3.4.0)                        
 stats         * 3.4.0      2017-04-21 local                                 
 stringi         1.1.5      2017-04-07 CRAN (R 3.4.0)                        
 stringr         1.2.0      2017-02-18 CRAN (R 3.4.0)                        
 taxize          0.8.8      2017-07-01 cran (@0.8.8)                         
 testthat        1.0.2      2016-04-23 CRAN (R 3.4.0)                        
 tibble          1.3.3      2017-05-28 cran (@1.3.3)                         
 tools           3.4.0      2017-04-21 local                                 
 utils         * 3.4.0      2017-04-21 local                                 
 withr           1.0.2      2016-06-20 CRAN (R 3.4.0)                        
 XML             3.98-1.9   2017-06-19 cran (@3.98-1.)                       
 xml2            1.1.1      2017-01-24 CRAN (R 3.4.0)     
sckott commented 7 years ago

Thanks @griffinp for this report. It does appear to be a bug; will fix tomorrow.

sckott commented 7 years ago

@griffinp please reinstall with devtools::install_github("ropensci/bold"); it should work now. I had to switch to BOLD's new API.