ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

Odd EOF warning with bold_seqspec #87

Closed cjfields closed 2 years ago

cjfields commented 2 years ago

I'm seeing the following sporadically when using bold_seqspec:

tmp <- bold_seqspec('Calliphoridae')
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

Very similar to this SO issue.

Session Info
R version 4.1.3 (2022-03-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.8   taxize_0.9.99 bold_1.2.0   

loaded via a namespace (and not attached):
 [1] zoo_1.8-9         tidyselect_1.1.2  xfun_0.30         purrr_0.3.4       lattice_0.20-45   vctrs_0.3.8       generics_0.1.2    htmltools_0.5.2   yaml_2.3.5       
[10] utf8_1.2.2        rlang_1.0.2       pillar_1.7.0      httpcode_0.3.0    glue_1.6.2        DBI_1.1.2         uuid_1.0-3        foreach_1.5.2     lifecycle_1.0.1  
[19] plyr_1.8.6        stringr_1.4.0     codetools_0.2-18  evaluate_0.15     knitr_1.37        fastmap_1.1.0     parallel_4.1.3    curl_4.3.2        fansi_1.0.2      
[28] urltools_1.7.3    triebeard_0.3.0   Rcpp_1.0.8        jsonlite_1.8.0    digest_0.6.29     stringi_1.7.6     grid_4.1.3        cli_3.2.0         tools_4.1.3      
[37] magrittr_2.0.2    tibble_3.1.6      crul_1.2.0        crayon_1.5.0      ape_5.6-2         pkgconfig_2.0.3   ellipsis_0.3.2    data.table_1.14.2 xml2_1.3.3       
[46] assertthat_0.2.1  rmarkdown_2.13    reshape_0.8.9     iterators_1.0.14  R6_2.5.1          conditionz_0.1.0  nlme_3.1-155      compiler_4.1.3 
cjfields commented 2 years ago

I should clarify: I see this issue sporadically with different calls to bold_seqspec(), but consistently with the above example.

tadeu95 commented 2 years ago

I've run into this issue too. Because of this warning, you are probably not retrieving all the publicly available records for Calliphoridae. Try downloading the data set from the BOLD API and confirm whether you're losing records. From what I've gathered, the warning happens when trying to read records with symbols such as "#", and I haven't been able to circumvent it when using the bold package. It sometimes happens when reading files as well, but in that case it's easy to solve.

Jotanator commented 2 years ago

I have also run into this issue with some species, such as Felis catus. I checked the records when it happened, and I am indeed missing some of them. In the case of Felis catus, I am missing the last three records, with process IDs RDATC037-05, RONP027-14, and RSMS002-11. As far as I can tell, there doesn't seem to be anything weird about those three records, and I don't see any "#" symbols in them either.
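One way to check which records were dropped is to compare the IDs in the raw tab-separated text against the IDs in the parsed data frame. The sketch below is only an illustration on synthetic data: the record IDs, the `notes` column, and the stray double quote in record 17 are all made up, and it assumes the first column holds the process ID.

```r
# Synthetic tab-separated text standing in for a BOLD response (made-up data).
ids <- sprintf("REC%03d-01", 1:20)
notes <- rep("clean record", 20)
notes[17] <- "trap set 5\" above ground"  # a lone, unbalanced double quote
tsv <- paste(c("processid\tnotes", paste(ids, notes, sep = "\t")), collapse = "\n")

# Parse with read.delim's default quoting (quote = "\""); the "EOF within
# quoted string" warning is suppressed here because it is expected.
parsed <- suppressWarnings(
  utils::read.delim(text = tsv, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
)

# Independently take the first field of every data line, ignoring quotes entirely.
raw_lines <- strsplit(tsv, "\n", fixed = TRUE)[[1]]
raw_ids <- sub("\t.*$", "", raw_lines[-1])

# IDs present in the raw text but absent from the parsed data frame.
missing_ids <- setdiff(raw_ids, parsed$processid)
print(missing_ids)
```

In this synthetic case the unbalanced quote swallows everything after it into a single field, so the records that follow record 17 should show up in `missing_ids`. The offending character sits in the last record that did parse, not in the missing ones, which may be why the missing records themselves look unremarkable.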

cjfields commented 2 years ago

Per the SO question, disabling quotes helps. Here is a demo using the full response instance:

> test <- bold_seqspec('Felis catus', response = TRUE)
> tt <- paste0(rawToChar(test$content, multiple = TRUE), collapse = "")
> Encoding(tt) <- "UTF-8"
> temp1 <- utils::read.delim(text = tt, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string
> temp2 <- utils::read.delim(text = tt, header = TRUE, sep = "\t", stringsAsFactors = FALSE, quote = "")
> dim(temp1)
[1] 41 80
> dim(temp2)
[1] 44 80

The second data frame (temp2) has three more rows and is read without the warning.
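As a rough sketch of the same idea wrapped in a reusable check, the helper below parses tab-separated text with quoting disabled and warns if the row count does not match the number of data lines. `parse_tsv_no_quotes()` is a hypothetical name and is not part of the bold package or necessarily what the merged fix does; the sample text is made up.

```r
# Hypothetical helper (not part of the bold package): parse tab-separated text
# with quote handling switched off, then sanity-check the row count.
parse_tsv_no_quotes <- function(txt) {
  out <- utils::read.delim(text = txt, header = TRUE, sep = "\t",
                           quote = "", stringsAsFactors = FALSE)
  # Number of non-empty lines in the text, minus the header line
  n_lines <- sum(nzchar(strsplit(txt, "\r?\n")[[1]])) - 1L
  if (nrow(out) != n_lines) {
    warning("parsed ", nrow(out), " rows but the text has ", n_lines, " data lines")
  }
  out
}

# Made-up sample with an unbalanced double quote in the second record
txt <- paste(
  "processid\tnotes",
  "REC001-01\tok",
  "REC002-01\t5\" trap",
  "REC003-01\tok",
  "REC004-01\tok",
  "REC005-01\tok",
  "REC006-01\tok",
  sep = "\n"
)
res <- parse_tsv_no_quotes(txt)
dim(res)
```

With `quote = ""` the double quote is kept as an ordinary character in the field, so each line of the text maps to one row. The trade-off is that fields genuinely wrapped in quotes would keep their quote characters.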

cjfields commented 2 years ago

Fix now merged!