tonyaseverson closed this issue 4 years ago.
@tfsevers88 Thanks for the report. The bz2 extension implies that the files are compressed with bzip2, but they are not. This must be a recent change, since the code used to work.
I'll write the data maintainer and ask if they can either recompress the files or make the extensions consistent with the compression method.
In the meantime, I'll write a temporary workaround.
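A temporary workaround of that kind could ignore the extension and choose the decompression flag from each file's leading "magic" bytes instead. This is a minimal shell sketch, not the workaround onekp actually shipped. The helper names detect_compression and extract_any are hypothetical. It assumes the standard signatures for bzip2 ("BZh"), gzip (1F 8B) and xz (FD 37 7A 58 5A 00), and a tar that accepts -j, -z and -J (GNU tar and bsdtar both do).

```shell
#!/bin/sh
# Hypothetical helper: print the compression format of FILE, judged by its
# leading magic bytes rather than by its (possibly misleading) extension.
detect_compression() {
  # First 6 bytes as a lowercase hex string, e.g. "425a68393141" for "BZh91A".
  magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    425a68*)      echo bzip2 ;;  # "BZh"
    1f8b*)        echo gzip ;;
    fd377a585a00) echo xz ;;
    *)            echo none ;;   # unknown; maybe an uncompressed tar
  esac
}

# Hypothetical helper: extract FILE into directory DIR using the tar flag
# that matches the detected format, not the one implied by the file name.
extract_any() {
  case "$(detect_compression "$1")" in
    bzip2) tar -xjf "$1" -C "$2" ;;
    gzip)  tar -xzf "$1" -C "$2" ;;
    xz)    tar -xJf "$1" -C "$2" ;;
    *)     tar -xf "$1" -C "$2" ;;
  esac
}
```

For example, extract_any oneKP/pep/URDJ.faa.tar.bz2 oneKP/pep would unpack that archive whether it was really compressed with bzip2, gzip or xz, and fall back to a plain tar -xf otherwise.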
Actually, I think they may have just fixed it: the code now runs on my system.
Delete your cached data (the folder onekp downloaded), run the code again, and tell me how it goes.
Thanks, but I'm still getting similar errors. This time no directories are created and no files are downloaded.
Here is the reprex:
onekp <- retrieve_onekp()
seqs <- filter_by_code(onekp, c('URDJ','PDIE'))
download_peptides(seqs, 'oneKP/pep')
#> Warning in system(cmd, intern = TRUE): running command '/usr/bin/tar -tf
#> 'oneKP/pep/URDJ.faa.tar.bz2'' had status 1
#> Warning in untar(path, compressed = "bzip2", exdir = dir): '/
#> usr/bin/tar -xf 'oneKP/pep/URDJ.faa.tar.bz2' -C '/var/folders/
#> n9/67cpgppn3n91037xr9f6sfy80000gn/T//RtmpqY7nwy/onekp_sequences'' returned
#> error code 1
#> Warning in system(cmd, intern = TRUE): running command '/usr/bin/tar -tf
#> 'oneKP/pep/PDIE.faa.tar.bz2'' had status 1
#> Warning in untar(path, compressed = "bzip2", exdir = dir): '/
#> usr/bin/tar -xf 'oneKP/pep/PDIE.faa.tar.bz2' -C '/var/folders/
#> n9/67cpgppn3n91037xr9f6sfy80000gn/T//RtmpqY7nwy/onekp_sequences'' returned
#> error code 1
#> 6 3930
#> "oneKP/pep/URDJ.faa" "oneKP/pep/PDIE.faa"
download_nucleotides(seqs, 'oneKP/nuc')
#> Warning in system(cmd, intern = TRUE): running command '/usr/bin/tar -tf
#> 'oneKP/nuc/URDJ.fna.tar.bz2'' had status 1
#> Warning in untar(path, compressed = "bzip2", exdir = dir): '/
#> usr/bin/tar -xf 'oneKP/nuc/URDJ.fna.tar.bz2' -C '/var/folders/
#> n9/67cpgppn3n91037xr9f6sfy80000gn/T//RtmpqY7nwy/onekp_sequences'' returned
#> error code 1
#> Warning in system(cmd, intern = TRUE): running command '/usr/bin/tar -tf
#> 'oneKP/nuc/PDIE.fna.tar.bz2'' had status 1
#> Warning in untar(path, compressed = "bzip2", exdir = dir): '/
#> usr/bin/tar -xf 'oneKP/nuc/PDIE.fna.tar.bz2' -C '/var/folders/
#> n9/67cpgppn3n91037xr9f6sfy80000gn/T//RtmpqY7nwy/onekp_sequences'' returned
#> error code 1
#> 7 3931
#> "oneKP/nuc/URDJ.fna" "oneKP/nuc/PDIE.fna"
Created on 2019-08-22 by the reprex package (v0.3.0)
When I try to download the files from the onekp_public_data.html page, I see this:
I had coded a workaround, but it would download one file and then seem to time out on subsequent downloads. Perhaps the technical difficulties Google reports with virus scanning are the root cause?
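One way to test that theory is to check whether a supposed .tar.bz2 download is really an HTML page, such as a Google Drive warning or error page served in place of the file. This is a minimal shell sketch; looks_like_html is a hypothetical name, and it assumes such pages contain an "<html" or "<!DOCTYPE html" tag within their first kilobyte.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit status 0) if FILE looks like an HTML page
# rather than an archive, inspecting only its first 1024 bytes.
# The match is case-insensitive, so "<!DOCTYPE html" and "<!doctype html" both count.
looks_like_html() {
  head -c 1024 "$1" | grep -qi -e '<!doctype html' -e '<html'
}
```

For example, looks_like_html oneKP/pep/URDJ.faa.tar.bz2 && echo "HTML page, not an archive" would flag a download that should not be passed to untar at all.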
You're right about the root problem. When I first wrote onekp, the files were all served through FTP. Then the maintainers moved them to Google Drive.
We will probably need to handle cookies. There is a Stack Overflow question that addresses this issue. Adapting the code from Tanaike's answer:
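#!/bin/bash
fileid="1GrB19Tl87zAbpqh3wgO8NCi9xR371MZq"
filename="data.tar.bz2"
url1="https://drive.google.com/uc?export=download&id=${fileid}"
echo "$url1"
# First request: follow redirects and save the cookies Google Drive sets
curl -c cookie -s -L "$url1" > /dev/null
# Take the value (last field) of the cookie-jar line mentioning "download"
# and pass it back as the confirm parameter
url2="https://drive.google.com/uc?export=download&confirm=$(awk '/download/ {print $NF}' cookie)&id=${fileid}"
echo "$url2"
# Second request: send the cookies back, follow redirects, and save the archive
curl -L -b cookie "$url2" -o "$filename"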
This seems to work. We can implement this solution in R using the RCurl library (e.g., see this solution). If you like, you can try to get this working and submit a pull request. Alternatively, I can come back to it sometime next week (I'm booked until Monday, at least).
@arendsee: I'm new to R and haven't done package development before, so I'm not sure how far I would get, and I'm running out of time to make progress on other things before classes resume. This isn't a blocker for me; I manually downloaded what I needed. I'd be happy to test, though, if that would be useful, and I'll watch this issue.
Another problem in the same vein: some identifiers now yield 403 Forbidden, as if the Google Drive mapping were off (I checked in CyVerse, and both example identifiers are available in the public dataset).
> seqs2 <- filter_by_code(onekp, c('MYMP', 'ZSSR'))
> download_peptides(seqs2, 'pep2')
trying URL 'https://drive.google.com/uc?export=download&id=111S43yNcrFvDwCA9Gr0IZ9dv9FZrMCiS'
downloaded 74 KB
bzip2: (stdin) is not a bzip2 file.
/bin/tar: Child returned status 2
/bin/tar: Error is not recoverable: exiting now
bzip2: (stdin) is not a bzip2 file.
/bin/tar: Child returned status 2
/bin/tar: Error is not recoverable: exiting now
trying URL 'https://drive.google.com/uc?export=download&id=1xFt4gVlvbhSVK7-FLfTkfYckP_Sq57kk'
Content type 'application/x-bzip2' length 46 bytes
==================================================
downloaded 46 bytes
3264 3300
"pep2/ZSSR.faa" "pep2/MYMP.faa"
Warning messages:
1: In system(cmd, intern = TRUE) :
running command '/bin/tar -tf 'pep2/ZSSR.faa.tar.bz2'' had status 2
2: In untar(path, compressed = "bzip2", exdir = dir) :
‘/bin/tar -xf 'pep2/ZSSR.faa.tar.bz2' -C '/tmp/Rtmp7aEZy9/onekp_sequences'’ returned error code 2
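In the log above, download.file reports success for the 74 KB file even though bzip2 then rejects it. Checking each file before handing it to untar would surface such failures with a clearer message. This is a minimal shell sketch, assuming a tar that detects compression on its own when listing a file (GNU tar and bsdtar do); check_archive is a hypothetical helper, not part of onekp.

```shell
#!/bin/sh
# Hypothetical check: report the size of FILE and succeed only if it is
# non-empty and tar can list its contents. An HTML error page or a
# truncated download makes it return non-zero.
check_archive() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -eq 0 ]; then
    echo "$1: empty download" >&2
    return 1
  fi
  # Listing (-t) reads the whole archive without extracting anything.
  if tar -tf "$1" > /dev/null 2>&1; then
    echo "$1: ok ($size bytes)"
  else
    echo "$1: not a readable tar archive ($size bytes)" >&2
    return 1
  fi
}
```

Running check_archive pep2/ZSSR.faa.tar.bz2 before extraction would print the file size alongside a plain "not a readable tar archive" message instead of the tar and bzip2 errors above.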
@gkoczyk I'll check up on this. The shell script above still seems to work. I'll see if I can implement the same behavior in R.
@gkoczyk My last commit should have fixed the problem. If not, you may reopen the issue. My fix may have broken Windows compatibility.
It turns out all the cookie shenanigans were unnecessary: all I needed to add to the curl command was the -L option, which tells curl to follow redirects. But now I use a system call to curl, which is probably not portable.
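For anyone reproducing the fix outside R, the key change amounts to adding -L so curl follows Google Drive's redirect. This is a minimal shell sketch, not onekp's actual code. drive_url and drive_download are hypothetical helpers, and the uc?export=download&id= URL form is the one that appears in the logs above. Adding --fail is a separate choice, so that an HTTP error such as 403 makes curl exit non-zero instead of saving the error body under the archive's name.

```shell
#!/bin/sh
# Hypothetical helper: build the Google Drive direct-download URL for a file ID,
# using the uc?export=download form seen in the logs in this thread.
drive_url() {
  printf 'https://drive.google.com/uc?export=download&id=%s\n' "$1"
}

# Hypothetical helper: download Drive file ID ($1) to DEST ($2).
#   -L      follow redirects (the option the fix relied on)
#   --fail  exit non-zero on HTTP errors such as 403 instead of saving the error page
#   -sS     no progress meter, but still print errors
drive_download() {
  curl -L --fail -sS -o "$2" "$(drive_url "$1")"
}
```

For example, drive_download 1GrB19Tl87zAbpqh3wgO8NCi9xR371MZq data.tar.bz2 requests the same file as the cookie-based script, without the cookie jar.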
This is the minimal code to recreate the issue
Created on 2019-08-21 by the reprex package (v0.3.0)
Session info
``` r
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.5.1 (2018-07-02)
#>  os       macOS 10.14.5
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>  language (EN)
#>  collate  en_CA.UTF-8
#>  ctype    en_CA.UTF-8
#>  tz       America/Vancouver
#>  date     2019-08-21
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.5.1)
#>  backports     1.1.4      2019-04-10 [1] CRAN (R 3.5.2)
#>  bit           1.1-14     2018-05-29 [1] CRAN (R 3.5.0)
#>  bit64         0.9-7      2017-05-08 [1] CRAN (R 3.5.0)
#>  blob          1.2.0      2019-07-09 [1] CRAN (R 3.5.2)
#>  callr         3.3.1      2019-07-18 [1] CRAN (R 3.5.2)
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.5.2)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)
#>  curl          4.0        2019-07-22 [1] CRAN (R 3.5.2)
#>  DBI           1.0.0      2018-05-02 [1] CRAN (R 3.5.0)
#>  dbplyr        1.4.2      2019-06-17 [1] CRAN (R 3.5.2)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)
#>  devtools      2.1.0      2019-07-06 [1] CRAN (R 3.5.2)
#>  digest        0.6.20     2019-07-04 [1] CRAN (R 3.5.2)
#>  dplyr         0.8.3      2019-07-04 [1] CRAN (R 3.5.2)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.5.2)
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.5.2)
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.5.2)
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.5.1)
#>  hoardr        0.5.2      2018-12-02 [1] CRAN (R 3.5.0)
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.0)
#>  httr          1.4.1      2019-08-05 [1] CRAN (R 3.5.2)
#>  knitr         1.23       2019-05-18 [1] CRAN (R 3.5.2)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.0)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)
#>  onekp       * 0.2.2      2019-08-21 [1] Github (ropensci/onekp@6eace96)
#>  pillar        1.4.2      2019-06-29 [1] CRAN (R 3.5.2)
#>  pkgbuild      1.0.3      2019-03-20 [1] CRAN (R 3.5.1)
#>  pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.0)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.0)
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.5.2)
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.0)
#>  purrr         0.3.2      2019-03-15 [1] CRAN (R 3.5.2)
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.2)
#>  rappdirs      0.3.1      2016-03-28 [1] CRAN (R 3.5.0)
#>  Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.5.2)
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.5.2)
#>  rlang         0.4.0      2019-06-25 [1] CRAN (R 3.5.2)
#>  rmarkdown     1.14       2019-07-12 [1] CRAN (R 3.5.2)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)
#>  RSQLite       2.1.2      2019-07-24 [1] CRAN (R 3.5.2)
#>  rvest         0.3.4      2019-05-15 [1] CRAN (R 3.5.2)
#>  selectr       0.4-1      2018-04-06 [1] CRAN (R 3.5.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.0)
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.5.2)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.2)
#>  taxizedb      0.1.9.9130 2019-08-21 [1] Github (ropensci/taxizedb@8ee0ab9)
#>  testthat      2.2.1      2019-07-25 [1] CRAN (R 3.5.2)
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.5.2)
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.0)
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.5.2)
#>  vctrs         0.2.0      2019-07-05 [1] CRAN (R 3.5.2)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)
#>  xfun          0.8        2019-06-25 [1] CRAN (R 3.5.2)
#>  xml2          1.2.2      2019-08-09 [1] CRAN (R 3.5.2)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.0)
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.5.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
```