ropensci / piggyback

:package: for using large(r) data files on GitHub
https://docs.ropensci.org/piggyback
GNU General Public License v3.0
186 stars 27 forks source link

CRAN failure with piggyback as dependency #76

Closed 1beb closed 2 years ago

1beb commented 2 years ago

Trying to use piggyback in another package being submitted to CRAN. This may be related to #49. It happens during the Windows build but not during the Debian build.

https://win-builder.r-project.org/incoming_pretest/wru_1.0.0_20220618_014246/Windows/00check.log

ℹ Downloading "wru-data-census_last_c.rds"...

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
Warning in gh_download_asset(df$owner[[1]], df$repo[[1]], id = df$id[i],  :
  Unauthorized (HTTP 401).
ℹ Downloading "wru-data-first_c.rds"...

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
Warning in gh_download_asset(df$owner[[1]], df$repo[[1]], id = df$id[i],  :
  Unauthorized (HTTP 401).
ℹ Downloading "wru-data-last_c.rds"...

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
Warning in gh_download_asset(df$owner[[1]], df$repo[[1]], id = df$id[i],  :
  Unauthorized (HTTP 401).
ℹ Downloading "wru-data-mid_c.rds"...

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
Warning in gh_download_asset(df$owner[[1]], df$repo[[1]], id = df$id[i],  :
  Unauthorized (HTTP 401).
Error in readRDS(paste0(path, "/wru-data-first_c.rds")) : 
  unknown input format
1beb commented 2 years ago

I think the easiest approach here is to use \donttest{} on the examples that require the piggyback downloads so that it skips on cran.

bschilder commented 2 years ago

Hey @cboettig would you mind reopening this? I don't think this was really addressed, as the workaround @1beb provided won't work for anyone who needs to use piggyback as a core part of their package.

cboettig commented 2 years ago

Hmm, I don't think CRAN version of piggyback should be throwing a 401 error on public access any more. but it looks like you may have a windows path or serialization error potentially?

Error in readRDS(paste0(path, "/wru-data-first_c.rds")) : 
  unknown input format

I don't think your error is related to piggyback -- can you confirm whether it happens on other windows platforms? (The 401 messages are annoying and maybe spurious, but it looks to me they are not actually responsible for the error?)

Also, it may be worth noting that CRAN policies require that all package tests/vignettes/examples "fail gracefully" (i.e. don't fail, don't throw warnings or notes) whenever a resource is unavailable. So it's not clear to me what piggyback could change. I agree with @1beb that you will want to avoid testing these on CRAN. I recommend taking a look at the @examplesIf directive in roxygen, which will allow you to have your examples run conditionally (e.g. only if an env var NOT_CRAN=TRUE, or only if interactive(), etc). See https://roxygen2.r-lib.org/articles/rd.html

tanho63 commented 2 years ago

I identified the problem as related to https://github.com/ropensci/piggyback/blob/master/R/pb_download.R#L126 - a bad authentication token paired with the httr::write_disk() call will still write the content of the request to the filepath, even if the content is an error body - to wit:

x <- httr::GET("https://api.github.com/repos/nflverse/nflverse-data/releases/assets/68500509",
    httr::add_headers(Accept = "application/octet-stream"), 
    httr::add_headers(Authorization = paste("token", "")), 
    httr::write_disk("timestamp.txt", overwrite = TRUE)
  )
readLines("timestamp.txt") |> cat()
{   "message": "Bad credentials",   "documentation_url": "https://docs.github.com/rest" }

in other words, instead of the expected text file it writes the json body of the error to the text. This paired with a lack of raising any errors (i.e. it only raises a warning) causes the rest of the code to continue, and thus readRDS fails to read the input format (json)

either way, fixing the headers solves the download issue

1beb commented 2 years ago

For documentation purposes, I had to do a couple of things:

For testing locally I always had a GITHUB_PAT env var set in my .Rprofile. However, on CRAN that's not possible. CRAN maintainers recommended that I wrap these examples in \donttest{} and for tests that required a PAT use testthat::skip_on_cran(). For CI, using travis or github actions, you can add your PAT as a secret and then include it in the env.