ropensci / piggyback

:package: for using large(r) data files on GitHub
https://docs.ropensci.org/piggyback
GNU General Public License v3.0
185 stars 26 forks source link

Unable to download resources without GitHub token on WIndows #77

Closed bschilder closed 2 years ago

bschilder commented 2 years ago

Hello,

During a recent check run by CRAN on our R package MAGMA.Celltyping, it was found that piggyback wasn't able to download assets from GitHub on Windows when a GH PAT wasn't supplied. I'm unable to supply a GH token since the checks are being run on CRAN's servers.

...
...
  Content type 'application/zip' length 511626945 bytes (487.9 MB)
  ==================================================
  downloaded 487.9 MB

  Unzipping file.
  i Downloading "NCBI37.3.gene.loc"...

    |                                                                            
    |                                                                      |   0%
    |                                                                            
    |======================================================================| 100%
  Saving full merged results to ==> D:\temp\Rtmp02KW2e/working_dir\Rtmp6pBFLr/MAGMA_celltyping..lvl1.csv
  Saving full merged results to ==> D:\temp\Rtmp02KW2e/working_dir\Rtmp6pBFLr/MAGMA_celltyping..lvl1.csv
  28 results @ FDR < 1
  [ FAIL 4 | WARN 12 | SKIP 0 | PASS 59 ]

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Error (test-calculate_conditional_geneset_enrichment.R:4:9): calculate_conditional_geneset_enrichment works ──
  Error in `load(fileName)`: bad restore file magic number (file may be corrupted) -- no data loaded
  Backtrace:
      ▆
   1. └─MAGMA.Celltyping::get_ctd("ctd_allKI") at test-calculate_conditional_geneset_enrichment.R:4:8
   2.   └─MAGMA.Celltyping:::load_rdata(tmp)
   3.     └─base::load(fileName)
  ── Error (test-drop_genes_within_mhc.r:4:5): drop_genes_within_mhc works ───────
  Error in `get_data_check(tmp = tmp)`: piggyback::pb_download() failed due to bad GitHub credentials. Please add a GitHub Personal Access Token (PAT) to your ~/.Renviron file and restart R before retrying this function.
  e.g.:

      GITHUB_TOKEN=<your_PAT_here> 

  If you do not yet have a GitHub PAT, please follow instructions here:

      https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
  Backtrace:
      ▆
   1. └─MAGMA.Celltyping:::get_genomeLocFile(build = "GRCH37") at test-drop_genes_within_mhc.r:4:4
   2.   └─MAGMA.Celltyping:::get_data(...)
   3.     └─MAGMA.Celltyping:::get_data_check(tmp = tmp)
  ── Error (test-get_genomeLocFile.R:7:13): get_genomeLocFile works ──────────────
  Error in `get_data_check(tmp = tmp)`: piggyback::pb_download() failed due to bad GitHub credentials. Please add a GitHub Personal Access Token (PAT) to your ~/.Renviron file and restart R before retrying this function.
  e.g.:

      GITHUB_TOKEN=<your_PAT_here> 
...
...

This is reminiscent of a previous issue (#49), but appears to be only happening on Windows currently. The error message is coming from a check that I added when this was happening on all OS. I don't think this is showing up in your GHA checks bc I believe you're supplying your GHA PAT in as a global workflow variable.

This recently posted Issue seems to also be related: #76

Best, Brian

tanho63 commented 2 years ago

Reprex on Windows, I think?

testthat::with_mock(
  `gh::gh_token` = function()"",
  piggyback::pb_download(file = "timestamp.json",
                         dest = ".",
                         repo = "nflverse/nflverse-data",
                         tag = "misc"
                         )
)
ℹ Downloading "timestamp.json"...
  |======================================================================================| 100%
Warning message:
In gh_download_asset(df$owner[[1]], df$repo[[1]], id = df$id[i],  :
  Unauthorized (HTTP 401).

I'll look a bit closer when I get a chance!

tanho63 commented 2 years ago

@bschilder I just skimmed your get_data.R function - you might be able to get your package to CRAN sooner/without this update if you explicitly set .token = NULL in pb_download (or perhaps as the default argument on the outer wrapper) - it looks like the repo you want to access is public anyway

bschilder commented 2 years ago

Thanks for the quick reply @tanho63! I'll try the .token=NULL in the meantime, thanks for the suggestion.

Trying something like this:

.token <- gh::gh_token()
if(as.character(.token)=="") .token <- NULL
cboettig commented 2 years ago

@tanho63 should we queue a new CRAN release then?

tanho63 commented 2 years ago

@bschilder I'd probably do

get_data <- function(fname,
                     repo = "neurogenomics/MAGMA_Celltyping",
                     storage_dir = tempdir(),
                     overwrite = FALSE,
                     .token = NULL, # set new default token to NULL since your package repo is public
                     check = FALSE) {
        piggyback::pb_download(
            file = fname,
            dest = storage_dir,
            repo = repo,
            .token = .token, # use the token passed in from above with new default as NULL
            overwrite = overwrite
        )
    # etc 
}
tanho63 commented 2 years ago

@cboettig It's probably a good idea but unsure if it's too soon since last release