rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/
401 stars 89 forks source link

Internal R tar is not handling long filenames. #648

Closed galachad closed 2 years ago

galachad commented 2 years ago

Hi pacrat is using tar="internal". It's perfect solution for compatibility between different systems but internal R tar is not handling long directory names. https://github.com/rstudio/packrat/blob/3df1d62f77e98f846b11636b4a3203b925ec1028/R/restore.R#L481

Impact: RStudio products, Windows users

Example: Installing package from gitlab with long slug is not working. (https://gitlab.com/adam.forys/long-test-slug-skeleton)

Why it's happening?

Gitlab endpoint {gitlab_host}/api/v4/projects/{repository}/repository/archive?sha={sha} generate tar.gz file that contains directory created as repository-slug-sha-sha/ so it gave long-test-slug-skeleton-958296dbbbf7f1d82f7f5dd1b121c7558604809f-958296dbbbf7f1d82f7f5dd1b121c7558604809f and untar(tar="internal") returns error Error in file(name, "wb") : cannot open the connection.

Installing package by packrat for repository slug that has more that 16 characters will fail.

Because packrat is soft deprecated it's only impact RStudio Connect/shinyapps etc. (Rstudio product).

This problem is not an issue for renv as renv is detecting system tar.

My recommendation is replace "internal" with Sys.getenv("TAR", "internal").

untar(srczip, exdir = scratchDir, tar = Sys.getenv("TAR", "internal"))

After this change untar will be using system tar when set.

Question: Is any chance this change will be apply in next Rstudio Connect release?

aronatkins commented 2 years ago

Here is an example packrat.lock:

PackratFormat: 1.4
PackratVersion: 0.8.1
RVersion: 3.6.3
Repos: CRAN=https://cran.rstudio.com/

Package: skeleton
Source: gitlab
Version: 1.0.1
Hash: 612f8f6d5a930a63ffe057599cad2343
RemoteRepo: long-test-slug-skeleton
RemoteUsername: adam.forys
RemoteRef: HEAD
RemoteSha: 958296dbbbf7f1d82f7f5dd1b121c7558604809f
RemoteHost: gitlab.com

When restored:

packrat::restore()
# => Installing packrat (0.8.1) ... 
# =>    OK (built source)
# => Installing skeleton (1.0.1) ... 
# => Error in file(name, "wb") : cannot open the connection

The full trace:

Error in file(name, "wb") : cannot open the connection
11. file(name, "wb")
10. untar2(tarfile, files, list, exdir, restore_times)
9. untar(srczip, exdir = scratchDir, tar = "internal") at restore.R#482
8. withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
7. suppressWarnings(untar(srczip, exdir = scratchDir, tar = "internal")) at restore.R#482
6. (function() {
    scratchDir <- tempfile()
    on.exit({
    if (file.exists(scratchDir)) unlink(scratchDir, recursive = TRUE) ... at restore.R#473
5. getSourceForPkgRecord(pkgRecord, srcDir(project), availablePackagesSource(repos = repos),
    repos, quiet = TRUE) at restore.R#735
4. installPkg(pkgRecord, project, repos, lib) at restore.R#854
3. playActions(pkgRecords, actions, repos, project, targetLib) at restore.R#956
2. restoreImpl(project, repos, packages, libDir, pkgsToIgnore = pkgsToIgnore,
    prompt = prompt, dry.run = dry.run, restart = restart) at packrat.R#410
1. packrat::restore()

This is the code we are effectively attempting to run:

archiveUrl <- "https://gitlab.com/api/v4/projects/adam.forys%2Flong-test-slug-skeleton/repository/archive?sha=958296dbbbf7f1d82f7f5dd1b121c7558604809f"
archive <- tempfile(fileext = ".tar.gz")
exdir <- tempfile()
packrat:::download(archiveUrl, destfile = archive, quiet = FALSE, mode = "wb")
untar(archive, exdir = exdir, tar = "internal")
# => Error in file(name, "wb") : cannot open the connection
# => In addition: Warning message:
# => In file(name, "wb") :
# =>   cannot open file 'long-test-slug-skeleton-958296dbbbf7f1d82f7f5dd1b121c7558604809f-958296dbbbf7f1d82f7f5dd1b121c7558604809f/': No such file or directory
aronatkins commented 2 years ago

The forced use of "internal" during restore came from the commit: https://github.com/rstudio/packrat/commit/017e9acee975f122011aad5080e6581599fdc413

aronatkins commented 2 years ago

Related: The forced use of "internal" when bundling came from the commit: https://github.com/rstudio/packrat/commit/0d03b0c85842cdaa5cb9c4e8a0d09d46dee58234

aronatkins commented 2 years ago

Chatted with @kevinushey a little. Folks can occasionally end up with a "bad" tar on their PATH (Windows). We may want to construct an approach like:

  1. Check Sys.getenv("TAR"). If it’s not “internal” or empty, use it.
  2. If we’re on a Unix machine, use the version of tar on the PATH. (Or, maybe just the one in /usr/bin?)
  3. If we’re on Windows, use the system tar if it exists.
  4. Otherwise, fall back to internal (maybe with a warning?)

Some related renv code: https://github.com/rstudio/renv/blob/2bea5cef7cb6c57e43580cf0a182cbe59ef3536a/R/tar.R#L2-L26 https://github.com/rstudio/renv/blob/d217845f2765877c4a192a4b1b4d419487db4b60/R/archive.R#L41-L53

galachad commented 2 years ago

In any case the R process should be responsible for setting correct path to the tar binary in TAR env variable. In my opinion when it's not set then internal should be used.

I have seen renv implementation and maybe it was connected with really old R version or other bug that R choosing wrong binary (probably already fixed).

I created PR for this change, it should be enough, it's only changing behavior of untar so not impact for any other part of pacrat.

toph-allen commented 2 years ago

It looks like the behavior forcing the use of internal came about in response to users on Windows often seeing buggy tar binaries detected by R's default behavior. However, the landscape has changed since then, and the long-filename bug isn't the only issue we've seen with the internal implementation. As such, we're going to rework the behavior to deprioritize the use of the internal implementation.

Per @aronatkins comment, we should respect a TAR environment variable if one is set set. Otherwise: On Windows, the default R untar() behavior of searching for a tar.exe on the PATH can sometimes surface buggy implementations, but Windows's built-in tar implementation is sound. On Unix, we'll look for a tar binary on the PATH.

@galachad Thanks for submitting your PR and pushing this forward. I'm going to use your PR as the basis for some additional changes to implement this behavior, and it'll be integrated into an upcoming Connect release.