rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/

binary package .tar.gz retained in packrat/lib #598

Status: Closed (aronatkins closed 4 years ago)

aronatkins commented 4 years ago

Given a packrat/packrat.lock:

PackratFormat: 1.4
PackratVersion: 0.4.9.3
RVersion: 3.5.1
Repos: CRAN=https://demo.rstudiopm.com/all/__linux__/bionic/latest

Package: crayon
Source: CRAN
Version: 1.3.4
Hash: ff2840dd9b0d563fc80377a5a45510cd

Restoring this environment leaves the downloaded .tar.gz file inside the installed package directory under packrat/lib:

ls -l packrat/lib/x86_64-pc-linux-gnu/3.6.3/crayon/*.tar.gz
-rw-r--r--  1 aron  staff  744826 Apr  9 14:39 packrat/lib/x86_64-pc-linux-gnu/3.6.3/crayon/crayon_1.3.4.tar.gz

R script (saved as restore.R) to reproduce this issue. Set use_cache <- TRUE to see the same problem when using a package cache.

# Install packrat from CRAN; unnecessary in Linux environments with packrat installed.
install.packages("packrat", repos=c(CRAN="https://cran.rstudio.com/"))

use_cache <- FALSE
if (use_cache) {
  # remove the existing cache in case we are rerunning
  cache_path <- file.path(normalizePath("."), "cache")
  unlink(cache_path, recursive = TRUE)
  dir.create(cache_path)

  Sys.setenv(R_PACKRAT_CACHE_DIR = cache_path)
}

# remove the existing lib in case we are rerunning
packrat_lib_path <- normalizePath("packrat/lib", mustWork = FALSE)
unlink(packrat_lib_path, recursive = TRUE)

Sys.setenv(R_PACKRAT_SRC_DIR = tempdir())
options(packrat.untrusted.packages = character())

options(
  packrat.verbose.cache = TRUE,
  packrat.connect.timeout = 10
)
packrat::set_opts(
  auto.snapshot = FALSE,
  use.cache = use_cache,
  project = getwd(),
  persist = FALSE
)
packrat::restore(
  overwrite.dirty = TRUE,
  prompt = FALSE,
  restart = FALSE
)

Run with Docker (so we can use the Linux binaries from RSPM):

docker run -it --rm -v "$(pwd)":/content -w /content rstudio/r-base:3.6-bionic R --slave -f restore.R

kevinushey commented 4 years ago

It looks like this issue manifests specifically because the Packrat src directory is set to the R temporary directory:

Sys.setenv(R_PACKRAT_SRC_DIR = tempdir())

The problem with doing this is that, when Packrat installs a package, it first decompresses the package sources into the temporary directory, and then builds and installs from those unpacked sources. I believe this is mostly legacy behavior done to protect against invalid archives that needed to be "fixed up" in various ways; e.g. correcting permissions on a package configure script.

The simplest workaround, I think, would be to use a sub-directory of the R temporary directory as the Packrat source directory. A similar fix could be applied to decompressImpl() here:

https://github.com/rstudio/packrat/blob/90520e2247f65b2b9fb21bfa804444bd5ab2b78c/R/install.R#L252

That is, instead of decompressing into tempdir() we could use tempfile("packrat-decompress-") or something.

tl;dr: ultimately this is a bug in Packrat; it can be fixed here but I think it could be easily worked around as well.
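
A minimal base-R sketch of why a unique decompress directory helps. It assumes the downloaded tarball is stored under `<src dir>/<package>/`, which is my reading of the explanation above rather than something confirmed from the Packrat source; `mypkg` and the directory names are made up for illustration. When the archive is unpacked into the same root that holds the download, the unpacked package directory and the download directory are one and the same, so the tarball ends up sitting next to the package files.

```r
# Illustration only: "root" stands in for tempdir() used as R_PACKRAT_SRC_DIR.
root <- tempfile("demo-root-")
dir.create(root)

# Build a tiny fake package tree (hypothetical package "mypkg").
build <- tempfile("demo-build-")
dir.create(file.path(build, "mypkg"), recursive = TRUE)
writeLines("Package: mypkg", file.path(build, "mypkg", "DESCRIPTION"))

# Assumed layout: the tarball is kept under <src dir>/<package>/.
dir.create(file.path(root, "mypkg"))
tarball <- file.path(root, "mypkg", "mypkg_1.0.tar.gz")
owd <- setwd(build)
tar(tarball, files = "mypkg", compression = "gzip")
setwd(owd)

# Case 1: decompress into the same root that holds the download.
# The unpacked "mypkg" directory is the directory containing the tarball.
untar(tarball, exdir = root)
shared <- list.files(file.path(root, "mypkg"))

# Case 2: decompress into a unique subdirectory, as suggested above.
unique_dir <- tempfile("packrat-decompress-")
dir.create(unique_dir)
untar(tarball, exdir = unique_dir)
isolated <- list.files(file.path(unique_dir, "mypkg"))

print(shared)    # package files plus the tarball
print(isolated)  # package files only
```

In the second case the unpacked tree contains only the package files, so anything that later copies that directory as-is would not carry the archive along.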

aronatkins commented 4 years ago

Changing the original script to set the source directory as follows keeps the .tar.gz out of the cache:

srcDir <- file.path(tempdir(), "packrat-src")
Sys.setenv(R_PACKRAT_SRC_DIR = srcDir)
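
To check the result after a restore without relying on `ls`, something like the following could be run from the project directory. `find_stray_tarballs` is a hypothetical helper written for this thread, not a packrat function; it only uses base R.

```r
# Hypothetical helper (not part of packrat): list any .tar.gz files
# left anywhere under a library directory.
find_stray_tarballs <- function(lib_path) {
  if (!dir.exists(lib_path)) {
    return(character())
  }
  list.files(
    lib_path,
    pattern = "\\.tar\\.gz$",
    recursive = TRUE,
    full.names = TRUE
  )
}

stray <- find_stray_tarballs("packrat/lib")
if (length(stray) > 0) {
  message("Stray tarballs found:\n", paste(stray, collapse = "\n"))
} else {
  message("No stray tarballs under packrat/lib")
}
```

The same helper can be pointed at the cache directory when use_cache is TRUE.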

aronatkins commented 4 years ago

I can confirm that #599 keeps the .tar.gz out of the cache. I tested it by replacing the install.packages call in the original script with:

install.packages("remotes", repos=c(CRAN="https://cran.rstudio.com/"))
remotes::install_github("rstudio/packrat", ref = "bugfix/packrat-decompress-path")