rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/

using `packrat::restore` in GitLab CI Docker container; order of entries in `packrat.lock` matters. #373

Open edwardpmorris opened 7 years ago

edwardpmorris commented 7 years ago

My aim is to semi-reproduce, in a Docker container, the local (R) environment used for a complex analysis, so that I can share it with colleagues.

I am attempting to use packrat to build a reproducible Docker container via GitLab CI. I push packrat/packrat.lock, packrat/src, packrat/init.R and packrat/packrat.opts to the repository and build with GitLab CI. Below is a snippet from my .gitlab-ci.yml file (dry.run = TRUE is set only for testing):

pages:
  stage: build
  image: rocker/hadleyverse:3.3.1
  script:
    - apt-get update && apt-get install -y tk8.5-dev tcl8.5-dev
    - R -e "0" --args --bootstrap-packrat
    - R -e 'packrat::restore(restart = FALSE, overwrite.dirty = TRUE, dry.run = TRUE)'

The problem is that packrat::restore fails the build when a package's dependencies are not listed above that package in the packrat.lock file. For example, the following lockfile fails with the error "The command failed with output: ERROR: dependencies 'pbkrtest' is not available for package 'car'":

Package: car
Source: CRAN
Version: 2.1-4
Hash: d8cc2f66d4ba0c91312f1c1c5773d502

Package: pbkrtest
Source: CRAN
Version: 0.4-6
Hash: 8117c37e2e3078e3c6465134bac6956e

Manually reordering the entries in packrat.lock appears to fix the issue. However, whenever I add a new package and run packrat::snapshot(), packrat.lock is rewritten in its previous order, so the manual fix is lost each time. For example, this packrat.lock seems to work:

Package: pbkrtest
Source: CRAN
Version: 0.4-6
Hash: 8117c37e2e3078e3c6465134bac6956e

Package: car
Source: CRAN
Version: 2.1-4
Hash: d8cc2f66d4ba0c91312f1c1c5773d502
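To see why the order matters, here is a small R simulation, assuming (as the behaviour described above suggests) that packages are installed one at a time in lockfile order and that each must find its dependencies already installed. The function `first_unmet()` and the dependency map `deps` are hypothetical, restricted to the two packages shown (in reality pbkrtest has dependencies of its own), and are not packrat code.

```r
# Hypothetical simulation: install packages strictly in the given order and
# report the first one whose dependencies are not yet installed.
# The dependency map is hard-coded for illustration only.
first_unmet <- function(install_order, deps) {
  installed <- character(0)
  for (pkg in install_order) {
    missing <- setdiff(deps[[pkg]], installed)
    if (length(missing) > 0) {
      return(sprintf("'%s' needs %s, not yet installed",
                     pkg, paste(missing, collapse = ", ")))
    }
    installed <- c(installed, pkg)
  }
  NA_character_  # every package found its dependencies already installed
}

# car depends on pbkrtest; pbkrtest's own dependencies are left out here
deps <- list(car = "pbkrtest", pbkrtest = character(0))

first_unmet(c("car", "pbkrtest"), deps)  # order of the failing lockfile
# → "'car' needs pbkrtest, not yet installed"
first_unmet(c("pbkrtest", "car"), deps)  # order of the working lockfile
# → NA
```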

I am new to packrat, so I am not sure whether this is a real issue or whether I am using the wrong approach or missing something. Any suggestions or good practices for handling (many) R package dependencies in automated Docker builds would be appreciated.

Regards, Ed.

kevinushey commented 7 years ago

It looks like, for some reason, the dependencies of these packages are not being reported in the lockfile.

If I create a new project containing library(car) in one of the source files, and then attempt to snapshot, I see:

Package: car
Source: CRAN
Version: 2.1-4
Hash: d8cc2f66d4ba0c91312f1c1c5773d502
Requires: MASS, pbkrtest, quantreg

so I'm confused why you're not seeing the Requires: field in your case. How was the lockfile generated? What's the output of packrat::get_opts()?
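With a Requires: field present, a dependency-first order can be computed regardless of how the records are arranged in the file. The sketch below is a plain topological sort (Kahn's algorithm) over lockfile records parsed with base R's read.dcf(); the helper `install_order()` is hypothetical and is not necessarily how packrat orders its installs. Requirements without a record of their own in the lockfile (MASS and quantreg here) are ignored, and Hash lines are omitted for brevity.

```r
# Hypothetical sketch: order lockfile records so that every package comes
# after the packages named in its (optional) "Requires:" field.
install_order <- function(lock_lines) {
  con <- textConnection(lock_lines)
  on.exit(close(con))
  recs <- read.dcf(con)  # one row per record; absent fields are NA
  pkgs <- recs[, "Package"]
  req <- if ("Requires" %in% colnames(recs)) recs[, "Requires"] else
    rep(NA_character_, length(pkgs))
  # "MASS, pbkrtest, quantreg" -> names, keeping only packages in the lockfile
  deps <- lapply(req, function(r) {
    if (is.na(r)) return(character(0))
    intersect(trimws(strsplit(r, ",")[[1]]), pkgs)
  })
  names(deps) <- pkgs
  ordered <- character(0)
  remaining <- pkgs
  while (length(remaining) > 0) {
    # packages whose requirements are all already placed
    ready <- remaining[vapply(remaining,
                              function(p) all(deps[[p]] %in% ordered),
                              logical(1))]
    if (length(ready) == 0)
      stop("circular Requires among: ", paste(remaining, collapse = ", "))
    ordered <- c(ordered, ready)
    remaining <- setdiff(remaining, ready)
  }
  ordered
}

lock_lines <- c(
  "Package: car",
  "Source: CRAN",
  "Version: 2.1-4",
  "Requires: MASS, pbkrtest, quantreg",
  "",
  "Package: pbkrtest",
  "Source: CRAN",
  "Version: 0.4-6"
)

install_order(lock_lines)
# → "pbkrtest" "car"
```

If the Requires: line is dropped from `lock_lines`, no package has any known dependency, so the function simply returns the file order (car before pbkrtest), which is the failing order from the original report.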

edwardpmorris commented 7 years ago

Yes, I can confirm this was the issue: the Requires: field was not appearing in the lockfile.

I updated R (to version 3.4) and my user package library, and tried again.

I'm not sure what went wrong before, but packrat now writes the lockfile as expected, and running packrat::restore in GitLab CI works.

Thanks for the support, and for a neat solution to dealing with dependencies; it is a great step towards reproducible research.

For reference, here is the packrat::get_opts() output you asked about:

> packrat::get_opts()
$auto.snapshot
[1] TRUE

$use.cache
[1] FALSE

$print.banner.on.startup
[1] "auto"

$vcs.ignore.lib
[1] TRUE

$vcs.ignore.src
[1] FALSE

$external.packages
character(0)

$local.repos
[1] "~/Documents"

$load.external.packages.on.startup
[1] TRUE

$ignored.packages
NULL

$quiet.package.installation
[1] TRUE

$snapshot.recommended.packages
[1] FALSE

$snapshot.fields
[1] "Imports"   "Depends"   "LinkingTo"
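The snapshot.fields option above names the DESCRIPTION fields (Imports, Depends, LinkingTo) that packrat consults when discovering a package's dependencies. As a rough illustration of how such fields turn into a Requires: list, the hypothetical helper `deps_from_description()` below reads a DESCRIPTION-style record with read.dcf(), splits the named fields on commas, strips version constraints and drops R itself. The sample DESCRIPTION is made up to mirror the Requires: line shown earlier in the thread; it is not car's actual file, and packrat's own parsing may differ.

```r
# Hypothetical helper: collect dependency names from the DESCRIPTION fields
# listed in `fields` (defaulting to the snapshot.fields values shown above).
deps_from_description <- function(desc_lines,
                                  fields = c("Imports", "Depends", "LinkingTo")) {
  con <- textConnection(desc_lines)
  on.exit(close(con))
  desc <- read.dcf(con)
  present <- intersect(fields, colnames(desc))  # skip fields that are absent
  entries <- unlist(strsplit(desc[1, present], ","), use.names = FALSE)
  pkgs <- trimws(sub("\\(.*\\)", "", entries))  # drop "(>= x.y)" constraints
  setdiff(unique(pkgs[nzchar(pkgs)]), "R")      # R itself is not a package
}

# Made-up DESCRIPTION, including a continuation line as real files often have
desc_lines <- c(
  "Package: examplepkg",
  "Version: 0.1",
  "Depends: R (>= 3.2.0)",
  "Imports: MASS, pbkrtest (>= 0.4-4),",
  "    quantreg"
)

deps_from_description(desc_lines)
# → "MASS" "pbkrtest" "quantreg"
```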