rstudio / renv

renv: Project environments for R.
https://rstudio.github.io/renv/
MIT License

failed to add "xtable" to `renv.lock` after copying cache to project library with `renv::isolate()` and reinstall + snapshot #208

Closed philipp-baumann closed 5 years ago

philipp-baumann commented 5 years ago

Dear Kevin, I'm a big fan of renv and I'm using it in combination with drake and Docker to ensure reproducibility for my scientific projects and foster collaboration. I'm at the moment experimenting with the Docker configuration option 1 you nicely describe in Using renv with Docker. Because some of the packages are shared via cache and not in the project library, they are not listed in renv.lock. As a consequence, they were not installed when building the Docker image.

For this purpose I tried using renv::isolate(), introduced in 4ca213faf29eabc1a38fc24f8f6d51f9a5d7ce27, to copy these packages into the project library. I executed this on my local machine to prepare the new renv.lock for Docker.

Unfortunately, it failed, and I couldn't figure out why it could not copy renv itself:

library(renv)
#> 
#> Attaching package: 'renv'
#> The following object is masked from 'package:stats':
#> 
#>     update
#> The following objects are masked from 'package:utils':
#> 
#>     history, upgrade
#> The following objects are masked from 'package:base':
#> 
#>     load, remove

isolate()
#> * Copying packages into the private library ... [77/155] [78/155] [79/155] [80/155] [81/155] [82/155] [83/155] [84/155] [85/155] [86/155] [87/155] [88/155] [89/155] [90/155] [91/155] [92/155] [93/155] [94/155] [95/155] [96/155] [97/155] [98/155] [99/155] [100/155] [101/155] [102/155] [103/155] [104/155] [105/155] [106/155] [107/155] [108/155] [109/155] [110/155] [111/155] [112/155] [113/155] [114/155] [115/155] [116/155] [117/155] [118/155] [119/155] [120/155] [121/155] [122/155] [123/155] [124/155] [125/155] [126/155] [127/155] [128/155] [129/155] [130/155] [131/155] [132/155] [133/155] [134/155] [135/155] [136/155] [137/155] [138/155] [139/155] [140/155] [141/155]
#> Error: source file '/home/baumanph/.local/share/renv/cache/v4/R-3.6/x86_64-pc-linux-gnu/renv/0.7.0-111/ac1651d05df95351d3918bd15c14c673/renv' does not exist

.libPaths()
#> [1] "/media/ssd/nas-ethz/doktorat/projects/01_spectroscopy/52_swr-spc/renv/library/R-3.6/x86_64-pc-linux-gnu"
#> [2] "/tmp/RtmpfJMfUR/renv-system-library"                                                                    
#> [3] "/usr/lib/R/library"

Created on 2019-10-06 by the reprex package (v0.3.0)

I'm using renv v0.7.0-111, which I also installed in the global library of my local machine. I have the following in my .Rprofile:

source("renv/activate.R")
options(renv.config.snapshot.timeout = 30)

The Dockerfile is in this gist.

Thanks a lot for the great work done here and some hints to resolve the issue. Best, Philipp

philipp-baumann commented 5 years ago

I have now restarted R and reinstalled with renv::install("rstudio/renv@0.7.0-111"), but "xtable" is still not in renv.lock. Sorry, I forgot to post this file in my last comment, but here it is.

library(renv)
#> 
#> Attaching package: 'renv'
#> The following object is masked from 'package:stats':
#> 
#>     update
#> The following objects are masked from 'package:utils':
#> 
#>     history, upgrade
#> The following objects are masked from 'package:base':
#> 
#>     load, remove

isolate()

purge("xtable")
#> * The requested package is not installed in the cache -- nothing to do.

install("xtable")
#> Retrieving 'https://cloud.r-project.org/src/contrib/xtable_1.8-4.tar.gz' ...
#>  OK [file is up to date]
#> Installing xtable [1.8-4] from CRAN ...
#>  OK (built from source)

.libPaths()
#> [1] "/media/ssd/nas-ethz/doktorat/projects/01_spectroscopy/52_swr-spc/renv/library/R-3.6/x86_64-pc-linux-gnu"
#> [2] "/tmp/RtmpGYUwmf/renv-system-library"                                                                    
#> [3] "/usr/lib/R/library"

Created on 2019-10-06 by the reprex package (v0.3.0)

renv shouldn't use the cache for this project (this command failed in reprex::reprex()):

> renv::settings$use.cache()
[1] FALSE
> renv::snapshot()
* The lockfile is already up to date.

However, using reprex::reprex(), I get

library(renv)
#> 
#> Attaching package: 'renv'
#> The following object is masked from 'package:stats':
#> 
#>     update
#> The following objects are masked from 'package:utils':
#> 
#>     history, upgrade
#> The following objects are masked from 'package:base':
#> 
#>     load, remove

snapshot()
#> The following required packages are not installed:
#> 
#>  reshape          [required by GGally]
#>  reshape2         [required by broom, caret, Cubist, and 2 others]
#>  rio              [required by car]
#>  robCompositions  [required by mvoutlier]
#>  robustbase       [required by cvTools, fpc, mvoutlier]
#>  rprojroot        [required by here]
#>  scales           [required by cowplot, ggplot2]
#>  sgeostat         [required by mvoutlier]
#>  sp               [required by maptools]
#>  SparseM          [required by quantreg]
#>  SQUAREM          [required by lava]
#>  testthat         [required by hyperSpec]
#>  tibble           [required by broom, cellranger, dbplyr, and 10 others]
#>  tidyr            [required by broom, modelr, recipes, and 2 others]
#>  tidyselect       [required by dbplyr, dplyr, recipes, and 2 others]
#>  tidyverse        [required by simplerspec]
#>  timeDate         [required by recipes]
#>  vctrs            [required by hms, pillar]
#>  viridisLite      [required by ggplot2]
#>  XML              [required by hyperSpec]
#>  zoo              [required by lmtest]
#> 
#> Consider re-installing these packages before snapshotting the lockfile.
#> Error in snapshot(): aborting snapshot due to pre-flight validation failure

Created on 2019-10-06 by the reprex package (v0.3.0)

Am I missing anything?

kevinushey commented 5 years ago

Because some of the packages are shared via cache and not in the project library, they are not listed in renv.lock. As a consequence, they were not installed when building the Docker image.

I don't quite follow -- these concepts are somewhat independent. Packages enter the lockfile if they are (1) installed in the project library, and (2) used somewhere in the project. Packages installed in the library may either be 'real' package installs, or may be symlinks back into the renv package cache. So whether a package is used from the cache or not should not affect whether a package enters the lockfile.

The error you reported looks like a bug in renv, though -- the isolation code was assuming that we'd always be able to find a package in the cache to copy back to the library, but that may not always be true. I've pushed a candidate fix to master.

kevinushey commented 5 years ago

My best guess: the reprex() example is failing because the library paths are different from your "regular" R session versus the reprex() session.

Note that here:

.libPaths()
#> [1] "/media/ssd/nas-ethz/doktorat/projects/01_spectroscopy/52_swr-spc/renv/library/R-3.6/x86_64-pc-linux-gnu"
#> [2] "/tmp/RtmpfJMfUR/renv-system-library"                                                                    
#> [3] "/usr/lib/R/library"

I would not expect to see /usr/lib/R/library on the library paths. Could that be related?

philipp-baumann commented 5 years ago

I don't quite follow -- these concepts are somewhat independent. Packages enter the lockfile if they are (1) installed in the project library, and (2) used somewhere in the project. Packages installed in the library may either be 'real' package installs, or may be symlinks back into the renv package cache. So whether a package is used from the cache or not should not affect whether a package enters the lockfile.

Thanks for the clarification. Sorry for my wrong statement. I somehow accidentally confused things because renv was not behaving as expected, although key design principles (1) and (2) can be deduced from the renv introduction. Maybe an explicit "symlink" mention could be incorporated into this starting resource?

I just upgraded to renv v0.7.0-129 using

> renv::upgrade(version = "0.7.0-129")
A new version of the renv package will be installed:

    [0.6.0-61] -> [0.7.0-129]

This project will use the newly-installed version of renv.

I'm not sure if this bug was actually related to the problem described here, but nice you fixed that anyway! I checked again and "xtable" is located in the renv project library, but still not in renv.lock after another round of

> renv::install("xtable")
Retrieving 'https://cloud.r-project.org/src/contrib/xtable_1.8-4.tar.gz' ...
    OK [file is up to date]
Installing xtable [1.8-4] ...
    OK (built from source)
> renv::snapshot()
* The lockfile is already up to date.

Also, xtable is used in the R scripts in the project directory used for drake::code_to_plan(). This is the updated gist for renv.lock.

My best guess: the reprex() example is failing because the library paths are different from your "regular" R session versus the reprex() session.

Note that here:

.libPaths()
#> [1] "/media/ssd/nas-ethz/doktorat/projects/01_spectroscopy/52_swr-spc/renv/library/R-3.6/x86_64-pc-linux-gnu"
#> [2] "/tmp/RtmpfJMfUR/renv-system-library"                                                                    
#> [3] "/usr/lib/R/library"

I would not expect to see /usr/lib/R/library on the library paths. Could that be related?

Correct, in the project directory I get:

> .libPaths()
[1] "/media/ssd/nas-ethz/doktorat/projects/01_spectroscopy/52_swr-spc/renv/library/R-3.6/x86_64-pc-linux-gnu"
[2] "/tmp/RtmpQwcIHo/renv-system-library"                                                                    

Is it really the intended behaviour of reprex::reprex() in an renv project to show different .libPaths() than a "regular" R session? If yes, it's a bit confusing, because that would compromise the core idea of reprex reproducibility.

BTW, isolate() is missing from the renv pkgdown reference. It would be a nice addition there because the function is exported.

Thanks a lot for your help!

kevinushey commented 5 years ago

In essence, packages will enter the lockfile if they're reported as part of the packages in:

renv::dependencies()

I suspect that some packages are being used in your project in a way that renv fails to discover. Can you share your project sources, so I can see exactly how xtable is declared / used in your project? We might need to add support for how packages might be referenced or used in drake pipelines.
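
As a rough way to check the two conditions for a single package (installed in the project library, and reported by renv::dependencies()), something like the following sketch could be used. The helper name is_installed_and_discovered() is hypothetical and not part of renv, and the check ignores packages that enter the lockfile only as dependencies of other packages.

```r
# Hypothetical helper (not part of renv): TRUE when `pkg` is both installed
# in `library` and listed in the `Package` column of `deps`.
is_installed_and_discovered <- function(pkg, deps, library = .libPaths()[1]) {
  installed <- pkg %in% rownames(utils::installed.packages(lib.loc = library))
  discovered <- pkg %in% deps$Package
  installed && discovered
}

# Only runs when renv is available; scans the current working directory.
if (requireNamespace("renv", quietly = TRUE)) {
  deps <- renv::dependencies()
  print(is_installed_and_discovered("xtable", deps))
}
```

If this prints FALSE for a package that is clearly installed, the likely cause is that the dependency scan did not find any usage of it in the project sources.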

The behavior with reprex is likely a bug -- most likely, I will have to write a PR to reprex to see if renv environments can be explicitly supported.

philipp-baumann commented 5 years ago

Ah I see, as you say that might indeed be related to the way I load packages; here is the _setup-run-all.R that loads packages and functions prior to planning the drake workflow and building the targets:


## Load packages
pkgs <- c("here", "drake", "tidyverse", "data.table",
  "simplerspec", "caret", "Cubist", "rsample", "nls.multstart",
  "broom", # modeling
  "future", "future.apply", "doParallel", "doFuture", # asynchronous computation
  "gghighlight", "grid", "gridExtra", "cowplot", # graphics
  "xtable") # tables
purrr::walk(pkgs, library, character.only = TRUE)

Also, as an example, here is one script of the workflow: 70_collect-swr-params.R. xtable is loaded on line 534, for example.

Let me know if you want to see the rest of the project (cannot share the data publicly as I don't own, but could invite you to the private repo).

kevinushey commented 5 years ago

That would explain it! Unfortunately, renv's dependency discovery system is not nearly smart enough to understand this.

If you rewrite your package usages with another form, e.g. plain old

library(here)
library(drake)
< ... >

then renv will be able to pick it up.
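
To illustrate why the purrr::walk() form is hard to see with static analysis, here is a toy scanner written in base R. It is only a sketch under simplifying assumptions and is not renv's actual implementation: scan_packages() is a hypothetical name, and it recognizes nothing beyond library(pkg), require(pkg), and pkg::fun references.

```r
# Toy static scan (NOT renv's implementation): walk the parsed code and
# collect package names from library()/require() calls and `::` references.
scan_packages <- function(code) {
  found <- character()
  visit <- function(e) {
    head <- e[[1]]
    if (is.symbol(head)) {
      name <- as.character(head)
      if (name %in% c("library", "require") && length(e) >= 2 &&
          (is.symbol(e[[2]]) || is.character(e[[2]]))) {
        # library(pkg) or library("pkg"); a variable name would be
        # recorded as if it were a package, which is a known limitation
        found <<- c(found, as.character(e[[2]]))
      } else if (name %in% c("::", ":::")) {
        found <<- c(found, as.character(e[[2]]))
      }
    }
    # Recurse only into sub-calls; symbols and constants carry no usage here
    for (i in seq_along(e)) {
      if (is.call(e[[i]])) visit(e[[i]])
    }
  }
  for (e in as.list(parse(text = code, keep.source = FALSE))) {
    if (is.call(e)) visit(e)
  }
  unique(found)
}

dynamic <- '
pkgs <- c("here", "drake", "xtable")
purrr::walk(pkgs, library, character.only = TRUE)
'
explicit <- '
library(here)
library(drake)
library(xtable)
'
print(scan_packages(dynamic))
print(scan_packages(explicit))
```

In the first snippet the package names exist only as strings inside a vector, so the scan sees purrr and nothing else; in the second, each package appears as a direct argument to library() and is picked up.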

jennybc commented 5 years ago

The behavior with reprex is likely a bug -- most likely, I will have to write a PR to reprex to see if renv environments can be explicitly supported.

If I reprex::reprex() this code:

getwd()
.libPaths()

inside an renv-using project, I see:

getwd()
#> [1] "/private/var/folders/yx/3p5dt4jj1019st0x90vhm9rr0000gn/T/RtmpVGI5y1/reprexecfb5cc375e3"
.libPaths()
#> [1] "/Users/jenny/rrr/stat545/renv/library/R-3.6/x86_64-apple-darwin15.6.0"                  
#> [2] "/private/var/folders/yx/3p5dt4jj1019st0x90vhm9rr0000gn/T/RtmpVGI5y1/renv-system-library"
#> [3] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library"

Created on 2019-10-07 by the reprex package (v0.3.0)

Which seems correct.

@philipp-baumann Can you say more about what you're seeing?

philipp-baumann commented 5 years ago

That would explain it! Unfortunately, renv's dependency discovery system is not nearly smart enough to understand this.

If you rewrite your package usages with another form, e.g. plain old

library(here)
library(drake)
< ... >

then renv will be able to pick it up.

Sure, that makes sense. Thanks! This now works :-) (Support for this might be nice in a future version; I'd have to dig further into the renv code base to be able to contribute code, maybe some time in the future.)

> renv::snapshot()
The following package(s) will be updated in the lockfile:

# CRAN ===============================
- xtable   [* -> 1.8-4]

Do you want to proceed? [y/N]: y
* Lockfile written to '/media/ssd/nas-ethz/doktorat/projects/01_spectroscopy/52_swr-spc/renv.lock'.

philipp-baumann commented 5 years ago

The behavior with reprex is likely a bug -- most likely, I will have to write a PR to reprex to see if renv environments can be explicitly supported.

If I reprex::reprex() this code:

getwd()
.libPaths()

inside an renv-using project, I see:

getwd()
#> [1] "/private/var/folders/yx/3p5dt4jj1019st0x90vhm9rr0000gn/T/RtmpVGI5y1/reprexecfb5cc375e3"
.libPaths()
#> [1] "/Users/jenny/rrr/stat545/renv/library/R-3.6/x86_64-apple-darwin15.6.0"                  
#> [2] "/private/var/folders/yx/3p5dt4jj1019st0x90vhm9rr0000gn/T/RtmpVGI5y1/renv-system-library"
#> [3] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library"

Created on 2019-10-07 by the reprex package (v0.3.0)

Which seems correct.

@philipp-baumann Can you say more about what you're seeing?

Thanks for jumping in @jennybc

Here is what I see:

getwd()
#> [1] "/tmp/Rtmphm0Ct4/reprexeed9548c51"

.libPaths()
#> [1] "/media/ssd/nas-ethz/doktorat/projects/01_spectroscopy/52_swr-spc/renv/library/R-3.6/x86_64-pc-linux-gnu"
#> [2] "/tmp/Rtmphm0Ct4/renv-system-library"                                                                    
#> [3] "/usr/lib/R/library"

Created on 2019-10-07 by the reprex package (v0.3.0)

kevinushey commented 5 years ago

The point is that the default system library (.Library) should not be showing up on the library paths. That is:

#> [3] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library"

should not be there when the renv sandbox is activated. Note that this magic is done by renv when the session is launched through the .Rprofile, so if reprex is launching a child process with e.g. --vanilla, that would probably explain why this is happening.

kevinushey commented 5 years ago

It looks like reprex launches its child processes with callr::r_safe():

https://github.com/tidyverse/reprex/blob/f888a72a90f39c0dcb9564b6d94fee094b6fd342/R/reprex.R#L403

which intentionally does not load the .Rprofile. So, I believe this is ultimately just renv doing something that reprex did not / could not anticipate, since so much of the renv startup magic happens in the project .Rprofile.

jennybc commented 5 years ago

But the project's .Rprofile does seem to have been consulted? Otherwise, I don't understand how/why both @philipp-baumann and I have the first 2 lib paths that we have. I don't know why we both have the system library in the 3rd position 🤔

jennybc commented 5 years ago

The docs for callr::r_safe() (which is now just an alias for callr::r(), but that wasn't true when I first started using it) talk about the system and user .Rprofile. But they're pretty silent about a project-level .Rprofile.

kevinushey commented 5 years ago

I believe this is because callr::r_safe() does pass along the current library paths, e.g.

> .libPaths(c(tempdir(), .libPaths()))
> callr::r_safe(function() { print(.libPaths()) })
[1] "/private/var/folders/b4/2422hswx71z8mgwtv4rhxchr0000gn/T/RtmpP8GKVV" "/Users/kevinushey/Library/R/3.6/library"                            
[3] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library"     

But renv does some black magic to mutate .Library and .Library.site (sandboxing them so user packages installed in these libraries are not visible to renv projects), and that magic does not propagate by default.
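
One way to compare the two sessions is to collect the library-related state with base R and run the same function in each. This is only a diagnostic sketch; library_state() is a hypothetical helper name, not part of renv, reprex, or callr.

```r
# Hypothetical diagnostic helper (base R only): gather the values that
# differ between a regular renv session and a child R process.
library_state <- function() {
  list(
    lib_paths  = .libPaths(),    # the active library search path
    system_lib = .Library,       # the default system library
    site_libs  = .Library.site   # site libraries, possibly empty
  )
}

str(library_state())
```

Running this once in the project session and once in the child process (for example by passing the function to callr::r_safe()) should show whether .Library still points at the renv sandbox or at the real system library.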

kevinushey commented 5 years ago

Perhaps more to the point:

> owd <- setwd(tempdir())
> writeLines("x <- 42", ".Rprofile")
> callr::r_safe(function() { print(x) })
Error: callr subprocess failed: object 'x' not found
> callr::r_copycat(function() { print(x) })
[1] 42

jennybc commented 5 years ago

I can imagine how to create something worthy of the name reprex_renv(), meaning reprex this code HERE in this renv-using project. But I'm not sure if the demand is high enough to justify it? In any case, if that seems worth contemplating, we could track it in an issue on reprex.

kevinushey commented 5 years ago

I think the underlying issue here is now understood + resolved (renv's dependency discovery machinery failing to understand the way packages were loaded in this project).

Also worth stating:

Also, as an example of a script of the workflow 70_collect-swr-params.R . xtable is loaded in l. 534 for example.

In that example, the function xtable() is used, but the package itself is not referenced or loaded. In other words, renv doesn't really know (just from static analysis) that xtable is a function that is being provided by the xtable package. You could also qualify the usage; e.g.

xtable::xtable(...)

and in this case renv would detect that usage.
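
A minimal sketch of a fully qualified call is shown below. It assumes xtable is installed, which is why it is guarded with requireNamespace(), and it uses the built-in mtcars data purely for illustration.

```r
# Namespace-qualified usage: the package name appears literally in the
# source, so a static scan can attribute the call to the xtable package.
if (requireNamespace("xtable", quietly = TRUE)) {
  tab <- xtable::xtable(head(mtcars[, 1:3]))
  print(tab)
}
```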

philipp-baumann commented 5 years ago

Great, thanks @kevinushey and @jennybc for digging into it and for the detailed information!