metrumresearchgroup / pkgr

R package installation and management - reimagined.
https://metrumresearchgroup.github.io/pkgr/docs
39 stars 4 forks source link

Incompatibility with default library location for renv v0.15 #392

Closed kyleam closed 2 years ago

kyleam commented 2 years ago

renv v0.15 comes with the following change:

renv now uses an external library by default for R package projects, with the library located within tools::R_user_dir("renv", "cache"). This directory can be configured via the RENV_PATHS_LIBRARY_ROOT environment variable if desired. See vignette("packages", package = > "renv") for more details. (#384)

This new default doesn't play well with pkgr's assumption that the library is under $(project_root)/renv/.../.

https://github.com/metrumresearchgroup/pkgr/blob/e8fb3aa731a8d73065a1265c96f6bbc583ba70af/configlib/config.go#L104-L110

With earlier versions of renv, the pkgr logic could be thrown off by setting RENV_PATHS_LIBRARY to a location other than the default library under the project renv/. However, with v0.15, this incompatible state is now the default.

Below the fold is a demo of the issue that shows that a pkgr-installed library can't be located when using renv v0.15.0.

demo Note: this script requires `usethis` and `renv` (before v0.15) to be available in default R environment. ```sh #!/bin/sh set -eux tdir=$(mktemp -d "${TMPDIR:-/tmp}"/fakepkgXXXXXX) cd "$tdir" Rscript -e 'usethis::create_package(".")' >/dev/null 2>&1 git init -q && git add . && git commit -qm'create package' ### pkgr setup ######################################################### printf "%s\n" "$(pkgr --version)" cat >pkgr.yml <<'EOF' Version: 1 Descriptions: - DESCRIPTION Packages: - httr2 Repos: - CRAN: https://cran.rstudio.com Lockfile: Type: renv EOF git add pkgr.yml && git commit -qm'add pkgr.yml' ### renv 0.14.0 ######################################################## Rscript -e 'renv::init(bare = TRUE)' git add . && git commit -qm'renv setup' Rscript -e 'packageVersion("renv")' -e '.libPaths()' pkgr install >/dev/null 2>&1 Rscript -e "httr2::request" ### renv 0.15.0 ######################################################## # Note that the failure here is a database error that isn't relevant. Rscript -e 'renv::upgrade(version = "0.15.0")' 2>/dev/null || : git clean -qxfd Rscript -e 'renv::init(bare = TRUE)' Rscript -e 'packageVersion("renv")' -e '.libPaths()' pkgr install >/dev/null 2>&1 Rscript -e "httr2::request" || : libpath=$(Rscript -e 'cat(.libPaths()[1])') find "$libpath" | grep httr2 || echo 'No httr2 in lib path' find "./renv/" | grep -c httr2 ``` ``` + mktemp -d /tmp/fakepkgXXXXXX + tdir=/tmp/fakepkgrW0wUk + cd /tmp/fakepkgrW0wUk + Rscript -e usethis::create_package(".") + git init -q + git add . + git commit -qmcreate package + pkgr --version + printf %s\n v3.0.0-rc1 v3.0.0-rc1 + cat + git add pkgr.yml + git commit -qmadd pkgr.yml + Rscript -e renv::init(bare = TRUE) + git add . + git commit -qmrenv setup + Rscript -e packageVersion("renv") -e .libPaths() [1] ‘0.14.0’ [1] "/tmp/fakepkgrW0wUk/renv/library/R-4.1/x86_64-pc-linux-gnu" [2] "/tmp/RtmpnN2W6W/renv-system-library" + pkgr install + Rscript -e httr2::request function(base_url) { new_request(base_url) } + Rscript -e renv::upgrade(version = "0.15.0") Installing renv [0.15.0] ... OK [linked cache] + git clean -qxfd + Rscript -e renv::init(bare = TRUE) # Bootstrapping renv 0.15.0 -------------------------------------------------- * Downloading renv 0.15.0 ... OK (downloaded source) * Installing renv 0.15.0 ... Done! * Successfully installed and loaded renv 0.15.0. + Rscript -e packageVersion("renv") -e .libPaths() [1] ‘0.15.0’ [1] "/data/home/kylem/.cache/R/renv/library/fakepkgrW0wUk-c4660a42/R-4.1/x86_64-pc-linux-gnu" [2] "/opt/R/4.1.1/lib/R/library" + pkgr install + Rscript -e httr2::request Error in loadNamespace(x) : there is no package called ‘httr2’ Calls: loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart Execution halted + : + Rscript -e cat(.libPaths()[1]) + libpath=/data/home/kylem/.cache/R/renv/library/fakepkgrW0wUk-c4660a42/R-4.1/x86_64-pc-linux-gnu + find /data/home/kylem/.cache/R/renv/library/fakepkgrW0wUk-c4660a42/R-4.1/x86_64-pc-linux-gnu + grep httr2 + echo No httr2 in lib path No httr2 in lib path + find ./renv/ + grep -c httr2 99 ```
seth127 commented 2 years ago

I still don't totally have my head around this, but I wanted to bring this up: the main reason we (at Metrum) use pkgr and renv together is so that renv can isolate the R environment and each project can know that it's using only the packages from the isolated environment (previously always within <project>/renv/library/).

Does this new tools::R_user_dir("renv", "cache") pattern break that? In other words, if we make pkgr work with tools::R_user_dir("renv", "cache") then are we essentially disabling the environment isolation? Or is renv still keeping track of which version of a given package from renv/cache it wants at a project level?

If the new pattern does disable that isolation, we could look into manipulating RENV_PATHS_LIBRARY (probably in the .Rprofile of our project template).

kyleam commented 2 years ago

if we make pkgr work with tools::R_user_dir("renv", "cache") then are we essentially disabling the environment isolation?

It's still isolated (the subdirectory in the cache is named by hashing the full project path). The motivation for storing it externally rather than as an embedded directory is to avoid slowing down R CMD build (which copies to a temporary directory).

dpastoor commented 2 years ago

I think the primary motivation was speed (for the cache behavioral change). Its been a slow rollout of going to cache by default - when renv itself sets up a cache - during its install process it now uses symlinks (or junction points on windows - one of the sticking points that took so long to go all-in on caches, combined with all sorts of edge-y cases like https://community.rstudio.com/t/using-renv-on-network-drive-all-packages-are-copied-both-in-project-library-and-cache/83265). They're going more and more in on that global cache. Kinda similar to what rsconnect is doing when mantaining its global cache then mostly symlinking packages in when a new version of an app is deployed with same packages.

I think there is an old pkgr issue around the same consideration - basically pkgrs cache is still only the cache of the binary/src - there could be room for binary/src/installedpkg --> where pkg installs could leverage that global cache. Its been on the todo list for a while, but just not enough of a pain point to get around to the refactoring/testing/etc to implement, especially when installing from binary is generally quick enough.

Two final cents - As I watched this build up - a consideration for you all to think about is ... really the value of renv was they got the whole bootstrapping/environment isolation down pretty well (with a small +1 for being able to generate lockfiles for a client if needed). AKA all this new stuff around making renv better/more performant for installing doesn't matter. One simple answer could be to drop renv - nab out the specific isolation code renv uses and just have that as a standalone script sourced by the .Rprofile. pkgr will still need to be taught for community purposes prob some more care around interacting with renv, but that might be the path of lowest friction for metrum.

seth127 commented 2 years ago

Thanks for the input @dpastoor. I should've known you had your eye on this as it was developing. As you note, the isolation is the important part for us and, since it looks like that still works, we're gonna see if we can knock this out with a conditional patch in that section of the pkgr code. Thanks again for the context.

kyleam commented 2 years ago

@dpastoor:

AKA all this new stuff around making renv better/more performant for installing doesn't matter.

Hmm, this comment may be targeted more at the wider cache changes you're talking about, but, for this particular case, I do think it matters even when pkgr is doing the installing. The change that this issue is about (moving the project libraries to an external location) was motivated by an issue about the embedded renv/library/ slowing down R CMD build:

So, if we're developing a package with an embedded library, regardless of whether renv or pkgr is setting it up, R CMD build will suffer from the same slowness. (Of course, someone can decide that the simplicity of the embedded directory is worth the cost of unnecessary library copying.)

dpastoor commented 2 years ago

ah! this is for when someone is using renv within the package context - eg if renv detects that you have a .Rproj and that project itself is a package, IIRC its something like detect that there is a DESCRIPTION file at that directory level as well.

I did not read your reprex carefully, so this is not for all renv behavior for 0.15, just when developing a package.

kyleam commented 2 years ago

IIRC its something like detect that there is a DESCRIPTION file at that directory level as well.

Yep: https://github.com/rstudio/renv/commit/2de06e378afe92b0acf764856f4e9201918acd64#diff-befe9146792f0d65aeeaf781e8ba00a1a3638ac2be162b61b6c8958440860b56R599

I did not read your reprex carefully, so this is not for all renv behavior for 0.15, just when developing a package.

Yes, sorry, I should have highlighted that point rather than just quoting renv's NEWS item.

kyleam commented 2 years ago

I've proposed a fix in gh-396.