rstudio / renv

renv: Project environments for R.
https://rstudio.github.io/renv/
MIT License

`renv::install` from cellar regression #1852

Closed: hutch3232 closed this issue 7 months ago

hutch3232 commented 7 months ago

For context, my R environment is behind a firewall, so I don't have access to the outside internet. However, I do have access to a mirror of CRAN.

I use the very convenient renv cellar to store in-house packages as well as a handful of external packages that rely on external dependencies. For the latter, I have to do the build outside of the firewall and then place them in the cellar for later use.
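
For readers unfamiliar with the cellar, the two layouts that appear in this thread (a tarball directly under the cellar root, or inside a per-package subfolder) can be sketched as follows. This is only an illustration: `cellar_candidates` is a hypothetical helper, not part of renv, and the cellar root is the one reported later in this thread.

```r
# Hypothetical helper (not part of renv): list the paths where a source
# tarball for package@version could sit inside a cellar directory.
cellar_candidates <- function(cellar, package, version) {
  tarball <- sprintf("%s_%s.tar.gz", package, version)
  c(
    file.path(cellar, tarball),          # flat layout
    file.path(cellar, package, tarball)  # per-package subfolder
  )
}

candidates <- cellar_candidates("/mnt/imported/data/renv/cellar", "arrow", "15.0.1")
print(candidates)
print(file.exists(candidates))  # which candidates exist on this machine
```

Checking `file.exists()` on the candidates is a quick way to confirm that the tarball sits where a cellar lookup would expect it before debugging anything else.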


What I found is that renv::install recognizes the package in my cellar and says it will be installed from there. Then, for some reason, it goes to CRAN, downloads the tarball, and installs that one instead of the one in my cellar. This is a problem because my tarball has all the source dependencies bundled, per: https://arrow.apache.org/docs/r/reference/create_package_with_all_dependencies.html

Sys.setenv(ARROW_S3="ON", ARROW_DEPENDENCY_SOURCE="BUNDLED", ARROW_OFFLINE_BUILD="true")
renv::install("arrow@15.0.1", repos = NULL, prompt = FALSE)
# - Package arrow [15.0.1] will be installed from the cellar.
# # Downloading packages -------------------------------------------------------
# - Downloading arrow from Repository ...         OK [4.2 Mb in 1.3s]
# Successfully downloaded 1 package in 4.5 seconds.
# 
# The following package(s) will be installed:
# - arrow [15.0.1]
# These packages will be installed into "/mnt/code/renv/library/R-4.3/x86_64-pc-linux-gnu".
# 
# # Installing packages --------------------------------------------------------
# - Installing arrow ...                          OK [built from source and cached in 26m]
# Successfully installed 1 package in 26 minutes.

I know it says it was successful, but when I run arrow::arrow_info() I see that it did not install all the optional dependencies, such as S3 support, which is included in my bundle.
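
A quicker check than reading the full arrow::arrow_info() output is to ask arrow directly whether S3 support was compiled in. The following is a minimal sketch, assuming the arrow package is installed. The wrapper name `has_arrow_s3` is hypothetical, while `arrow::arrow_with_s3()` is the accessor documented alongside `arrow_info()`.

```r
# Hypothetical wrapper: report whether the installed arrow build has S3 support.
# Returns NA when arrow is not installed at all.
has_arrow_s3 <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(NA)  # arrow is not installed, so nothing can be said
  }
  isTRUE(arrow::arrow_with_s3())
}

print(has_arrow_s3())
```

A result of FALSE after an apparently successful install would be consistent with the CRAN tarball, rather than the bundled one from the cellar, having been built.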

I attempted to prevent renv from accessing CRAN (my mirror):

options(repos = NULL)
Sys.setenv(ARROW_S3="ON", ARROW_DEPENDENCY_SOURCE="BUNDLED", ARROW_OFFLINE_BUILD="true")
renv::install("arrow@15.0.1", repos = NULL, prompt = FALSE)
# - Package arrow [15.0.1] will be installed from the cellar.
# # Downloading packages -------------------------------------------------------
# Warning: failed to find source for 'arrow 15.0.1' in package repositories
# Error: failed to retrieve package 'arrow@15.0.1'

Another thing I tried to do was use the renv shim of install.packages. This used to work for me but had some odd behavior this time:

install.packages(file.path(renv::paths$root(), "cellar", "arrow", "arrow_15.0.1.tar.gz"), repos = NULL, type = "source")
# or
install.packages("/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz", repos = NULL, type = "source")
# Error: failed to resolve remote '/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz' -- failed to parse remote spec '/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz'
# 15. stop(simpleError(message = message, call = e$call)) at remotes.R#57
# 14. h(simpleError(msg, call))
# 13. .handleSimpleError(function (e)
#     {
#     fmt <- "failed to resolve remote '%s'"
#     prefix <- sprintf(fmt, spec) ...
# 12. stop(sprintf(fmt, ...), call. = call.) at utils-format.R#3
# 11. stopf("failed to parse remote spec '%s'", spec) at remotes.R#324
# 10. renv_remotes_parse(spec) at remotes.R#71
# 9. renv_remotes_resolve_impl(spec, latest) at remotes.R#62
# 8. withCallingHandlers(renv_remotes_resolve_impl(spec, latest),
#    error = error) at remotes.R#62
# 7. FUN(X[[i]], ...)
# 6. lapply(x, f, ...) at utils-map.R#57
# 5. map(packages, renv_remotes_resolve) at install.R#166
# 4. renv::install("/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz",
#    repos = NULL, type = "source")
# 3. eval(call, envir = parent.frame())
# 2. eval(call, envir = parent.frame()) at shims.R#34
# 1. install.packages("/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz",
#    repos = NULL, type = "source")

I had partial success by downgrading to renv 0.17.3 (an arbitrary choice; I didn't try any of the versions in between).

install.packages(file.path(renv::paths$root(), "cellar", "arrow", "arrow_15.0.1.tar.gz"), repos = NULL, type = "source")
# Installing package into '/mnt/code/renv/library/R-4.3/x86_64-pc-linux-gnu'
# (as 'lib' is unspecified)
...

The issue here was that, for some reason, the package was not moved into my cache; it was installed directly into my project folder.

Happy to try troubleshooting anything else.

kevinushey commented 7 months ago

Thanks for the bug report -- I wasn't able to reproduce locally, though. I tried the following:

# simulate version of arrow in the cellar
dir.create("renv/cellar", recursive = TRUE)
download.packages("arrow", destdir = "renv/cellar")

# get renv to dump information about installed packages
trace(renv:::renv_install_impl, quote({
  str(records)
}))

trace(renv:::r_cmd_install, quote({
  print(ls.str())
  stop("aborting install", call. = FALSE)
}))

# attempt installation
options(repos = NULL)
Sys.setenv(ARROW_S3="ON", ARROW_DEPENDENCY_SOURCE="BUNDLED", ARROW_OFFLINE_BUILD="true")
renv::install("arrow@15.0.1", repos = NULL, prompt = FALSE)

And here, I saw:

> renv::install("arrow@15.0.1", repos = NULL, prompt = FALSE)
- Package arrow [15.0.1] will be installed from the cellar.
The following package(s) will be installed:
- arrow [15.0.1]
These packages will be installed into "~/Library/R/arm64/4.3/library".

Tracing renv_install_impl(records) on entry 
List of 1
 $ arrow:List of 4
  ..$ Package: chr "arrow"
  ..$ Version: chr "15.0.1"
  ..$ Source : chr "Repository"
  ..$ Path   : Named chr "/Users/kevin/r/pkg/renv/renv/cellar/arrow_15.0.1.tar.gz"
  .. ..- attr(*, "names")= chr "source"
# Installing packages --------------------------------------------------------
- Installing arrow ...                          Tracing r_cmd_install(package, path) on entry 
package :  chr "arrow"
path :  chr "/Users/kevin/r/pkg/renv/renv/cellar/arrow_15.0.1.tar.gz"
FAILED

So renv should be using the Cellar here. Can you share the output of renv::diagnostics(), just in case you have some local configuration that might be relevant here?

hutch3232 commented 7 months ago

Thanks for helping me figure this out!

renv::diagnostics()

Diagnostics Report [renv 1.0.5]
===============================

# Session Info ---------------------------------------------------------------
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.3.1 tools_4.3.1    renv_1.0.5   

# Project --------------------------------------------------------------------
Project path: "/mnt/code"

# Status ---------------------------------------------------------------------
No issues found -- the project is in a consistent state.

# Packages -------------------------------------------------------------------
           Library Source Lockfile Source Path Dependency
KernSmooth 2.23-21   CRAN     <NA>   <NA>  [2]       <NA>
MASS        7.3-60   CRAN     <NA>   <NA>  [2]       <NA>
Matrix     1.5-4.1   CRAN     <NA>   <NA>  [2]       <NA>
boot        1.3-28   CRAN     <NA>   <NA>  [2]       <NA>
class       7.3-22   CRAN     <NA>   <NA>  [2]       <NA>
cluster      2.1.4   CRAN     <NA>   <NA>  [2]       <NA>
codetools   0.2-19   CRAN     <NA>   <NA>  [2]       <NA>
foreign     0.8-82   CRAN     <NA>   <NA>  [2]       <NA>
grDevices     <NA>   <NA>     <NA>   <NA>  [2]   indirect
graphics      <NA>   <NA>     <NA>   <NA>  [2]   indirect
import       1.3.2   CRAN    1.3.2   CRAN  [1]     direct
jsonlite     1.8.8   CRAN    1.8.8   CRAN  [1]     direct
lattice     0.21-8   CRAN     <NA>   <NA>  [2]       <NA>
methods       <NA>   <NA>     <NA>   <NA>  [2]   indirect
mgcv        1.8-42   CRAN     <NA>   <NA>  [2]       <NA>
nlme       3.1-162   CRAN     <NA>   <NA>  [2]       <NA>
nnet        7.3-19   CRAN     <NA>   <NA>  [2]       <NA>
renv         1.0.5   CRAN    1.0.5   CRAN  [1]     direct
rpart       4.1.19   CRAN     <NA>   <NA>  [2]       <NA>
rstudioapi  0.15.0   CRAN   0.15.0   CRAN  [1]     direct
spatial     7.3-11   CRAN     <NA>   <NA>  [2]       <NA>
stats         <NA>   <NA>     <NA>   <NA>  [2]   indirect
survival     3.5-5   CRAN     <NA>   <NA>  [2]       <NA>
utils         <NA>   <NA>     <NA>   <NA>  [2]     direct

[1]: /mnt/code/renv/library/R-4.3/x86_64-pc-linux-gnu                     
[2]: /home/ubuntu/.cache/R/renv/sandbox/R-4.3/x86_64-pc-linux-gnu/9a444a72

# ABI ------------------------------------------------------------------------
- No ABI problems were detected in the set of installed packages.

# User Profile ---------------------------------------------------------------
                  Source   Package Require Version  Dev
1 /home/ubuntu/.Rprofile grDevices                 TRUE

# Settings -------------------------------------------------------------------
List of 13
 $ bioconductor.version     : chr(0)
 $ external.libraries       : chr(0)
 $ ignored.packages         : chr "x"
 $ package.dependency.fields: chr [1:3] "Imports" "Depends" "LinkingTo"
 $ ppm.enabled              : NULL
 $ ppm.ignored.urls         : NULL
 $ r.version                : chr(0)
 $ snapshot.type            : chr "implicit"
 $ use.cache                : logi TRUE
 $ vcs.ignore.cellar        : logi TRUE
 $ vcs.ignore.library       : logi TRUE
 $ vcs.ignore.local         : logi TRUE
 $ vcs.manage.ignores       : logi TRUE

# Options --------------------------------------------------------------------
List of 9
 $ defaultPackages                     : chr [1:6] "datasets" "utils" "grDevices" "graphics" ...
 $ download.file.method                : chr "libcurl"
 $ download.file.extra                 : NULL
 $ install.packages.compile.from.source: NULL
 $ pkgType                             : chr "source"
 $ repos                               : chr(0)
 $ renv.config.github.host             : chr "https://github.myco.com/api/v3"
 $ renv.consent                        : logi TRUE
 $ renv.verbose                        : logi TRUE

# Environment Variables ------------------------------------------------------
HOME                        = /home/ubuntu
LANG                        = en_US.UTF-8
MAKE                        = make
R_LIBS                      = <NA>
R_LIBS_SITE                 = /usr/local/lib/R/site-library
R_LIBS_USER                 = /mnt/code/renv/library/R-4.3/x86_64-pc-linux-gnu
RENV_DEFAULT_R_ENVIRON      = <NA>
RENV_DEFAULT_R_ENVIRON_USER = <NA>
RENV_DEFAULT_R_LIBS         = <NA>
RENV_DEFAULT_R_LIBS_SITE    = /usr/local/lib/R/site-library
RENV_DEFAULT_R_LIBS_USER    = /home/ubuntu/R/x86_64-pc-linux-gnu-library/4.3
RENV_DEFAULT_R_PROFILE      = <NA>
RENV_DEFAULT_R_PROFILE_USER = <NA>
RENV_PATHS_ROOT             = /mnt/imported/data/renv
RENV_PROJECT                = /mnt/code

# PATH -----------------------------------------------------------------------
- /opt/conda/bin
- /opt/oracle/instantclient_19_16
- /home/ubuntu/.local/bin
- /opt/oracle/instantclient_12_1
- /home/ubuntu/.local/bin
- /opt/conda/bin
- /usr/local/sbin
- /usr/local/bin
- /usr/sbin
- /usr/bin
- /sbin
- /bin
- /usr/lib/rstudio-server/bin/quarto/bin
- /usr/lib/rstudio-server/bin/postback

# Cache ----------------------------------------------------------------------
There are a total of 1007 packages installed in the renv cache.
Cache path: "/mnt/imported/data/renv/cache/v5/R-4.3/x86_64-pc-linux-gnu"

kevinushey commented 7 months ago

Thanks -- nothing stands out as surprising in your diagnostics report.

Can you share what you see if you run this to install arrow?

# get renv to dump information about installed packages
trace(renv:::renv_install_impl, quote({
  str(records)
}))

trace(renv:::r_cmd_install, quote({
  print(ls.str())
  stop("aborting install", call. = FALSE)
}))

# attempt installation
options(repos = NULL)
Sys.setenv(ARROW_S3="ON", ARROW_DEPENDENCY_SOURCE="BUNDLED", ARROW_OFFLINE_BUILD="true")
renv::install("arrow@15.0.1", repos = NULL, prompt = FALSE)

Also, does this affect only arrow, or does it affect other packages in the cellar as well?

hutch3232 commented 7 months ago

I made some progress on figuring out what's going on, but first:

> # get renv to dump information about installed packages
> trace(renv:::renv_install_impl, quote({
+     str(records)
+ }))
Tracing function "renv_install_impl" in package "renv (not-exported)"
[1] "renv_install_impl"
>
> trace(renv:::r_cmd_install, quote({
+     print(ls.str())
+     stop("aborting install", call. = FALSE)
+ }))
Tracing function "r_cmd_install" in package "renv (not-exported)"
[1] "r_cmd_install"
>
> # attempt installation
> options(repos = NULL)
> Sys.setenv(ARROW_S3="ON", ARROW_DEPENDENCY_SOURCE="BUNDLED", ARROW_OFFLINE_BUILD="true")
> renv::install("arrow@15.0.1", repos = NULL, prompt = FALSE)
- Package arrow [15.0.1] will be installed from the cellar.
# Downloading packages -------------------------------------------------------
Warning: failed to find source for 'arrow 15.0.1' in package repositories
Error: failed to retrieve package 'arrow@15.0.1'

Stepping through, I reached the point below, which raises an error that is then caught by catch. I think this causes renv to fall back to another approach (i.e., downloading the file). It believes there's something wrong with my DESCRIPTION.

# renv:::renv_retrieve_impl
# renv:::renv_retrieve_cellar
Browse[3]> shortcuts[[2]]
function(record) {
  source <- renv_retrieve_cellar_find(record)
  record <- renv_retrieve_cellar_report(record)
  renv_retrieve_successful(record, source)
}
<bytecode: 0x5587e0849a10>
<environment: namespace:renv>
Browse[3]> shortcuts[[2]](record)
- Package arrow [15.0.1] will be installed from the cellar.
Error: archive '/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz' does not appear to contain a DESCRIPTION file
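
The symptom reported here (a "will be installed from the cellar" message followed by a repository download) is what a try-then-fall-back loop would produce. The following is an illustrative sketch only, not renv's implementation: `retrieve_with_fallback`, `from_cellar`, and `from_repository` are hypothetical names, the returned path is made up, and the error text is shortened from the session above.

```r
# Try each retrieval method in order; an error in one method is swallowed
# and the next method is attempted.
retrieve_with_fallback <- function(record, methods) {
  for (name in names(methods)) {
    result <- tryCatch(methods[[name]](record), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(method = name, path = result))
    }
    message(sprintf("- %s failed: %s", name, conditionMessage(result)))
  }
  stop(sprintf("failed to retrieve package '%s'", record))
}

# Hypothetical cellar method: announces itself, then fails validation.
from_cellar <- function(record) {
  message("- Package will be installed from the cellar.")
  stop("archive does not appear to contain a DESCRIPTION file")
}

# Hypothetical repository method: always succeeds with a made-up path.
from_repository <- function(record) "downloaded/arrow_15.0.1.tar.gz"

outcome <- retrieve_with_fallback(
  "arrow@15.0.1",
  list(cellar = from_cellar, repository = from_repository)
)
print(outcome$method)
```

In this sketch the cellar message is printed before validation fails, so the user sees the cellar announced and then a different source used. With no repository method available, the loop ends in the "failed to retrieve package" error instead.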

I tried to step through renv:::renv_retrieve_successful and I found this:

files <- renv_archive_list(path)
files
  [1] "./arrow/"                                                                        "./arrow/cleanup"                                                               
   [3] "./arrow/configure"                                                               "./arrow/configure.win"                                                         
   [5] "./arrow/DESCRIPTION"                                                             "./arrow/inst/"                                                                 
   [7] "./arrow/inst/build_arrow_static.sh"                                              "./arrow/inst/demo_flight_server.py"                                             
   [9] "./arrow/inst/NOTICE.txt"                                                         "./arrow/inst/v0.7.1.parquet"

The DESCRIPTION is there, but the code does not expect the entry paths to have a leading ./, per:

https://github.com/rstudio/renv/blob/e0ea00d83bac5b55cc56222c03f0d6b014b5f17f/R/description.R#L47-L61
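
To see why a `./` prefix can hide the file, consider a lookup that expects DESCRIPTION exactly one directory below the archive root. This is a minimal sketch: the pattern is an assumption made for illustration and is not copied from the linked renv source.

```r
# Entry names in the style listed from the tarball above, and the same
# entries as a conventionally built package tarball would name them.
entries <- c("./arrow/", "./arrow/DESCRIPTION", "./arrow/inst/NOTICE.txt")
conventional <- c("arrow/", "arrow/DESCRIPTION", "arrow/inst/NOTICE.txt")

# Assumed pattern: "<top-level dir>/DESCRIPTION" with nothing in front.
strict <- "^[^/]+/DESCRIPTION$"

found_prefixed <- grep(strict, entries, value = TRUE)
found_conventional <- grep(strict, conventional, value = TRUE)

print(found_prefixed)      # character(0)
print(found_conventional)  # "arrow/DESCRIPTION"
```

With the prefix, the first path component is `.` rather than the package directory, so a pattern anchored at the start of the entry finds nothing even though the file is present.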

kevinushey commented 7 months ago

Interesting -- I suspect the leading ./ in the paths here is indeed confusing renv; I haven't seen that with other package tarballs before. That should be straightforward to accommodate in renv, at least.

kevinushey commented 7 months ago

https://github.com/rstudio/renv/commit/e62301ca60692f22e07471cdfc5ab21e89d9fbde should also hopefully help work around this issue -- could you let me know if it helps?

hutch3232 commented 7 months ago

Thank you!

That did help it get past the detection of DESCRIPTION, but a bit further downstream, when it performs the actual extraction, it errors because the archive has no entry at the adjusted path (with the ./ stripped).

renv:::renv_tar_decompress
renv:::renv_system_exec
# args
# [1] "xf"                                                                "'/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz'"
# [3] "-C"                                                                "'/tmp/RtmpgtqyKh/renv-description-6693128764f'"                   
# [5] "'arrow/DESCRIPTION'"                                             
# Browse[6]> system2(command, args)
# /usr/bin/tar: arrow/DESCRIPTION: Not found in archive
# /usr/bin/tar: Exiting with failure status due to previous errors

Perhaps the regex pattern should be tweaked instead of the file paths themselves?
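
That suggestion can be sketched as follows: match with a pattern that tolerates an optional `./` prefix, but hand tar the entry name exactly as the archive lists it. This is an illustrative sketch, not the change that was made in renv. `find_description_entry` is a hypothetical helper, and the destination directory is a placeholder.

```r
# Hypothetical helper: locate the top-level DESCRIPTION entry while keeping
# the entry name unmodified, so it still matches what the archive stores.
find_description_entry <- function(entries) {
  tolerant <- "^(\\./)?[^/]+/DESCRIPTION$"
  hits <- grep(tolerant, entries, value = TRUE)
  if (length(hits) == 0L) {
    stop("archive does not appear to contain a DESCRIPTION file")
  }
  hits[[1L]]
}

entry <- find_description_entry(
  c("./arrow/", "./arrow/configure", "./arrow/DESCRIPTION")
)
print(entry)

# Arguments in the same shape as the tar call shown above; the -C target
# here is a placeholder directory.
args <- c(
  "xf", shQuote("/mnt/imported/data/renv/cellar/arrow/arrow_15.0.1.tar.gz"),
  "-C", shQuote(tempdir()),
  shQuote(entry)
)
print(args)
```

Because the name is passed through untouched, the same helper works for both `./arrow/DESCRIPTION` and `arrow/DESCRIPTION` without needing to know how the tarball was built.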

kevinushey commented 7 months ago

Thanks; I gave that a try in https://github.com/rstudio/renv/commit/84126c74d2e0d62212c06c50e3e8f2e3ef0ebf6e.

hutch3232 commented 7 months ago

That worked beautifully, thank you!

kevinushey commented 7 months ago

Excellent -- thanks for helping me get to the bottom of the issue!