rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

.onLoad fails in loadNamespace on read-only filesystem (AWS lambda) #611

Closed FMKerckhof closed 1 year ago

FMKerckhof commented 2 years ago

I am trying to read a pin from an RStudio Connect board in a containerized AWS Lambda function (runtime: R 4.1.3, pins 1.0.1). I set the cache directory to Lambda's ephemeral storage (/tmp) as follows: board <- pins::board_rsconnect(server = Sys.getenv("CONNECT_SERVER"), cache = "/tmp") (CONNECT_SERVER and CONNECT_API_KEY have been added as environment variables in the Lambda function configuration). Even so, loading the namespace fails because it tries to create a directory under /home, which is not writable. Is there a way to circumvent this behavior? I am loading at least 20 other packages in the runtime without any problems; the failure is specific to pins:

Error: package or namespace load failed for ‘pins’:
.onLoad failed in loadNamespace() for 'pins', details:
call: NULL
error: [EROFS] Failed to make directory '/home/sbx_user1051': read-only file system

I checked the r-pkgs section on side effects on load, and in zzz.R I noticed that the function board_register_local gets called in .onLoad without any options: https://github.com/rstudio/pins/blob/2a1a0d89af8e9e3b696c656d5bb166502cec1ee0/R/zzz.R#L12 . Is there a way to pass /tmp as a cache directory here? Could it be read from an environment variable?

FMKerckhof commented 2 years ago

Based on zzz.R, it would appear I could "trick" the Lambda runtime into pretending to be R CMD CHECK by setting the _R_CHECK_PACKAGE_NAME_ environment variable to a non-empty value, cf. https://github.com/rstudio/pins/blob/2a1a0d89af8e9e3b696c656d5bb166502cec1ee0/R/utils.R#L115 . That code path calls tempdir(), which on a Linux system will use /tmp . However, Lambda does not allow environment variables that start with "_".

FMKerckhof commented 2 years ago

I have been able to resolve this by setting the environmental variables in the lambda runtime before loading the pins library as follows:

Sys.setenv(R_USER_CACHE_DIR = tempfile())
Sys.setenv(R_USER_DATA_DIR = tempfile())

library(pins)

This was inspired by the R CMD CHECK fix of zzz.R .

I am closing the issue (since it is resolved), but it may be useful to document these environmental variables more clearly since they define the .onLoad behavior of the pins namespace and can lead to errors on read-only filesystems.

FMKerckhof commented 2 years ago

I am re-opening this issue since the fix above no longer appears to work. I used to get only warnings from normalizePath about the missing home directory, and the code still ran:

normalizePath("~") :
path[1]="/home/sbx_user1051": No such file or directory
Warning message:
In normalizePath("~") :
path[1]="/home/sbx_user1051": No such file or directory

Now, with more recent versions of R and some dependent packages (and maybe updates on AWS side) I get an error that stops code execution:

.onLoad failed in loadNamespace() for 'pins', details:
call: NULL
error: [EROFS] Failed to make directory '/home/sbx_user1051': read-only file system

I know it is a pretty niche case - but is there a way to load the pins package on a read-only file system? Afterwards the cache dirs can take over (above) but how can I assure loading the package will not become an issue?
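To confirm that the home directory is the culprit before loading pins, a quick check along these lines may help. This is only a diagnostic sketch in base R; the variable names are illustrative and nothing here is part of pins.

```r
# Expand "~" the same way R does when it resolves the home directory.
home <- path.expand("~")

# file.access() returns 0 on success and -1 on failure; mode = 2 tests
# write permission. A missing directory is treated as not writable.
home_writable <- dir.exists(home) && file.access(home, mode = 2) == 0

if (!home_writable) {
  message("Home directory '", home, "' does not exist or is not writable.")
}
```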

juliasilge commented 2 years ago

Thanks for reopening @FMKerckhof! I don't believe there is currently a workaround if the environment variables don't work anymore. 😞 We will work on a fix for this after our big conference, so you can look out for that in August. We'd definitely appreciate your input at that time.

mdneuzerling commented 2 years ago

I've had success by setting the PINS_USE_CACHE environment variable to "true" (any other value would work). The relevant function is board_cache_path. I still get some warnings about normalizePath("~") (a directory that doesn't exist) but no errors.
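A minimal sketch of this workaround, assuming (as described in this thread) that the variable has to be set before the pins namespace is loaded. The requireNamespace() guard is only there so the snippet also runs where pins is not installed.

```r
# Set the variable first: the thread reports that pins resolves its
# cache path while the namespace is loading.
Sys.setenv(PINS_USE_CACHE = "true")

# Guarded load so the sketch also runs where pins is not installed.
if (requireNamespace("pins", quietly = TRUE)) {
  library(pins)
}
```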

juliasilge commented 1 year ago

Having worked a little more with AWS Lambda, I do think that the PINS_USE_CACHE env variable is working pretty well, and seems like a reasonable use for an env var. However, it is weird/confusing that PINS_USE_CACHE = "true" works when what we are doing is really turning off the cache in favor of using the temp directory. Here are some questions:

FMKerckhof commented 1 year ago

Thanks for including me @juliasilge

In the end, setting the following environment variables in my runtime before loading the pins package (or packages that call pins) did the trick in an image based on lambda/provided:al2 :

Sys.setenv(R_USER_CACHE_DIR = tempdir())
Sys.setenv(R_USER_DATA_DIR = tempdir())
Sys.setenv(HOME = tempdir())

While this is admittedly not ideal, from my perspective it is a bit more sensible (?) than the PINS_USE_CACHE env var because it provides pins, along with other packages that may rely on those env vars, with a writable directory on the Lambda's ephemeral storage.
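The same idea can be wrapped in a small helper that runs at the top of the runtime script. This is a sketch only: use_tmp_user_dirs() is a hypothetical name, not part of pins, and using separate cache/ and data/ subdirectories is a variation on the snippet above, which points all three variables at tempdir() itself.

```r
# Hypothetical helper (not part of pins): point HOME and the R user
# directories at writable locations under the session temp directory.
use_tmp_user_dirs <- function(root = tempdir()) {
  dirs <- c(
    R_USER_CACHE_DIR = file.path(root, "cache"),
    R_USER_DATA_DIR  = file.path(root, "data"),
    HOME             = root
  )
  # Create the directories up front so later writes do not fail.
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  # Sys.setenv() takes name = value pairs, so pass the named vector as a list.
  do.call(Sys.setenv, as.list(dirs))
  invisible(dirs)
}

dirs <- use_tmp_user_dirs()
# Load pins (or packages that import it) only after this point.
```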

That being said, I haven't verified whether this would work on actual read-only filesystems or on other Linux flavors. W.r.t. documentation: maybe the caching section of getting started is a bit too much targeted towards a general audience for it? This seems like quite a niche problem, maybe better suited to the articles section (e.g. "using pins on read-only filesystems/with lambda")?

juliasilge commented 1 year ago

Thanks so much @FMKerckhof! 🙌 I decided to go ahead and add a new pins-specific env variable as well, and to do some documentation on a function page. Take a look at #748 if you are interested in giving feedback.

juliasilge commented 1 year ago

You can check out new documentation (including the new env var for the pins cache) here.

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.