rocker-org / rocker-versioned

Run current & prior versions of R using docker
https://hub.docker.com/r/rocker/r-ver
GNU General Public License v2.0
297 stars 169 forks source link

R_LIBS_USER clobbered #175

Closed infotroph closed 4 years ago

infotroph commented 4 years ago

In r-ver and derived images, any value of R_LIBS_USER set in the calling environment seems to be overruled at R startup, while other environment variables are respected:

In r-ver:latest, digest c5fff5fb9fa7:

R_LIBS=/tmp Rscript -e 'Sys.getenv("R_LIBS")'
[1] "/tmp"
R_LIBS_USER=/tmp Rscript -e 'Sys.getenv("R_LIBS_USER")'
[1] "/usr/local/lib/R/site-library"

r-base:latest, digest 779eb4e93d14, has the behavior I expected:

R_LIBS=/tmp Rscript -e 'Sys.getenv("R_LIBS")'
[1] "/tmp"
R_LIBS_USER=/tmp Rscript -e 'Sys.getenv("R_LIBS_USER")'
[1] "/tmp"

This is probably related to #108, but opening as a new issue because (1) it applies to r-ver, so apparently is not specific to Rstudio, and (2) I think ignoring a user-specified variable is itself a bug, without even considering its effect on .libPaths.

I'm aware that I can work around this by setting R_LIBS_USER in .Renviron, but for CI work regular environment variables are a lot more convenient.

cboettig commented 4 years ago

Thanks for the report. I completely see where you're coming from, but I'm not sure how best to proceed.

The reason for this behavior occurs is that R_LIBS_USER is set in the system-level Renviron, ${R_HOME}/etc/Renviron.

https://github.com/rocker-org/rocker-versioned/blob/9ad0922a7cb6932dfacc5ecc387048a1e1cd6e59/r-ver/devel/Dockerfile#L115

As you know, environmental variables set in Renviron override those set in the shell environment. If we did not set it ${R_HOME}/etc/Renviron, you wouldn't observe this behavior.

Unfortunately, the plot thickens with RStudio, which for reasons I don't fully understand, decides to ignore the system environmental variables all together (perhaps because errors due to incorrectly set env vars are much harder for their support team to help users catch and avoid?) This makes it necessary to write R-related env vars into the system Renviron, and makes it difficult to use CI specified env vars from RStudio.

Arguably that should have no effect on r-ver, but it also seems desirable that a script have the same environment whether it is run from a container through RStudio, or run through the same container but using the CLI R command. Similarly I figured it made sense that the environment on the rstudio image would thus be the same as it's parent image.

I'm reluctant to complain too much about how RStudio chooses to handle system environmental variables, since they have been very kind in granting us permission to distribute RStudio server in rocker containers in the first place, and there's already RStudio Server pro... Still, perhaps there is a better solution that can give consistent behavior when running RStudio & running the CLI mode, or maybe a different compromise to this issue would be better: e.g.

A. setting this only in the rstudio image? B. Not setting this in Renviron at runtime during RStudio startup (e.g. via an S6 init step) ?

at any rate it deserves this discussion at very least. I'm all ears.

@kevinushey we'd welcome any suggestions you have on this too if you get a chance! I'm not entirely sure I've properly understood/characterized RStudio's treatment of system environmental variables -- it also seems to copy Renviron from the $R_HOME to /etc/R/ iirc...)

kevinushey commented 4 years ago

My initial thought is that this should almost certainly read:

  && echo "R_LIBS_USER=\${R_LIBS_USER-'/usr/local/lib/R/site-library'}" >> /usr/local/lib/R/etc/Renviron \

as you do not want to override R_LIBS_USER if it has already been set by the user. A similar convention is normally used in the system .Renviron as well.

Unfortunately, the plot thickens with RStudio, which for reasons I don't fully understand, decides to ignore the system environmental variables all together (perhaps because errors due to incorrectly set env vars are much harder for their support team to help users catch and avoid?) This makes it necessary to write R-related env vars into the system Renviron, and makes it difficult to use CI specified env vars from RStudio.

Can you elaborate on what you mean by "system environmental variables"? For what it's worth, in the open source version of RStudio, R sessions are just plain child processes of the rserver process and so inherit the environment used by rserver. (In RStudio Server Pro, child sessions are separate processes launched through a shell, and so environment variables from that parent shell are inherited)

In general, though, any environment variables set in .Renviron / .Renviron.site should be respected regardless.

cboettig commented 4 years ago

Thanks @kevinushey !

We can certainly fix the dockerfile to do:

  && echo "R_LIBS_USER=\${R_LIBS_USER-'/usr/local/lib/R/site-library'}" >> /usr/local/lib/R/etc/Renviron \

though this won't solve the issue here since that is being run at build-time, while @infotroph wants to be able to set the environmental variable after the image is built (and hence said Renviron is already created).

, R sessions are just plain child processes of the rserver process and so inherit the environment used by rserver.

Thanks! This sounds promising. I think we need to dig into how to identify/control the environment used by rserver. rserver is launched through an S6 init script at container runtime:

https://github.com/rocker-org/rocker-versioned/blob/1632f1edef7afd26040a1341fe1953bc6b7c8c54/rstudio/Dockerfile#L78-L82

infotroph commented 4 years ago

this won't solve the issue here since that is being run at build-time

The echo is being run at build time, but the variable expansion happens at run time, so I think this almost solves my issue. Only problem is that expansions earlier in the file count as setting the value, so the existing line 46 of /usr/local/lib/R/etc/Renviron needs to be removed. Monkey-patched demo from a live container, with apologies for the unreadable sed instead of installing a real editor:

R_LIBS_USER="/tmp" Rscript -e 'Sys.getenv("R_LIBS_USER")'
[1] "/usr/local/lib/R/site-library"
sed -ibak -e 's!R_LIBS_USER='"'"'/usr/local/lib/R/site-library'"'"'!R_LIBS_USER=${R_LIBS_USER-'"'"'/usr/local/lib/R/site-library'"'"'}!' /usr/local/lib/R/etc/Renviron 
R_LIBS_USER="/tmp" Rscript -e 'Sys.getenv("R_LIBS_USER")'
[1] "/tmp"
# "woohoo, great, perfect! And now by default we--"
Rscript -e 'Sys.getenv("R_LIBS_USER")'
[1] "~/R/x86_64-pc-linux-gnu-library/3.6"
# "--oh."

it also seems desirable that a script have the same environment whether it is run from a container through RStudio, or run through the same container but using the CLI R command...

Totally agree that it's desirable to have the same environment by default, and thank you for the legwork you've done to make it happen. But I'll add that it's not necessarily desirable that they stay identical after I've started tweaking knobs, e.g. one of my occasional use cases for switching from a frontend to a script is for debugging weird environment issues.

...I think we need to dig into how to identify/control the environment used by rserver...

To clarify: It sounds like you're talking about ways of making sure any altered environment variables can be passed on to rserver in the same way they are to the CLI? That sounds great but don't hold out for it on my behalf -- I realize Rstudio has valid reasons for only promising to respoect ~/.Renviron and that there's an, uh, "rich tradition" of frontend apps being selective/idiosyncratic about the environment variables they recognize.

cboettig commented 4 years ago

but the variable expansion happens at run time

That's a great point! I'd be open to a PR on that to get us started then!

it's not necessarily desirable that they stay identical after I've started tweaking knobs

Totally agree