Open cboettig opened 4 months ago
It makes sense to try to get RStudio to be aware of those environment variables. It is also logical for the rocker images to modify one of the Renviron files because the docker images are in full control of the environment. However I worry about this extension making changes to those files because it could be run outside of docker. For example it could be run on a shared HPC node. If we change a file, it'd need to be per-user and we'd have to unset the changes somehow afterwards. Using environment variables would normally be the right choice to influence app behavior and being forced to use config files is, yes, annoying.
I'll look into how RStudio prepares the environment a bit more. In this case, it might be best to alter the files for the gh-scoped-creds env vars in the Docker images (which I know you need in https://github.com/berkeley-dsep-infra/datahub).
Thanks, makes sense, I think I followed most of this.
> If we change a file, it'd need to be per-user and we'd have to unset the changes somehow afterwards.
I was wondering if it would be possible for jupyter-rsession-proxy to echo-append the env vars into ~/.Renviron in the user's home dir? This wouldn't conflict with the Renviron.site coming from the docker image, and would automatically be applied per-user for RStudio. Not sure about the unsetting part.
and yeah, it's quite annoying RStudio makes us do this.
With an image based on the Jupyter Docker Stacks, this could be done with a Startup Hook.
E.g. a rstudio.sh script containing
echo "LANG=$LANG" >>"$(R RHOME)/etc/Renviron.site"
echo "TZ=$TZ" >>"$(R RHOME)/etc/Renviron.site"
...
This would also require a change of ownership and permission of $(R RHOME)/etc/*.site when building the image:
chown :"$NB_GID" "$(R RHOME)/etc" "$(R RHOME)/etc/"*.site
chmod g+w "$(R RHOME)/etc" "$(R RHOME)/etc/"*.site
@ryanlovett Is there something like Startup Hooks for binder, too?
I think RStudio does this primarily to protect itself from weird env vars in people's desktops, and that causes issues when running serverside.
jupyter-rsession-proxy is a good place to do this!
I think the primary thing to be determined is which file to modify. Ideally, this should be:
@cboettig putting anything under $HOME helps with (1) but not (2). Does Renviron allow us to include other files? That way, another solution would be to add a line under $HOME that includes a file from somewhere else (like /tmp or wherever).
Another important use case for this is with AWS credentials (or GCP credentials). These are dynamically set at runtime on the pod, and need to be propagated for automatic detection of credentials when accessing APIs to work. So while we can set gh-scoped-creds at image build time, we can't do that for these.
Side note: the Compute Canada / Digital Research Alliance of Canada solution to this problem was to patch RStudio.
Patch is available here: https://github.com/ComputeCanada/easybuild-easyconfigs-installed-avx2/blob/main/2023/RStudio-Server/rstudio-1.2.1335.patch. It is quite simple, but I have never tried to have it merged upstream.
@yuvipanda What we ended up doing was modifying Rprofile to support sourcing files in an Rprofile.d directory, and then using extraFiles to set the env vars there. I imagine one could add a script in Rprofile.d to source something from HOME. This is all in the user environment and outside of jupyter-rsession-proxy however.
@ryanlovett oooh, that's interesting. Is there a per-user Rprofile? If there is, perhaps we can dynamically modify that in rsession-proxy?
Alternatively, we could start with putting this kinda code in Rprofile for just the rocker/binder image, where it can simply read from /proc/1/environ. Thoughts on that, @cboettig?
> Is there a per-user Rprofile? If there is, perhaps we can dynamically modify that in rsession-proxy?
Yes, ~/.Rprofile (and also ~/.Renviron). The modification process would have to be idempotent. Would you also want to leave the vars behind? Or assume whatever file is included gets overwritten each session?
@ryanlovett yes, we'd have the code be idempotent, which is reasonably doable I'd think (conda init does something like that, for example).
The vars should be put somewhere like /tmp, so that they get cleaned up as appropriate. So we assume the file gets overwritten each session.
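A minimal sketch of what such an idempotent update could look like in Python, in the style of the marked block that conda init writes. The function name and the marker strings are hypothetical, nothing in R, RStudio or jupyter-rsession-proxy defines them, and this is not the project's implementation.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical marker strings; they are plain comments as far as R is concerned.
BEGIN = "# >>> jupyter-rsession-proxy managed block >>>"
END = "# <<< jupyter-rsession-proxy managed block <<<"


def upsert_managed_block(path: Path, body_lines: list[str]) -> None:
    """Replace the marked block in `path`, or append it if it is absent."""
    existing = path.read_text().splitlines() if path.exists() else []
    kept = []
    inside = False
    for line in existing:
        if line == BEGIN:
            inside = True      # start skipping the previous managed block
        elif line == END:
            inside = False     # stop skipping
        elif not inside:
            kept.append(line)  # user-owned line, keep untouched
    block = [BEGIN, *body_lines, END]
    path.write_text("\n".join(kept + block) + "\n")
```

Calling it twice with the same lines leaves the file unchanged, and the user's own entries outside the markers are preserved. One caveat of this sketch: if the end marker has been deleted by hand, everything after the begin marker is treated as part of the old block and dropped.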
I think having something like:
tmp <- tempfile()
writeLines(readBin("/proc/1/environ", "character", n = 500), tmp)
readRenviron(tmp)
in ${R_HOME}/etc/Rprofile.site would populate the env var list when an R session starts up for any user? Or, if this is handled by jupyter-rsession-proxy, presumably it would append this to the user-specific $HOME/.Rprofile?
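If jupyter-rsession-proxy handled this on the Python side instead, the parsing step might look roughly like the sketch below. Both function names are hypothetical. It assumes /proc/1/environ is readable by the current user, which is typical when PID 1 in the container runs as that user but is not guaranteed, and it does no escaping of characters that R treats specially in Renviron files.

```python
from __future__ import annotations


def parse_environ_blob(blob: bytes) -> dict[str, str]:
    """Parse the NUL-separated NAME=value entries of a /proc/<pid>/environ file."""
    env = {}
    for entry in blob.split(b"\0"):
        if not entry:
            continue  # trailing NUL or empty entry
        name, sep, value = entry.decode("utf-8", errors="replace").partition("=")
        if sep and name:
            env[name] = value
    return env


def to_renviron_lines(env: dict[str, str]) -> list[str]:
    """Render NAME=value lines, skipping values that cannot fit on one line."""
    # Renviron files are line-based, so multi-line values are left out.
    return [f"{name}={value}" for name, value in sorted(env.items())
            if "\n" not in value]
```

The input would come from something like `Path("/proc/1/environ").read_bytes()`, and the resulting lines could then be written to a per-session file under /tmp or into a per-user file.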
Why not use a Startup Hook? e.g.
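# Variables that must not be copied into Renviron.site
exclude_vars="HOME LD_LIBRARY_PATH OLDPWD PATH PWD RSTUDIO_VERSION SHLVL"
for var in $(compgen -e); do
  # Whole-word match, so e.g. LD is not skipped just because LD_LIBRARY_PATH is listed
  [[ " $exclude_vars " != *" $var "* ]] && echo "$var=${!var}" \
    >> "$(R RHOME)/etc/Renviron.site"
done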
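For comparison, the same filter is short to express in Python, which is where it would live if jupyter-rsession-proxy did this itself rather than an image-specific hook. This is only a sketch: `select_env` is a hypothetical helper, and the exclusion list is copied from the shell snippet above.

```python
from __future__ import annotations

# Names copied from the exclude_vars list in the shell snippet.
EXCLUDE = frozenset({
    "HOME", "LD_LIBRARY_PATH", "OLDPWD", "PATH", "PWD",
    "RSTUDIO_VERSION", "SHLVL",
})


def select_env(env: dict[str, str],
               exclude: frozenset[str] = EXCLUDE) -> dict[str, str]:
    """Return the variables to hand to R, dropping excluded names exactly."""
    return {name: value for name, value in env.items() if name not in exclude}
```

In practice this would be called as `select_env(dict(os.environ))`. Set membership compares whole names, so a variable is only dropped when its name appears in the list verbatim.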
@benz0li that is specific to jupyter/docker-stacks. I'd like a more general solution here.
my ideal solution is to get https://github.com/jupyterhub/jupyter-rsession-proxy/issues/145#issuecomment-2079798680 upstreamed so rstudio will also act like most other applications :D But I don't think that's going to happen.
@cboettig I think this Codespace devcontainer PWD behavior is related to this.
The following devcontainer.json spins up a devcontainer with JupyterLab and RStudio.
{
  "name": "test",
  "workspaceFolder": "/home/jovyan",
  "image": "ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest",
  "forwardPorts": [8889],
  "portsAttributes": { "8889": { "label": "Jupyter Lab" } },
  "postCreateCommand": "jupyter lab --ip=0.0.0.0 --port=8889 --allow-root --no-browser --NotebookApp.token='' --NotebookApp.password=''"
}
In the codespace from a terminal: ${PWD} is /home/jovyan
In JupyterLab from a terminal: ${PWD} is /home/jovyan
In RStudio from the terminal tab: ${PWD} is /workspaces/
In RStudio in the file panel, what is shown is /home/jovyan, which is ${HOME}
If we change the workspaceFolder to something different than HOME
"workspaceFolder": "/home/jovyan/codespace",
In the codespace from a terminal: ${PWD} is /home/jovyan/codespace
In JupyterLab from a terminal: ${PWD} is /home/jovyan/codespace
In RStudio from the terminal tab: ${PWD} is /workspaces/
In RStudio in the file panel, what is shown is /home/jovyan, which is ${HOME}
Sadly,
echo "PWD=/home/jovyan" >> ~/.Renviron
seems to be ignored. Setting other env vars works fine, but PWD is ignored. I restarted R and restarted the terminal tab. Even setting PWD=~ only works for that terminal. As soon as I start a new one, it goes back to the /workspaces/<reponame> PWD.
Also
echo PWD=/home/jovyan >> ~/.bashrc
didn't help (unless I typed bash).
@eeholmes Set --notebook-dir=/home/jovyan in the postCreateCommand.
Cross reference: https://github.com/b-data/data-science-devcontainers#usage
ℹ️ Opening your codespace in JupyterLab according to the GitHub Docs sets the default path to /workspaces/<repository-name>, which you cannot escape.
@benz0li Sadly that has no effect on the terminal that RStudio opens with /usr/bin/env bash -l. The PWD is fine in JupyterLab. The issue is in the terminal opened by RStudio, which is started via the launcher.
The devcontainer.json file is working close to what I want now. https://github.com/nmfs-opensci/container-images/blob/main/.devcontainer/test/devcontainer.json I think I can fix the PWD issue with
echo -e 'PWD=/home/jovyan\ncd $PWD' >> ~/.bash_login
in the postCreate command.
@eeholmes Or simply use my/b-data's [CUDA-enabled] Data Science Dev Containers, which do not have such issues.
ℹ️ Basic settings for RStudio: https://github.com/b-data/data-science-devcontainers/tree/8b25e592d7ca97cdcea7acae0c553545e48e5bd0/.devcontainer/r-base/conf/rstudio/etc
Bug description
Annoyingly, RStudio (though not R itself) ignores global system environment variables and only recognizes those declared in an Renviron file (i.e. either $R_HOME/etc/Renviron.site, for all users, or a .Renviron in the user's home directory). For instance, the client ID required by the awesome gh-scoped-creds Python module would typically be passed in this way.
In the rocker project, we propagate most environment variables into R_HOME before bringing up the rserver by using the s6 init system, https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/init_set_env.sh , which obviously isn't used in a jupyterhub + jupyter-rsession-proxy setup. Would it be possible to have jupyter-rsession-proxy do something similar?