jupyterhub / jupyter-rsession-proxy

Jupyter extensions for running an RStudio rsession proxy
BSD 3-Clause "New" or "Revised" License

Propagate environment variables into RStudio #145

Open cboettig opened 4 months ago

cboettig commented 4 months ago

Bug description

Annoyingly, RStudio (though not R itself) decides to ignore global system environmental variables and only recognizes those environmental variables declared in an Renviron file (i.e. either $R_HOME/etc/Renviron.site, for all users, or a .Renviron in the user's home directory). For instance, the client ID required by the awesome gh-scoped-creds python module would typically be passed in this way.

In the rocker project, we propagate most environmental variables into R_HOME before bringing up the rserver by using the s6 init system, https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/init_set_env.sh , which obviously isn't used in a jupyterhub + jupyter-rsession-proxy setup. Would it be possible to have the jupyter-rsession-proxy do something similar?

ryanlovett commented 4 months ago

It makes sense to try to get RStudio to be aware of those environment variables. It is also logical for the rocker images to modify one of the Renviron files because the docker images are in full control of the environment. However I worry about this extension making changes to those files because it could be run outside of docker. For example it could be run on a shared HPC node. If we change a file, it'd need to be per-user and we'd have to unset the changes somehow afterwards. Using environment variables would normally be the right choice to influence app behavior and being forced to use config files is, yes, annoying.

I'll look into how RStudio prepares the environment a bit more. In this case, it might be best to alter the files for the gh-scoped-creds env vars in the Docker images. (which I know you need in https://github.com/berkeley-dsep-infra/datahub)

cboettig commented 4 months ago

Thanks, makes sense, I think I followed most of this.

If we change a file, it'd need to be per-user and we'd have to unset the changes somehow afterwards.

I was wondering if it would be possible for jupyter-rsession-proxy to echo-append the env vars into the user's home dir, ~/.Renviron ? This wouldn't conflict with the Renviron.site coming from the docker image, and would automatically be applied per-user for RStudio. Not sure about the unsetting it part.

and yeah, it's quite annoying RStudio makes us do this.
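
A rough Python sketch of the echo-append idea, for illustration only. A plain `echo >>` adds a duplicate line on every launch, so this version rewrites existing assignments instead. The function name `merge_renviron` is made up here and is not part of jupyter-rsession-proxy, and the quoting rules of Renviron files are ignored for simplicity.

```python
from __future__ import annotations


def merge_renviron(existing: str, env: dict[str, str]) -> str:
    """Return Renviron text in which every name in `env` is set exactly once."""
    kept = []
    for line in existing.splitlines():
        name = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        # Drop earlier assignments of names that are about to be written again.
        if is_assignment and name in env:
            continue
        kept.append(line)
    kept.extend(f"{name}={value}" for name, value in env.items())
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    before = "LANG=C\n# user comment\nEDITOR=vim\n"
    print(merge_renviron(before, {"LANG": "en_US.UTF-8", "TZ": "UTC"}), end="")
```

Running the function twice with the same input gives the same text, which sidesteps part of the "unsetting" question: stale values get overwritten rather than piling up.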

benz0li commented 3 months ago

With an image based on the Jupyter Docker Stacks, this could be done with a Startup Hook.

E.g. a rstudio.sh script containing

echo "LANG=$LANG" >>"$(R RHOME)/etc/Renviron.site"
echo "TZ=$TZ" >>"$(R RHOME)/etc/Renviron.site"
...

This would also require a change of ownership and permission of $(R RHOME)/etc/*.site when building the image:

chown :"$NB_GID" "$(R RHOME)/etc" "$(R RHOME)/etc/"*.site
chmod g+w "$(R RHOME)/etc" "$(R RHOME)/etc/"*.site

benz0li commented 3 months ago

@ryanlovett Is there something like Startup Hooks for binder, too?

manics commented 3 months ago

There are postBuild and start files.

yuvipanda commented 2 months ago

I think RStudio does this primarily to protect itself from weird env vars in people's desktops, and that causes issues when running serverside.

jupyter-rsession-proxy is a good place to do this!

I think the primary thing to be determined is which file to modify. Ideally, this should be:

  1. Per-user, and not systemwide. This accounts for both permission issues, as well as the issues of running outside containerized environments.
  2. At least inside containerized environments, not something that commonly persists past restarts. This avoids possible issues with staleness.

@cboettig putting anything under $HOME helps with (1) but not (2). Does Renviron allow us to include other files? That way, another solution would be to add a line under $HOME that includes a file from somewhere else (like /tmp or wherever).

yuvipanda commented 2 months ago

Another important use case for this is with AWS credentials (or GCP credentials). These are dynamically set at runtime on the pod, and need to be propagated for automatic detection of credentials when accessing APIs to work. So while we can set gh-scoped-creds at image build time, we can't do that for these.

cmd-ntrf commented 2 months ago

Side note: Compute Canada / Digital Research Alliance of Canada solution for this problem was to patch RStudio.

Patch is available here: https://github.com/ComputeCanada/easybuild-easyconfigs-installed-avx2/blob/main/2023/RStudio-Server/rstudio-1.2.1335.patch. It is quite simple, but I have never tried to have it merged upstream.

ryanlovett commented 2 months ago

@yuvipanda What we ended up doing was modifying Rprofile to support sourcing files in an Rprofile.d directory, and then using extraFiles to set the env vars there. I imagine one could add a script in Rprofile.d to source something from HOME. This is all in the user environment and outside of jupyter-rsession-proxy however.

yuvipanda commented 2 months ago

@ryanlovett oooh, that's interesting. Is there a per-user Rprofile? If there is, perhaps we can dynamically modify that in rsession-proxy?

  1. Modify Rprofile to load env vars from a specific location (if it exists)
  2. have rsession-proxy dump out env vars in this specific location

Alternatively, we could start with putting this kinda code in Rprofile for just the rocker/binder image, where it can simply read from /proc/1/environ. Thoughts on that, @cboettig?
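
A minimal sketch of step 2 above (dumping the env vars to a known location), assuming a plain NAME=value file is what the R side would read. The function name, the exclusion list, and the target path are all hypothetical. Because credentials may end up in the file, it is created readable by the owner only.

```python
import os
import tempfile

# Hypothetical list of names that should not be handed to the R session.
EXCLUDE = {"HOME", "LD_LIBRARY_PATH", "OLDPWD", "PATH", "PWD", "SHLVL"}


def write_env_file(env, path, exclude=EXCLUDE):
    """Write NAME=value lines for `env` to `path` and return how many were written."""
    lines = [
        f"{name}={value}"
        for name, value in sorted(env.items())
        # Skip excluded names and multi-line values, which do not fit on one line.
        if name not in exclude and "\n" not in value
    ]
    # 0o600 applies when the file is created; O_TRUNC overwrites an older copy.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("".join(line + "\n" for line in lines))
    return len(lines)


if __name__ == "__main__":
    # Hypothetical per-session location under the system temp directory.
    target = os.path.join(tempfile.mkdtemp(), "rsession-env")
    count = write_env_file(dict(os.environ), target)
    print(f"wrote {count} variables to {target}")
```

Overwriting the file on every launch means nothing stale survives a restart, as long as the location itself is not persisted.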

ryanlovett commented 2 months ago

Is there a per-user Rprofile? If there is, perhaps we can dynamically modify that in rsession-proxy?

Yes, ~/.Rprofile (and also ~/.Renviron). The modification process would have to be idempotent. Would you also want to leave the vars behind? Or assume whatever file is included gets overwritten each session?

yuvipanda commented 2 months ago

@ryanlovett yes, we'd need the code to be idempotent - reasonably doable I'd think (conda init does something like that for example).

The vars should be put somewhere like /tmp, so that they get cleaned up as appropriate. So we assume the file gets overwritten each session.
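
One way the idempotent edit could look, sketched in Python under the assumption that a marker-delimited block (similar in spirit to what conda init writes) is acceptable in ~/.Rprofile. The marker strings, the function name, and the /tmp path in the example are invented for illustration.

```python
# Hypothetical markers delimiting the managed block.
BEGIN = "# >>> rsession-proxy env >>>"
END = "# <<< rsession-proxy env <<<"


def upsert_block(text: str, body: str) -> str:
    """Insert or replace the managed block in `text`, leaving other lines alone."""
    block = f"{BEGIN}\n{body.rstrip()}\n{END}\n"
    start = text.find(BEGIN)
    end = text.find(END, start + len(BEGIN)) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete block yet: append one after the existing content.
        if text and not text.endswith("\n"):
            text += "\n"
        return text + block
    # Replace the old block, including the newline that follows END.
    tail = text[end + len(END):]
    if tail.startswith("\n"):
        tail = tail[1:]
    return text[:start] + block + tail


if __name__ == "__main__":
    # The R line below uses a made-up path; readRenviron() is a base R function.
    r_line = 'if (file.exists("/tmp/rsession-env")) readRenviron("/tmp/rsession-env")'
    profile = upsert_block("options(digits = 4)\n", r_line)
    print(profile, end="")
    print(upsert_block(profile, r_line) == profile)
```

Because the block is replaced rather than appended, repeated launches leave the rest of the user's ~/.Rprofile untouched.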

cboettig commented 2 months ago

I think having something like:

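# /proc/1/environ holds NUL-separated NAME=value entries; readBin() splits on the NULs
tmp <- tempfile()
writeLines(readBin("/proc/1/environ", "character", n = 500), tmp)
# Load the entries into the current R session
readRenviron(tmp)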

in ${R_HOME}/etc/Rprofile.site would populate the env var list when R session starts up for any user? Or if this is handled by jupyter-rsession-proxy, presumably it would append this to user-specific $HOME/.Rprofile ?
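
If jupyter-rsession-proxy handled this on the Python side instead, the same parsing might look like the sketch below. The function name is hypothetical. The demo reads /proc/self/environ rather than /proc/1/environ because reading another process's environment can fail with a permission error when that process belongs to a different user.

```python
from __future__ import annotations

import os


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated NAME=value entries of a /proc/<pid>/environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skips the trailing empty entry and anything malformed
        name, _, value = entry.partition(b"=")
        # fsdecode tolerates bytes that are not valid in the filesystem encoding.
        env[os.fsdecode(name)] = os.fsdecode(value)
    return env


if __name__ == "__main__":
    try:
        with open("/proc/self/environ", "rb") as handle:
            names = sorted(parse_environ(handle.read()))
        print(f"found {len(names)} variables")
    except OSError as err:
        print(f"could not read the environment: {err}")
```

Splitting on the first `=` only keeps values that themselves contain `=` intact.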

benz0li commented 2 months ago

Why not use a Startup Hook? e.g.

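# Names that should not be copied into Renviron.site
exclude_vars="HOME LD_LIBRARY_PATH OLDPWD PATH PWD RSTUDIO_VERSION SHLVL"
for var in $(compgen -e); do
  # Compare whole names; a plain =~ test would also skip substrings such as LD
  [[ " $exclude_vars " != *" $var "* ]] && echo "$var=${!var}" \
    >> "$(R RHOME)/etc/Renviron.site"
done

yuvipanda commented 2 months ago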

@benz0li that is specific to jupyter/docker-stacks. I'd like a more general solution here.

my ideal solution is to get https://github.com/jupyterhub/jupyter-rsession-proxy/issues/145#issuecomment-2079798680 upstreamed so rstudio will also act like most other applications :D But I don't think that's going to happen.

eeholmes commented 1 month ago

@cboettig I think this Codespace devcontainer PWD behavior is related to this.

The following devcontainer.json spins up a devcontainer with JupyterLab and RStudio.

{
  "name": "test",
  "workspaceFolder": "/home/jovyan",
  "image": "ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest",
  "forwardPorts": [ 8889 ],
  "portsAttributes": { "8889": { "label": "Jupyter Lab" } },
  "postCreateCommand": "jupyter lab --ip=0.0.0.0 --port=8889 --allow-root --no-browser --NotebookApp.token='' --NotebookApp.password=''"
}

In the codespace from a terminal: ${PWD} is /home/jovyan

In JupyterLab from a terminal: ${PWD} is /home/jovyan

In RStudio from the terminal tab ${PWD} is /workspaces/

In RStudio in the file panel what is shown is /home/jovyan which is ${HOME}

If we change the workspaceFolder to something different than HOME

  "workspaceFolder": "/home/jovyan/codespace",

In the codespace from a terminal: ${PWD} is /home/jovyan/codespace

In JupyterLab from a terminal: ${PWD} is /home/jovyan/codespace

In RStudio from the terminal tab ${PWD} is /workspaces/

In RStudio in the file panel what is shown is /home/jovyan which is ${HOME}

eeholmes commented 1 month ago

Sadly,

echo "PWD=/home/jovyan" >> ~/.Renviron

seems to be ignored. Setting other envs works fine but PWD is ignored. I restarted R and restarted the terminal tab. Even setting PWD=~ only works for that terminal. As soon as I start a new one, it goes back to the /workspaces/<reponame> PWD.

Also

echo "PWD=/home/jovyan" >> ~/.bashrc

didn't help (unless I typed bash).

benz0li commented 1 month ago

@eeholmes Set --notebook-dir=/home/jovyan in the postCreateCommand.

Cross reference: https://github.com/b-data/data-science-devcontainers#usage

benz0li commented 1 month ago

ℹ️ Opening your codespace in JupyterLab according to the GitHub Docs sets the default path to /workspaces/<repository-name>, which you cannot escape.

eeholmes commented 1 month ago

@benz0li Sadly that has no effect on the terminal being opened by /usr/bin/env bash -l in RStudio. The PWD is fine in JupyterLab. The issue is in the terminal that RStudio opens via the launcher.

The devcontainer.json file is working close to what I want now. https://github.com/nmfs-opensci/container-images/blob/main/.devcontainer/test/devcontainer.json I think I can fix the PWD issue with

echo -e 'PWD=/home/jovyan\ncd "$PWD"' >> ~/.bash_login

in the postCreateCommand.

benz0li commented 1 month ago

@eeholmes Or simply use my/b-data's [CUDA-enabled] Data Science Dev Containers, which do not have such issues.

ℹ️ Basic settings for RStudio: https://github.com/b-data/data-science-devcontainers/tree/8b25e592d7ca97cdcea7acae0c553545e48e5bd0/.devcontainer/r-base/conf/rstudio/etc

ℹ️ And there is https://github.com/b-data/data-science-devcontainers/blob/8b25e592d7ca97cdcea7acae0c553545e48e5bd0/.devcontainer/r-base/scripts/usr/local/bin/postStartCommand.sh#L10-L26