jupyterhub / jupyter-rsession-proxy

Jupyter extensions for running an RStudio rsession proxy
BSD 3-Clause "New" or "Revised" License

RStudio does not respect environment variables set in the login shell #135

Open d-walkama opened 1 year ago

d-walkama commented 1 year ago

We spawn single-user JupyterHub instances by setting c.Spawner.cmd to a script that starts a login shell with "#!/bin/bash -l". This allows us to dereference shell variables that were created in .Renviron, and checking them in the R console within JupyterHub shows that this works. However, it does not work in RStudio using jupyter-rsession-proxy. Is there any way to solve this issue?

ryanlovett commented 1 year ago

I tried setting variables in ~/.Renviron in different spawning environments (kubespawner, batchspawner) on two separate hubs, and in both cases they were available within the RStudio console. As far as I know, jupyter-rsession-proxy cannot alter this behavior. However, can you elaborate on

dereference shell variables that were created in .Renviron

That sounds like you're doing something different than vanilla Key1=value1 assignments.

d-walkama commented 1 year ago

Yes, we want to do something a bit different from hardcoding the variables, e.g. setting:

PATH=${PATH}

in .Renviron, so that we can build the variables during login (scripts in /etc/profile.d) and pull them into R. We've been able to do this just fine in the R console within JupyterHub, but not with RStudio.
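To see why `PATH=${PATH}` depends on how R is launched, here is a simplified, hypothetical model of how R expands `${VAR}` and `${VAR-default}` references in `.Renviron`. The helper names are made up, and R's real parser also handles quoting and backslashes, which this sketch ignores. The point it illustrates: a reference can only expand to a value that is already in the environment the R process was started with, otherwise it silently becomes the default or an empty string.

```python
from __future__ import annotations

import re
from typing import Mapping

# Matches ${NAME} or ${NAME-default}; nested references are not handled here.
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?:-([^}]*))?\}")


def expand_value(value: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} / ${NAME-default} with values from `env` (simplified model)."""
    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = env.get(name, "")
        if current:  # set and non-empty in this model
            return current
        return default if default is not None else ""
    return _REF.sub(substitute, value)


def parse_renviron(text: str, env: Mapping[str, str]) -> dict[str, str]:
    """Apply NAME=value lines top to bottom; return the variables they assign."""
    resolved = dict(env)          # what the R process inherited at startup
    assigned: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue              # skip blanks, comments and malformed lines
        name, value = line.split("=", 1)
        name = name.strip()
        value = expand_value(value.strip(), resolved)
        resolved[name] = value    # later lines can reference earlier ones
        assigned[name] = value
    return assigned
```

With an inherited environment of `{"PATH": "/usr/bin"}`, the line `PATH=${PATH}` yields `/usr/bin`, while `TEMP=${TEMP}` yields an empty string because `TEMP` never reached the process.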

Our current scheme spawns "jupyterhub-singleuser" via a script whose shebang passes the "-l" option, which starts a login shell. This sources the user's profile before starting JupyterHub and allows for the generation of variables, functions, aliases, etc. We can then dereference those variables in .Renviron so that they do not have to be hardcoded. This scheme works fine when spawning an R kernel in JupyterHub, but not when spawning an RStudio instance.

ryanlovett commented 1 year ago

Just to follow this example: you define things like PATH in /etc/profile.d/, but if you do not set PATH in .Renviron, RStudio won't inherit it from the shell environment?

Do you know if this also works with RStudio outside of JupyterHub + jupyter-rsession-proxy?

d-walkama commented 1 year ago

Correct, I believe R only creates variables that are set in .Renviron. For example, if you have a shell variable TEMP="test" and you do not set TEMP=${TEMP} in .Renviron, then Sys.getenv("TEMP") will return an empty string in an R instance.
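Whatever R sees, whether in the console or in RStudio, ultimately comes from the environment its process was launched with (plus any `.Renviron` assignments). The hypothetical `child_sees` helper below uses a Python child process as a stand-in for R to show that a variable absent from the launch environment reads back as an empty string, which matches what `Sys.getenv("TEMP")` returns for an unset variable. Note also that in a shell, a plain `TEMP="test"` without `export` is not passed to child processes at all.

```python
from __future__ import annotations

import subprocess
import sys


def child_sees(name: str, env: dict[str, str]) -> str:
    """Start a Python child with exactly `env` and return what it reads for `name`.

    The child stands in for R/rsession: it can only see variables that are
    present in the environment it was launched with.
    """
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Present in the launch environment: the child sees it.
print(repr(child_sees("TEMP", {"TEMP": "test"})))  # 'test'
# Absent from the launch environment: the child gets an empty string.
print(repr(child_sees("TEMP", {})))  # ''
```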

d-walkama commented 1 year ago

I see this spot in your code, which I'm assuming is where the environment is stripped down to just the specific variables listed below?

```python
import getpass
import subprocess

# get_rstudio_executable() and get_icon_path() are helpers defined elsewhere
# in jupyter-rsession-proxy; they are not part of this excerpt.


def setup_rsession():
    def _get_env(port):
        # Detect various environment variables rsession requires to run
        # Via rstudio's src/cpp/core/r_util/REnvironmentPosix.cpp
        cmd = ['R', '--slave', '--vanilla', '-e',
               'cat(paste(R.home("home"),R.home("share"),R.home("include"),R.home("doc"),getRversion(),sep=":"))']

        r_output = subprocess.check_output(cmd)
        R_HOME, R_SHARE_DIR, R_INCLUDE_DIR, R_DOC_DIR, version = \
            r_output.decode().split(':')

        # Extra variables handed to jupyter-server-proxy for the rsession process
        return {
            'R_DOC_DIR': R_DOC_DIR,
            'R_HOME': R_HOME,
            'R_INCLUDE_DIR': R_INCLUDE_DIR,
            'R_SHARE_DIR': R_SHARE_DIR,
            'RSTUDIO_DEFAULT_R_VERSION_HOME': R_HOME,
            'RSTUDIO_DEFAULT_R_VERSION': version,
        }

    def _get_cmd(port):
        # Run rsession directly in standalone server mode on the proxied port
        return [
            get_rstudio_executable('rsession'),
            '--standalone=1',
            '--program-mode=server',
            '--log-stderr=1',
            '--session-timeout-minutes=0',
            '--user-identity=' + getpass.getuser(),
            '--www-port=' + str(port)
        ]

    return {
        'command': _get_cmd,
        'environment': _get_env,
        'launcher_entry': {
            'title': 'RStudio',
            'icon_path': get_icon_path()
        }
    }
```

ryanlovett commented 1 year ago

That code does set some environment variables, but it should not unset everything else. jupyter-server-proxy integrates those variables into the existing environment.
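A minimal sketch of what "integrates those variables into the existing environment" means, assuming the usual copy-then-update pattern. This is not jupyter-server-proxy's actual code, and `build_server_env` is a hypothetical name: the inherited environment is copied, the keys returned by an `environment` callable such as `_get_env` are layered on top, and nothing else is removed.

```python
from __future__ import annotations

import os
from typing import Callable, Mapping, Optional, Union

EnvSource = Union[Mapping[str, str], Callable[[int], Mapping[str, str]]]


def build_server_env(extra: EnvSource, port: int,
                     base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Hypothetical merge of a proxy's extra variables into the inherited env."""
    # Start from everything the parent process (e.g. jupyterhub-singleuser) has.
    env = dict(os.environ if base is None else base)
    # Resolve a callable like _get_env(port) to a plain mapping.
    overrides = extra(port) if callable(extra) else extra
    # Layer the extra keys on top; keys not mentioned are left untouched.
    env.update(overrides)
    return env


inherited = {"PATH": "/usr/bin", "TEMP": "test"}
merged = build_server_env(lambda port: {"R_HOME": "/usr/lib/R"}, 8787, base=inherited)
print(sorted(merged))  # ['PATH', 'R_HOME', 'TEMP']
```

Under this model the proxy step alone would not drop `TEMP`; it only adds or overrides the listed keys.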

I have a feeling this has more to do with the spawner or RStudio than with jupyter-rsession-proxy. However, I'm not sure why RStudio is not otherwise picking up your /etc/profile.d/ configuration. According to RStudio's docs, R is started under a bash login shell and should read /etc/profile.
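If the RStudio session does not pick up /etc/profile.d/, one possible workaround is to read the variables a bash login shell would produce and pass them along explicitly. The sketch below is hypothetical (`capture_shell_env` is a made-up helper) and assumes `bash` and a GNU-style `env` supporting `-0` (NUL-separated output) are available. A marker is printed first so that anything the profile scripts write to stdout is discarded.

```python
from __future__ import annotations

import os
import subprocess
from typing import Mapping, Optional

# Printed (NUL-terminated) just before `env -0` so profile-script output can be skipped.
_MARKER = b"__LOGIN_ENV_START__\0"


def capture_shell_env(login: bool = True,
                      base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the environment as seen inside a (login) bash shell."""
    shell = ["bash", "-l", "-c"] if login else ["bash", "-c"]
    # bash's printf turns \0 into a NUL byte; env -0 terminates each entry with NUL.
    script = 'printf "__LOGIN_ENV_START__\\0"; env -0'
    proc = subprocess.run(
        shell + [script],
        env=None if base_env is None else dict(base_env),
        stdout=subprocess.PIPE,
        check=True,
    )
    # Keep only what follows the last marker.
    _, _, payload = proc.stdout.rpartition(_MARKER)
    result: dict[str, str] = {}
    for entry in payload.split(b"\0"):
        name, sep, value = entry.partition(b"=")
        if sep:  # skips the empty chunk after the final NUL
            result[os.fsdecode(name)] = os.fsdecode(value)
    return result
```

The resulting mapping could, for example, be merged into the dictionary that an `environment` callable like `_get_env` returns.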

d-walkama commented 1 year ago

I think there are major differences between how open-source RStudio Server and RStudio Server Pro configure themselves. The steps you mention are marked as "Pro" here:

https://docs.rstudio.com/ide/server-pro/r_sessions/session_startup_scripts.html

scj643 commented 1 year ago

You could shim the rsession executable with a shell script that injects environment variables, then point RStudio at the shim by passing --rsession-path= in the rstudio command.
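A sketch of such a shim, generated from Python to keep one language in this thread. `write_rsession_shim` is a hypothetical helper and any paths are placeholders. The generated script uses `#!/bin/bash -l`, so bash runs it as a login shell (reading /etc/profile, which on typical distributions sources /etc/profile.d/) before `exec`-ing the real `rsession` with the same arguments; optional extra variables are exported first.

```python
from __future__ import annotations

import re
import shlex
import stat
from pathlib import Path
from typing import Mapping, Optional

_SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def write_rsession_shim(shim_path: str, real_rsession: str,
                        extra_env: Optional[Mapping[str, str]] = None) -> Path:
    """Write an executable bash wrapper that runs `real_rsession` from a login shell."""
    lines = [
        "#!/bin/bash -l",
        "# Login shell: bash has already read /etc/profile and the user's profile.",
    ]
    for name, value in (extra_env or {}).items():
        if not _SHELL_NAME.match(name):
            raise ValueError(f"not a valid shell variable name: {name!r}")
        lines.append(f"export {name}={shlex.quote(value)}")
    # exec replaces the shell, so rsession keeps the PID and receives all arguments.
    lines.append(f'exec {shlex.quote(real_rsession)} "$@"')

    path = Path(shim_path)
    path.write_text("\n".join(lines) + "\n")
    # Make the shim executable for user, group and others.
    path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

The shim's path would then be what gets passed via `--rsession-path=`.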