rocker-org / rocker-versioned2

Run current & prior versions of R using docker. rocker/r-ver, rocker/rstudio, rocker/shiny, rocker/tidyverse, and so on.
https://rocker-project.org
GNU General Public License v2.0

environment PATH inconsistency among s6/rstudio, R, bash execution contexts #626

Open hute37 opened 1 year ago

hute37 commented 1 year ago

I found this problem when trying to enable pyenv+poetry python support in a keras project based on the ml-verse image.

I wrote a couple of scripts,

based on standard rocker project versions


The main question is:

"where is/are the right place(s) to configure environment/PATH variables ?"

Following the examples, I put the settings in

The Renviron file cannot execute bash code, so I manually expanded the settings produced by these evals:

eval "$(pyenv init --path)" 
eval "$(pyenv virtualenv-init -)"

so far, so good ...
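As a rough sketch of the manual expansion described above (not the script used in this issue): `renviron_path_line` is a hypothetical helper name, and the directories are assumptions copied from the PATH output shown later in this thread. It prints a static `PATH=` line that an Renviron file can hold, relying on R substituting `${PATH}` itself when it reads the file.

```shell
#!/usr/bin/env bash
# Sketch only: build a static PATH line for an Renviron file.

# renviron_path_line DIR... (hypothetical helper)
# Prints "PATH=DIR1:DIR2:...:${PATH}". The trailing ${PATH} is kept literal
# on purpose, so that R (not bash) expands it when reading the file.
renviron_path_line() {
  local joined="" dir
  for dir in "$@"; do
    joined+="${dir}:"
  done
  printf 'PATH=%s${PATH}\n' "$joined"
}

# Directories assumed from the PATH output in this thread; adjust to your image.
renviron_path_line \
  /opt/pyenv/shims \
  /opt/pyenv/plugins/pyenv-virtualenv/shims \
  /opt/pyenv/bin \
  /opt/poetry/bin
```

The printed line could then be appended to an Renviron file, for example with `>> "$(R RHOME)/etc/Renviron.site"`.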

Running /bin/bash interactively in the container triggers the /etc/bash.bashrc evals, while the R REPL can find pyenv and poetry on the PATH:

system("env | sort")
Sys.which('pyenv')
Sys.which('poetry')

In this setting, the right version is selected via pyenv (.python-version project file)

Sys.which('python')

In an rstudio-server session, however, I cannot get the configuration right.

There is a mix of conflicting settings ...

#!/usr/bin/with-contenv bash
## load /etc/environment vars first:
for line in $( cat /etc/environment ) ; do export $line > /dev/null; done
exec /usr/lib/rstudio-server/bin/rserver --server-daemonize 0

There is also an rsession.sh that could be used in starting internal R sessions.

Then:

# ls -l /usr/local/bin
total 112
...
-rwxr-xr-x 1 root root   221 Mar 29 10:31 pip
-rwxr-xr-x 1 root root   221 Mar 29 10:31 pip3
-rwxr-xr-x 1 root root   221 Mar 29 10:31 pip3.10
lrwxrwxrwx 1 root root    16 Mar 15 11:27 python -> /usr/bin/python3
...
-rwxr-xr-x 1 root root   238 Mar 29 10:31 virtualenv
-rwxr-xr-x 1 root root   214 Mar 29 10:31 virtualenv-clone
-rwxr-xr-x 1 root root   208 Mar 29 10:31 wheel

In the RStudio R console, I get:

> system("env|grep ^PATH")
PATH=/usr/local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback
> Sys.which('pyenv')
pyenv 
   "" 
> Sys.which('python')
                 python 
"/usr/local/bin/python" 

> 

In the RStudio terminal, I get:

root@4e40c7897142:~# echo $PATH
/usr/local/bin:/usr/lib/rstudio-server/resources/terminal/bash/.local/bin:/opt/poetry/bin:/opt/pyenv/plugins/pyenv-virtualenv/shims:/opt/pyenv/shims:/opt/pyenv/bin:/usr/lib/rstudio-server/resources/terminal/bash/.local/bin:/usr/local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback:/usr/bin
root@4e40c7897142:~# which pyenv
/opt/pyenv/bin/pyenv
root@4e40c7897142:~# 
root@4e40c7897142:~# which python
/usr/local/bin/python
root@4e40c7897142:~# 
root@4e40c7897142:~# pyenv versions
  system
* 3.10.6 (set by PYENV_VERSION environment variable)
root@4e40c7897142:~# 

This is wrong because the paths /usr/local/bin:/usr/lib/rstudio-server/resources/terminal/bash/.local/bin were prepended to the system path, so a spurious /usr/local/bin/python overrides the pyenv version under /opt/pyenv/shims.
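To see this kind of shadowing directly, a small sketch (the helper name `path_matches` is hypothetical) can list every executable with a given name in PATH order; the first line printed is the one that wins. In an interactive bash session, the builtin `type -a python` gives similar information for the current shell.

```shell
#!/usr/bin/env bash
# Sketch only: show which PATH entries provide a command, in lookup order.

# path_matches CMD [PATH_STRING] (hypothetical helper)
# Prints each executable file named CMD found in PATH_STRING (default: $PATH).
# Empty PATH entries (which mean "current directory") are skipped for simplicity.
path_matches() {
  local cmd=$1 path_string=${2-$PATH} dir
  local -a dirs
  IFS=: read -r -a dirs <<< "$path_string"
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || continue
    if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
      printf '%s\n' "$dir/$cmd"
    fi
  done
}

# The first match listed is the one the shell (or Sys.which) would pick.
path_matches python
```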


Running /bin/bash in container

# echo $PATH
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin
# which python
/usr/local/bin/python

# source /etc/profile
# eval "$(pyenv init -)"

# echo $PATH
/opt/pyenv/shims:/root/.local/bin:/opt/poetry/bin:/opt/pyenv/plugins/pyenv-virtualenv/shims:/root/.pyenv/shims:/opt/pyenv/bin:/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin
# which python
/opt/pyenv/shims/python

Running R in container

>
> system("env | grep ^PATH")
PATH=~/.local/bin:/opt/poetry/bin:~/.local/bin:/opt/pyenv/bin:/opt/pyenv/shims:/opt/pyenv/plugins/pyenv-virtualenv/shims:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin:/usr/local/texlive/bin/linux/
> Sys.which("pyenv")
                 pyenv
"/opt/pyenv/bin/pyenv"
> Sys.which("python")
                   python
"/opt/pyenv/shims/python"
>

Maybe what is missing here is the right initialization for the s6 supervisor/rstudio-server configuration ...

hute37 commented 1 year ago

Spurious /usr/local/bin/python link to system python is in this script

To enable pip package installations that require recompilation (Python.h), python3-dev is also required

eitsupi commented 1 year ago

@cboettig Any thoughts on this?

cboettig commented 1 year ago

@eitsupi thanks for the ping and @hute37 thanks for the issue, we do probably need to at least document some of these things better. Additionally, some of these might need fixing or may at least be unnecessary. You raise a lot of different issues here, so I'll try and hit on each but we might want to break this out into different threads.

"where is/are the right place(s) to configure environment/PATH variables ?"

Yes, great question, but unfortunately the answer depends on a handful of things, as @hute37 observes above. Most of these choices are not in our control.

Most important of these is whether you are configuring environmental variables to be accessed from the RStudio interface or via another mechanism (e.g. direct bash or R console from the container, not via RStudio, or via the S6 init system). RStudio's R console only gets its environmental variables from R's various .Renviron / Renviron.site files and default RStudio settings, not the system environmental variables. We don't control this of course but should probably document it more clearly, along with advice about how to pass environmental variables. Many of the rocker scripts write to $R_HOME/etc/Renviron for this reason.

Because Docker users frequently pass environmental variables via docker --env or --env-file, we attempt to pass most of these (with some exceptions like PASSWORD) up transparently to the RStudio R console, as you noted:

R_HOME/etc/Renviron.site is not read by s6 supervisor but is written (!) by /etc/cont-init.d/01_set_env

I'm not sure why it's surprising that s6 supervisor isn't reading Renviron.site -- Renviron.site is meant to be read by R processes.
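A minimal sketch of the idea of bubbling Docker-supplied variables up into an Renviron file follows. It is not the contents of /etc/cont-init.d/01_set_env; `append_to_renviron` and `DEMO_VAR` are hypothetical names, and the helper only handles simple values (no quotes or newlines).

```shell
#!/usr/bin/env bash
# Sketch only: copy selected environment variables into an Renviron-style file.

# append_to_renviron FILE NAME... (hypothetical helper)
# Appends "NAME=value" to FILE for each named variable that is set,
# always skipping PASSWORD so it never reaches the R session.
append_to_renviron() {
  local file=$1 name
  shift
  for name in "$@"; do
    if [ "$name" = "PASSWORD" ]; then
      continue
    fi
    # ${!name+x} expands to "x" only when the variable named by $name is set.
    if [ -n "${!name+x}" ]; then
      printf '%s=%s\n' "$name" "${!name}" >> "$file"
    fi
  done
}

# Demo against a temporary file rather than a real Renviron.site.
demo_file=$(mktemp)
DEMO_VAR=hello PASSWORD=secret append_to_renviron "$demo_file" DEMO_VAR PASSWORD
cat "$demo_file"   # prints: DEMO_VAR=hello
rm -f "$demo_file"
```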

/etc/bash.bashrc is ignored by the service. I tried to move everything under /etc/profile.d scripts but neither bash.bashrc nor /etc/profile get sourced.

Assuming 'the service' here refers to RStudio R console? If so, yes, RStudio R console uses Renviron files, not bash profiles, for env vars.

/etc/services.d/rstudio/run rewrites environment from '/etc/environment'

yes, the call to rserver respects /etc/environment. This is probably superfluous, as /etc/environment is not the recommended way to set environment variables in docker, but does provide a mechanism independent of what gets bubbled up into .Renviron files to configure the rserver call.
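One caveat with the `for line in $( cat /etc/environment )` loop quoted earlier is that it splits on whitespace, so a value containing a space would be broken apart. A hedged alternative that reads line by line is sketched here; `load_env_file` is a hypothetical name, not part of the rocker scripts, and unlike PAM it does not strip surrounding quotes from values.

```shell
#!/usr/bin/env bash
# Sketch only: export NAME=value lines from a file, one line at a time.

# load_env_file FILE (hypothetical helper)
# Skips blank lines and comments; exports every other line containing "=".
load_env_file() {
  local file=$1 line
  # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;
      *=*) export "$line" ;;
    esac
  done < "$file"
}

if [ -r /etc/environment ]; then
  load_env_file /etc/environment
fi
```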

The [rsession.sh](https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/rsession.sh) I think isn't actively used for anything right now? (I think this was there for running rstudio without root?)

under /usr/local/bin there is a link to system python3:

Right, under vanilla ubuntu system python is never bound to python, a hold-over from the python-2 days that now feels somewhat quaint. This symlink just puts system python on the system path without having to include the 3, and should be harmless, except that as you note it looks like somewhere /usr/local/bin is being pre-pended to PATH ahead of your poetry links, which doesn't look right. I'm not exactly sure where that happens.

If I understand correctly, your main concern is getting pyenv to use non-system python, yes? That's a rather more focused issue than the more general issue of where/how to set environmental variables (which at least does need more documentation). It's been a while since I've tested the install_pyenv.sh script in https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_pyenv.sh#L35 (most users seem to prefer conda-based mechanisms for installing alternate versions of python, which have more built-in support in reticulate already), but I don't see anything in there that should be putting /usr/local/bin before the pyenv paths....

hute37 commented 1 year ago

about "system" python

I think that the question related to the /usr/local/bin/python symlink to /usr/bin/python3 can be addressed simply by moving the link to /usr/bin

The reason may be historical ...

Ubuntu was one of the last Linux distributions to complete the transition to python3, having many critical components in python2 (apt-get, ...), so the choice was to be explicit about the python version used by the applications

Arch was one of the first to upgrade to python3.

Because system components should be placed in /usr/bin, leaving /usr/local/bin available for "local" builds, I think that moving the link is the "right" thing.

Note:


UBUNTU (22.04)

# lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:    22.04
Codename:   jammy

# ls -l /usr/bin/python*

lrwxrwxrwx 1 root root      10 Aug 18  2022 /usr/bin/python3 -> python3.10
-rwxr-xr-x 1 root root 5912936 Mar 10 11:55 /usr/bin/python3.10
lrwxrwxrwx 1 root root      34 Mar 10 11:55 /usr/bin/python3.10-config -> x86_64-linux-gnu-python3.10-config
lrwxrwxrwx 1 root root      17 Aug 18  2022 /usr/bin/python3-config -> python3.10-config
-rwxr-xr-x 1 root root     960 Jan 25 09:29 /usr/bin/python3-futurize
-rwxr-xr-x 1 root root     964 Jan 25 09:29 /usr/bin/python3-pasteurize

ARCH (Manjaro)

# lsb_release -a

LSB Version:    n/a
Distributor ID: ManjaroLinux
Description:    Manjaro Linux
Release:        22.0.0
Codename:       Sikaris

# ls -l /usr/bin/python*

lrwxrwxrwx 1 root root     7 Nov  1 15:18 /usr/bin/python -> python3
lrwxrwxrwx 1 root root    10 Nov  1 15:18 /usr/bin/python3 -> python3.10
-rwxr-xr-x 1 root root 14272 Nov  1 15:18 /usr/bin/python3.10
-rwxr-xr-x 1 root root  3306 Nov  1 15:18 /usr/bin/python3.10-config
lrwxrwxrwx 1 root root    17 Nov  1 15:18 /usr/bin/python3-config -> python3.10-config
-rwxr-xr-x 1 root root  2554 Feb 15  2022 /usr/bin/python-argcomplete-check-easy-install-script
-rwxr-xr-x 1 root root   383 Feb 15  2022 /usr/bin/python-argcomplete-tcsh
lrwxrwxrwx 1 root root    14 Nov  1 15:18 /usr/bin/python-config -> python3-config
lrwxrwxrwx 1 root root    52 Apr 17  2022 /usr/bin/pythontex -> /usr/share/texmf-dist/scripts/pythontex/pythontex.py

eddelbuettel commented 1 year ago

(As a somewhat total aside I also relied on just python recently and discovered that on Ubuntu installing the wonderfully-prosaically-named package python-is-python3 now helps.)

hute37 commented 1 year ago

TL;DR pyenv/poetry in rstudio PATH

Now it seems to work ...

Starting from the ml-verse base image I noticed that the nvidia/cuda PATH directories were correctly set. I didn't find any reference to these paths in /etc and the Renviron* files. Checking the original repo, nvidia/container-images/cuda, I found the setting in

I've done the same, and it's working ! ;)

Mandatory reference:

hute37 commented 1 year ago

environment references ...

Trying to understand the reason why my settings get lost, I found some facts that I didn't know. As a brief recap:

  1. RStudio does not execute R as a subprocess. It uses a "strange" animal: the rsession binary subprocess that embeds R as a shared library, but that also acts as an RPC server to RStudio interface:

  2. RStudio comes in two flavors: "Open-Source" and "Pro". One of the most important add-ons in "Pro" is full support of PAM profiles and user environment initialization (see: PAM Authentication). In particular, /etc/environment is read by PAM login modules, before /etc/profile (see: What is the difference between /etc/environment and /etc/profile?)

    NOTE:

    • having put PATH settings in too many places, I needed a way to remove duplicate entries. I found this useful (taken from Removing Duplicate PATH Entries):

# in /etc/profile.d

eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"

PATH=$(P=$(echo -n $PATH | awk -v RS=: -v ORS=: '!($0 in a) {a[$0]; print $0}'); echo -n ${P:0:-1})

export PATH
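
The awk one-liner above is compact but hard to read. As an alternative sketch in plain bash (no awk), under the assumption that a helper function is acceptable in a profile script, the hypothetical `dedupe_path` below keeps the first occurrence of each entry and preserves order:

```shell
#!/usr/bin/env bash
# Sketch only: remove duplicate PATH entries, keeping the first occurrence.

# dedupe_path [PATH_STRING] (hypothetical helper)
# Prints PATH_STRING (default: $PATH) with repeated and empty entries removed.
dedupe_path() {
  local input=${1-$PATH} out="" dir
  local -a dirs
  IFS=: read -r -a dirs <<< "$input"
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || continue
    case ":$out:" in
      *":$dir:"*) ;;                    # already present, skip it
      *) out=${out:+$out:}$dir ;;       # first time seen, append it
    esac
  done
  printf '%s\n' "$out"
}

dedupe_path "/a:/b:/a:/c:/b"   # prints: /a:/b:/c
```

In a profile script this could be used as `PATH=$(dedupe_path); export PATH`.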

cboettig commented 1 year ago

Thanks @hute37 , this is great, glad things are working.

We really ought to add a page on https://rocker-project.org/ documenting the use of python and the ml images in rocker (which could also get into this wild west of python environment managers: conda, pyenv, pipenv, poetry, etc.), and maybe a separate one on environmental variables?