njacobson-nci commented 1 year ago

I'm trying to run this stack for a few different users and want to be able to switch the username of the notebook user when I stand up the image.

When I do this, the spawned terminals/notebooks under the switched user aren't correctly sourcing the jovyan bashrc, and running bitsandbytes fails to find libcudart.so.

The command I'm using (the most basic version, to eliminate any variables in my custom install and deployment):

docker run --gpus all -it -p 8848:8888 --user root -e NB_USER="njacobson" -e CHOWN_HOME=yes -w "/home/njacobson" cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04_python-only

I attach to the container as root and run:

mamba install cudatoolkit -y
python -m pip install bitsandbytes

python -m bitsandbytes

This fails to initialize and can't find libcudart.so.

If I source /home/jovyan/.bashrc, python -m bitsandbytes works.

Running a jupyter notebook via jupyterlab and importing bitsandbytes also fails. As does a terminal spawned via jupyterlab unless I source the jovyan .bashrc.

I've tried copying the jovyan .bashrc into my home and chowning it. This fixes new terminals, but new notebooks still won't properly import bitsandbytes.

nvidia-smi, nvcc, and torch.cuda.is_available() work in notebooks.

Not sure if this belongs here or with docker-stacks, but figured I'd start here.

Thanks!

benz0li commented 1 year ago

@njacobson-nci The problem is that LD_LIBRARY_PATH is not preserved, which is essential for CUDA images.

And the default LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 must be set/extended to LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64 beforehand.

FYI @mathbunnyru

mathbunnyru commented 1 year ago

@benz0li as far as I understand, in docker-stacks images we do not rely on LD_LIBRARY_PATH. And --preserve-env doesn't preserve this environment variable. I'm not sure we should explicitly preserve it.

https://www.sudo.ws/docs/man/sudoers.man/

The dynamic linker on most operating systems will remove variables that can control dynamic linking from the environment of set-user-ID executables, including sudo.

benz0li commented 1 year ago

> I'm not sure we should explicitly preserve it.

@mathbunnyru For the jupyter/docker-stacks you are not supposed to.

benz0li commented 1 year ago

But if someone builds the jupyter/docker-stacks on top of nvidia/cuda images, LD_LIBRARY_PATH must be preserved – i.e. start.sh modified accordingly.
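A minimal sketch of what such a change amounts to, written in Python only for illustration (start.sh itself is a bash script). It assumes the user switch is done with sudo and the --preserve-env, --set-home and --user options, and it relies on sudo accepting VAR=value arguments ahead of the command, subject to the sudoers policy. The function name build_sudo_argv and the example values are hypothetical.

```python
import os
import shlex


def build_sudo_argv(user, cmd, env=None, keep=("LD_LIBRARY_PATH", "PATH")):
    """Build a sudo command line that re-passes selected variables explicitly."""
    env = os.environ if env is None else env
    argv = ["sudo", "--preserve-env", "--set-home", "--user", user]
    for name in keep:
        # Only forward variables that are actually set in the source environment.
        if name in env:
            argv.append(f"{name}={env[name]}")
    argv.extend(cmd)
    return argv


# Example values are assumptions, not taken from a real start.sh.
example = build_sudo_argv(
    "njacobson",
    ["jupyter", "lab"],
    env={"LD_LIBRARY_PATH": "/usr/local/cuda/lib64", "PATH": "/opt/conda/bin:/usr/bin"},
)
print(shlex.join(example))
```

Passing the variable explicitly sidesteps the fact that --preserve-env alone does not carry LD_LIBRARY_PATH across.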

mathbunnyru commented 1 year ago

Thanks @benz0li. It makes sense to me 👍

njacobson-nci commented 1 year ago

Setting LD_LIBRARY_PATH in the jupyter notebook does resolve the issue in notebooks.

It doesn't appear to be required in a terminal after sourcing the jovyan bashrc, and it's not set in that terminal session either.

Changing start.sh to not check for an existing /home/${NB_USER} does allow the script to copy over the jovyan directory correctly, and new terminals are then set up properly, but it doesn't fix notebooks. I'll try preserving LD_LIBRARY_PATH and see how that does.

import os
os.environ["LD_LIBRARY_PATH"] = "/opt/conda/lib/:/usr/local/cuda/lib64/lib"
import bitsandbytes
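Note that the workaround above overwrites whatever LD_LIBRARY_PATH the kernel inherited. A slightly more careful sketch prepends the directories and keeps any existing entries. The helper name prepend_library_dirs is made up for illustration, and the directories are ones mentioned in this thread, so adjust them to your image.

```python
import os


def prepend_library_dirs(dirs, env=None):
    """Put dirs at the front of LD_LIBRARY_PATH, keeping existing entries once."""
    env = os.environ if env is None else env
    current = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    merged = []
    for path in list(dirs) + current:
        # Skip duplicates so repeated calls do not grow the variable.
        if path not in merged:
            merged.append(path)
    env["LD_LIBRARY_PATH"] = ":".join(merged)
    return env["LD_LIBRARY_PATH"]


# Assumed directories; run this cell before importing bitsandbytes.
prepend_library_dirs(["/opt/conda/lib", "/usr/local/cuda/lib64"])
# import bitsandbytes  # only after the variable has been set
```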

njacobson-nci commented 1 year ago

@benz0li Applying this fix you provided does copy over the LD_LIBRARY_PATH to the environment of the jupyter notebooks, but libcudart.so is still not found.

'LD_LIBRARY_PATH': '/usr/local/nvidia/lib:/usr/local/nvidia/lib64',

This path doesn't exist in this image or any of the other ones I've used recently. It is the default LD_LIBRARY_PATH on the 11.6.2-cudnn-runtime base image, but those folders don't exist on that image either.
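One way to check this from a notebook is to list, for each LD_LIBRARY_PATH entry, whether the directory exists and whether it holds a libcudart.so* file. This is a rough diagnostic sketch using only the standard library; find_libcudart is a hypothetical helper, not part of bitsandbytes or of this image.

```python
import glob
import os


def find_libcudart(search_path=None):
    """Return (directory, exists, matching files) for each search path entry."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    report = []
    for directory in [p for p in search_path.split(":") if p]:
        exists = os.path.isdir(directory)
        # Only glob inside directories that are really there.
        matches = sorted(glob.glob(os.path.join(directory, "libcudart.so*"))) if exists else []
        report.append((directory, exists, matches))
    return report


for directory, exists, matches in find_libcudart():
    print(directory, "exists" if exists else "missing", matches)
```

Passing a string such as "/opt/conda/lib:/usr/local/cuda/lib64" instead of relying on the environment lets you test candidate directories before changing start.sh.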

benz0li commented 1 year ago

> This path doesn't exist in this image or any of the other ones I've used recently. It is the default LD_LIBRARY_PATH on the 11.6.2-cudnn-runtime base image, but those folders don't exist on that image either.

That is correct. See also https://github.com/jupyter/docker-stacks/issues/1792#issuecomment-1487927643.
ℹ️ These paths /usr/local/nvidia/lib:/usr/local/nvidia/lib64 are kept for legacy reasons.

> @benz0li Applying this fix you provided does copy over the LD_LIBRARY_PATH to the environment of the jupyter notebooks, but libcudart.so is still not found.
>
> 'LD_LIBRARY_PATH': '/usr/local/nvidia/lib:/usr/local/nvidia/lib64',

You need to set/extend the path to LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64 beforehand.

njacobson-nci commented 1 year ago

Updating start.sh to set LD_LIBRARY_PATH as you called out here still has issues, but that might be a bitsandbytes thing. The notebook environment now shows:

'LD_LIBRARY_PATH': '/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64',

Appending /opt/conda/lib/ does let notebooks find libcudart.so.

This is what I added to start.sh to fix it for now:

LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64:/opt/conda/lib/" \

It still seems like there is a deeper issue with switching users in this manner, and I wonder if there are further bugs that will be experienced when applying this fix.

In a notebook, if I run "! ll", the alias is not found despite being defined in the jovyan/njacobson .bashrc. Is that expected?

benz0li commented 1 year ago

@njacobson-nci I can't help you any further as my images only use Python – and don't have Conda / Mamba installed.

njacobson-nci commented 1 year ago

Understood, I appreciate the help very much!

benz0li commented 1 year ago

P.S.: You can always install Conda / Mamba on user level in my images.

iot-salzburg / gpu-jupyter

Struggling to switch users and maintain full cuda support #108
