uabrc / uabrc.github.io

UAB Research Computing Documentation
https://docs.rc.uab.edu

feat: docs for creating a jupyter kernel running in singularity container #235

Open jprorama opened 2 years ago

jprorama commented 2 years ago

We should add some docs to help folks create Jupyter kernels that run inside Singularity containers. I ran into this need recently when trying to use PyTorch Geometric, installed in a regular Anaconda3/2021.11 env, from inside my notebook.

The module appears to be compiled on a newer platform (ubuntu1804), which has a newer glibc (2.27). This means when the module tries to run on our RHEL7 compute nodes, it throws a library error about glibc being too old. glibc is not easy to upgrade or provide alternate versions of, so it's easier to satisfy the requirement from an environment inside a container.
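As a quick sanity check of the mismatch, you can print the glibc version on the node itself (RHEL7 ships glibc 2.17, older than the 2.27 the module expects):

```shell
# Print the glibc version provided by the node's dynamic linker.
# On RHEL7 this reports 2.17, older than the 2.27 the module wants.
ldd --version | head -n 1
```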

NGC provides a pytorch container that is easy to use via Singularity. You can pull it down into the project directory where you'll access your notebook.

cd <project-dir>
singularity pull pytorch:22.04-py3.sif docker://nvcr.io/nvidia/pytorch:22.04-py3

You can run the container on a GPU node in the same directory like so:

singularity run --nv -B /cm -B /data/user/$USER pytorch\:22.04-py3.sif /bin/bash

Note: there is an issue with using nvidia-smi inside the container, since the container doesn't appear to inherit LD_LIBRARY_PATH from the caller. This doesn't appear to affect the operations below, but if you want to test nvidia-smi inside the container, set up your environment as follows:

module load cuda11.4/toolkit
module load Singularity
export SINGULARITYENV_PATH=$PATH
export SINGULARITYENV_LD_LIBRARY_PATH=$LD_LIBRARY_PATH  # this doesn't seem to work
singularity run --nv -B /cm -B /data/user/$USER pytorch\:22.04-py3.sif /bin/bash
# inside the container shell:
export LD_LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64/:$LD_LIBRARY_PATH
nvidia-smi

Back to the kernel config...

The next step is to install a custom kernel for Jupyter that starts the Python kernel in the container. This combines instructions from the Clemson ITI docs, community docs, and the IPython docs. A kernel is really just a JSON config that specifies the command to run for the kernel; we need a custom one that starts the container.

Make sure you have Anaconda loaded along with your preferred env. This is mostly to provide good defaults and use the env that your calling Jupyter notebook needs. (This may not be strictly necessary.)

module load Anaconda3/2021.11
conda activate <myenv>

It's best to first create a template and then install the kernel. That way you always have the template and can reinstall or edit it as needed outside the Jupyter config dirs.

ipython kernel install --prefix ~/tmp --name singk --display-name "Python (singk)"

Then write your custom kernel config into the new kernel template file:

cat > ~/tmp/share/jupyter/kernels/singk/kernel.json  << EOF
{
 "argv": [
  "singularity",
  "exec",
  "--nv",
  "-B",
  "/cm",
  "-B",
  "/data/user/$USER",
  "-B",
  "/data/user/home/$USER",
  "-e",
  "pytorch:22.04-py3.sif",
  "/home/$USER/.conda/envs/$CONDA_DEFAULT_ENV/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python (singk)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}
EOF

Notes:

  1. The $USER and $CONDA_DEFAULT_ENV variables are expanded to their string values during the cat command. If you edit the file directly, make sure you use your actual values; you can't use variable expansions in the JSON.
  2. The parameters in the JSON file are like an exec() call. There is no command-line parsing, so each string is specified as a separate quoted argument, e.g. use "-B", "/cm", not "-B /cm".
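Because an unexpanded variable or a JSON syntax slip will make the kernel fail without much feedback, it may be worth sanity-checking the generated file before installing it (a sketch, using the template path from above):

```shell
KERNEL_JSON=~/tmp/share/jupyter/kernels/singk/kernel.json

# Warn if the heredoc left any unexpanded $VARs behind.
if grep -q '\$' "$KERNEL_JSON"; then
    echo "warning: unexpanded variable in kernel.json"
fi

# Validate the JSON syntax; prints the pretty-printed config on success.
python3 -m json.tool "$KERNEL_JSON"
```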

At this point your kernel is ready to install into the correct location for your user kernels.

jupyter kernelspec install ~/tmp/share/jupyter/kernels/singk --user
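If you want to confirm the install landed where Jupyter expects it (with --user, kernelspecs go under ~/.local/share/jupyter/kernels), a quick check:

```shell
# The --user install copies the kernelspec into the per-user data dir.
test -f ~/.local/share/jupyter/kernels/singk/kernel.json && echo "singk kernel installed"

# Jupyter can also list every kernelspec it sees (requires jupyter on PATH):
# jupyter kernelspec list
```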

Now you can start a Jupyter notebook in OOD on a GPU node and start the custom kernel for a new notebook. Go to OOD and select Jupyter. Add cuda, Singularity, and Anaconda to your startup environment:

module load cuda11.4/toolkit
module load Singularity
module load Anaconda3/2021.11

Select a GPU partition like pascalnodes. Then launch the notebook job.

When you are in Jupyter, navigate to the directory where you keep your notebooks and where you created the Singularity .sif file. You can then select "Python (singk)" to launch a notebook with the Singularity container.

Note: On occasion, I've observed the container starting and then restarting after the first start. I don't know what causes this, but it seems to heal itself. You can look in the OOD job's output.txt file to debug further.
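To dig into a restart, you can tail the session log. The path below assumes the default Open OnDemand data directory layout; adjust it if your site configures a different dataroot:

```shell
# Find the newest OOD batch-connect session log and show its tail.
# ~/ondemand/data/... is the default OnDemand layout (an assumption here);
# each session gets its own output directory.
log=$(ls -t ~/ondemand/data/sys/dashboard/batch_connect/sys/*/output/*/output.txt | head -n 1)
tail -n 50 "$log"
```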

wwarriner commented 1 year ago

This may be a helpful use case to add to our workflow_solutions/getting_containers.md page. I think @Premas would be the resident expert to decide here.