galyleo is a shell utility to help you launch Jupyter notebooks on high-performance computing (HPC) systems in a simple, secure way. It works with SDSC's Satellite reverse proxy service and a batch job scheduler like Slurm to provide each Jupyter notebook server you start with its own one-time, token-authenticated HTTPS connection between the compute resources of the HPC system the notebook server is running on and your web browser. This secure connection affords both privacy and integrity to the data exchanged between the notebook server and your browser, helping protect you and your work against network eavesdropping and data tampering.
galyleo is currently deployed on the following HPC systems at SDSC: Expanse, TSCC, and Comet.
To use galyleo, you first need to prepend its install location to your PATH environment variable. This path is different for each HPC system at SDSC.
On Expanse, use:
export PATH="/cm/shared/apps/sdsc/galyleo:${PATH}"
On TSCC, there is now a software module available for loading galyleo
into your environment.
[mkandes@login1 ~]$ module load galyleo/0.7.4
[mkandes@login1 ~]$ module list
Currently Loaded Modules:
1) shared 2) cpu/0.17.3 3) slurm/tscc/23.02.7 4) sdsc/1.0 5) DefaultModules 6) galyleo/0.7.4
[mkandes@login1 ~]$ which galyleo
/cm/shared/apps/spack/0.17.3/cpu/opt/spack/linux-rocky9-cascadelake/gcc-11.2.0/galyleo-0.7.4/galyleo
[mkandes@login1 ~]$ echo $PATH
/cm/shared/apps/spack/0.17.3/cpu/opt/spack/linux-rocky9-cascadelake/gcc-11.2.0/galyleo-0.7.4:/tscc/nfs/home/mkandes/.local/bin:/tscc/nfs/home/mkandes/bin:/cm/shared/apps/sdsc/1.0/bin:/cm/shared/apps/sdsc/1.0/sbin:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/spack/0.17.3/cpu/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
[mkandes@login1 ~]$
Once galyleo is in your PATH, you can then use its launch command to create a secure Jupyter notebook session. A number of command-line options allow you to configure the compute resources you need for the session, the type of Jupyter interface you prefer, and the software environment that contains the jupyter notebook server and the other software packages you want to work with during the session.

For example, the following launch command will create a 30-minute JupyterLab session on two CPU-cores with 4 GB of memory on one of Expanse's shared AMD compute nodes using the base anaconda3 distribution available in its default cpu software module environment.
galyleo launch --account abc123 --partition shared --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules cpu/0.17.3b,anaconda3/2021.05
When the launch command completes successfully, you will be issued a unique HTTPS URL generated for your secure Jupyter notebook session.
https://wages-astonish-recapture.expanse-user-content.sdsc.edu?token=1abe04ac1703ca623e4e907cc37678ae
Copy and paste this HTTPS URL into your web browser. Your Jupyter notebook session will begin once the requested compute resources are allocated to your job by the scheduler.
The most commonly used command-line options for the launch command are described below.

Scheduler options:

-A, --account : charge the compute resources required by this job to the specified account or allocation project id
-p, --partition : select the resource partition or queue the job should be submitted to
-c, --cpus : number of CPUs to request for the job
-m, --memory : amount of memory (in GB) required for the job
-g, --gpus : number of GPUs required for the job
-t, --time-limit : set a maximum runtime (in HH:MM:SS) for the job
-C, --constraint : apply a feature constraint to specify the type of compute node required for the job

Jupyter options:

-i, --interface : select the user interface for the Jupyter notebook session; the available options are lab, notebook, or voila
-d, --notebook-dir : path to the working directory where the Jupyter notebook session will start; default value is your $HOME directory

Software environment options:

-e, --env-modules : comma-separated list of environment modules that will be loaded to create the software environment for the Jupyter notebook session
-s, --sif : path to a Singularity container image file that will be run to create the software environment for the Jupyter notebook session
-B, --bind : comma-separated list of user-defined bind paths to be mounted within a Singularity container
--nv : enable NVIDIA GPU support when running a Singularity container
--conda-env : name of a conda environment to activate to create the software environment for the Jupyter notebook session
--conda-yml : path to an environment.yml file
--mamba : use mamba instead of miniconda to create your conda environment from an environment.yml file
--cache : cache your conda environment created from an environment.yml file using conda-pack; a cached environment will be unpacked and reused if the environment.yml file does not change

After you specify the compute resources required for your Jupyter notebook session using the Scheduler options outlined above, the next most important set of command-line options for the launch command are those that help you define the software environment. Listed in the Software environment options section above, these command-line options are discussed in detail in the next few subsections below.
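For instance, a launch command that combines the Scheduler and Jupyter options described above to start a classic Notebook interface from a specific working directory might look like the following; note that the account, partition, and directory path here are placeholders you would replace with your own.

galyleo launch --account abc123 --partition shared --cpus 2 --memory 4 --time-limit 00:30:00 --interface notebook --notebook-dir /expanse/lustre/projects/abc123/your-username --env-modules cpu/0.17.3b,anaconda3/2021.05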
Most HPC systems use a software module system like Lmod or Environment Modules to provide you with a convenient way to dynamically load pre-installed software applications, libraries, and other packages into your shell's environment.
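If you're not sure which modules are available on the system you're using, the module system's own search commands can help. For example, on an Lmod-based system (anaconda3 here is just an example package name):

module avail
module spider anaconda3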
If you need to module load any software to create the environment for your Jupyter notebook session, you can do so by including the modules as a comma-separated list to the --env-modules option in your launch command. Each module included in the list will be loaded prior to starting jupyter. In some cases, the --env-modules command-line option may be the only one you need to define your software environment.
For example, if you have a standard Python-based data science workflow that you want to run on Expanse, then you might only need to load one of the Anaconda distributions available in its software module environment.
galyleo launch --account abc123 --partition shared --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules cpu/0.17.3b,anaconda3/2021.05
By default, each Anaconda distribution comes with over 250 of the most popular data science software packages pre-installed, including jupyter.
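Once loaded, you can quickly confirm that jupyter is available on your PATH (a quick sanity check using the same modules as the example above):

module load cpu/0.17.3b anaconda3/2021.05
which jupyter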
Singularity containers bring operating system-level virtualization to scientific and high-performance computing, allowing you to package complete software environments --- including operating systems, software applications, libraries, and data --- in a simple, portable, and reproducible way, which can then be executed and run almost anywhere.
If you have a Singularity container that you would like to run your
Jupyter notebook session within, then you simply need to provide a path
to the container with the --sif
option in your launch
command. This
will start jupyter
within the container using the singularity exec
command. If necessary, you can also pass user-defined --bind
mounts to
the container and enable NVIDIA GPU support via the --nv
flag.
One of the most powerful features of Singularity is its ability to convert an existing Docker container to a Singularity container. So, even if you are not familiar with how to build your own Singularity container, you can always search public container registries like Docker Hub for an existing container that may help you get your work done.
For example, let's say you need an R
environment for your Jupyter notebook session. Why not try the latest
r-notebook container
from the Jupyter Docker Stacks
project? To get started, you first use the singularity pull
command to
download and convert the Docker container to a Singularity container.
singularity pull docker://jupyter/r-notebook:latest
Once all of the layers of the Docker container have been downloaded and the container conversion process is complete, you can then launch your Jupyter notebook session with the newly built Singularity container.
galyleo launch --account abc123 --cpus 2 --memory 4 --time-limit 00:30:00 --sif r-notebook_latest.sif
On some systems like Expanse, you may need to load Singularity via the software module environment as well.
galyleo launch --account abc123 --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules singularitypro --sif r-notebook_latest.sif --bind /expanse,/scratch
Here, the user-defined --bind mount option also enables access to both the /expanse network filesystems (e.g., /expanse/lustre) and the local NVMe /scratch disk(s) available on each compute node from within the container. By default, only your $HOME directory is accessible from within the container.
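If you want to confirm that a path will be visible inside a container before launching a full session, you can test the bind mounts directly with the singularity exec command. A quick check using the r-notebook image pulled above (on Expanse, you would load singularitypro first):

module load singularitypro
singularity exec --bind /expanse,/scratch r-notebook_latest.sif ls /expanse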
Singularity also provides native support for running containerized applications on NVIDIA GPUs. If you have a GPU-accelerated application you would like to run during your Jupyter notebook session, please make sure your container includes a CUDA-enabled version of the application that can utilize NVIDIA GPUs.
NVIDIA distributes a number of GPU-optimized containers via their
container registry. This includes
containers for all of the most popular deep learning frameworks ---
PyTorch, TensorFlow,
and MXNet --- with jupyter
pre-installed.
Like the containers available from Docker Hub, you can pull these containers to the HPC system you are working on
singularity pull docker://nvcr.io/nvidia/pytorch:21.07-py3
and then launch your Jupyter notebook session with galyleo. For example, you might want to run this PyTorch container on a single NVIDIA V100 GPU available in Expanse's gpu-shared partition.
galyleo launch --account abc123 --partition gpu-shared --cpus 10 --memory 93 --gpus 1 --time-limit 00:30:00 --env-modules singularitypro --sif pytorch_21.07-py3.sif --bind /expanse,/scratch --nv
Note, however, that how you request GPU resources with galyleo may differ from one HPC system to another. For example, on Comet you must use the --gres command-line option to specify both the type and number of GPUs required for your Jupyter notebook session. The following launch command would create a session within the NVIDIA PyTorch container on a single P100 GPU available in Comet's gpu-shared partition.
galyleo launch --account abc123 --partition gpu-shared --cpus 7 --gres gpu:p100:1 --time-limit 00:30:00 --sif pytorch_21.07-py3.sif --bind /oasis,/scratch --nv
In contrast, on TSCC you'll never explicitly request a specific number of GPUs for your Jupyter notebook session. All GPUs on TSCC are currently allocated implicitly in proportion to the number of CPU-cores requested by a job and available on the type of GPU-accelerated compute node you expect it to run on. If you would like to request that your notebook session be scheduled on a certain type of GPU, then you must pass the type of GPU required, as listed in the pbsnodes properties, via the --constraint command-line option. For example, the following launch command will schedule your session on one of the NVIDIA GeForce RTX 2080Ti GPUs available in the gpu-hotel queue on TSCC.
galyleo launch --account abc123 --partition gpu-hotel --cpus 2 --constraint gpu2080ti --time-limit 00:30:00 --sif pytorch_21.07-py3.sif --bind /oasis --nv
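If you're unsure which GPU types are advertised on a system's nodes, you can inspect the node properties yourself (a sketch assuming a TORQUE-based environment where pbsnodes is available, as referenced above; the output format varies by system):

pbsnodes -a | grep properties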
Whatever you do, whenever you're launching your Jupyter notebook session with galyleo from a Singularity container on compute resources with NVIDIA GPUs, don't forget to include the --nv flag.
Conda is an open-source software package and environment manager developed by Anaconda, Inc. Its ease of use, compatibility across multiple operating systems, and comprehensive support for both the Python and R software ecosystems have made it one of the most popular ways to build and maintain custom software environments in the data science and machine learning communities. And because of the constantly evolving software landscape in these spaces, which can involve quite complex software dependencies, conda is often the simplest way to get your custom Python or R software environment up and running on an HPC system.
galyleo supports the use of conda environments to configure the software environment for your Jupyter notebook session. If you've already installed a conda distribution --- we recommend Miniconda --- and configured a custom conda environment within it, then you should only need to specify the name of the conda environment you want to activate for your notebook session with the --conda-env command-line option.
For example, let's imagine you've already created a custom conda environment from the following environment.yml file.
name: notebooks-sharing
channels:
- conda-forge
- anaconda
dependencies:
- python=3.7
- jupyterlab=3
- pandas=1.2.4
- matplotlib=3.4.2
- seaborn=0.11.0
- scikit-learn=0.23.2
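If you haven't created this environment yet, the standard conda command to build it from the file is shown below (run once before launching your session):

conda env create -f environment.yml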
You should then be able to launch a 30-minute JupyterLab session on four CPU-cores with 8 GB of memory on one of Expanse's shared AMD compute nodes by simply activating the notebooks-sharing environment.
galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing
Note, however, that the use of the --conda-env command-line option here assumes you've already configured your ~/.bashrc file with the conda init command. If you have not done so (or choose not to do so), then you can also initialize any conda distribution in your launch command by providing the path to its conda.sh initialization script in the etc/profile.d directory via the --conda-init command-line option.
galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing --conda-init miniconda3/etc/profile.d/conda.sh
While creating your own custom software environment with conda may be convenient, it can also generate a high metadata load on the types of shared network filesystems you'll often find on an HPC system. At a minimum, if you install your conda distribution on a network filesystem, you can expect this to increase the installation time of software packages into your conda environment when compared to a local filesystem installation you may have done previously on your laptop. Under some circumstances, this metadata issue can lead to a serious degradation of the aggregate I/O performance across a filesystem, affecting the performance of all user jobs on the system.
If you have not yet installed your conda environment on a shared filesystem (such as in your $HOME directory), galyleo now also allows you to dynamically create the environment at runtime from an environment.yml file. To use this feature, you simply need to provide the name of the environment.yml file with the --conda-yml command-line option. For example, if you wanted to start a Jupyter notebook session with the notebooks-sharing environment, you would use the following command:
galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing --conda-yml environment.yml
You can further improve the installation performance and reuse of these dynamically generated conda environments with the new --mamba and --cache command-line options: --mamba uses Mamba to speed up software installs, while --cache saves the completed conda environment with conda-pack so it can be unpacked and reused in future sessions.
galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing --conda-yml environment.yml --mamba --cache
If you experience a problem launching your Jupyter notebook session with galyleo, you may be able to debug the issue yourself by reviewing the batch job script generated by galyleo or the standard output/error file generated by the job itself. You can find these files stored in the hidden ~/.galyleo directory created in your $HOME directory.
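For example, to find the files generated by your most recent launch attempt (file names are generated per job and will vary):

ls -lt ~/.galyleo | head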
galyleo has been integrated with the Open OnDemand-based Expanse User Portal to help simplify launching Jupyter notebooks on Expanse. After logging into the portal, you can access this web-based interface to galyleo from the Interactive Apps tab in the toolbar across the top of your browser, then select Jupyter.
SDSC builds and maintains a number of custom Singularity containers for use on its HPC systems. Pre-built copies of many of these containers are made available from a central storage location on each HPC system. Please check the following locations for the latest containers. If you do not find the container you're looking for, please feel free to contact us and make a request for a container to be made available.
On Expanse:
/cm/shared/apps/containers/singularity
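For example, you can list the pre-built containers currently available on Expanse with:

ls /cm/shared/apps/containers/singularity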
A work in progress.
If you would like to contribute to the project, then please submit a pull request via GitHub. If you have a feature request or a problem to report, then please create a GitHub issue.
Marty Kandes, Ph.D.
Computational & Data Science Research Specialist
High-Performance Computing User Services Group
Data-Enabled Scientific Computing Division
San Diego Supercomputer Center
University of California, San Diego
0.7.6
Monday, May 6th, 2024