Premas opened this issue 1 year ago
Some of the info below may be obsolete, as it is based on the old cudaX.Y
modules. We will need to update this for the new CUDA/X.Y.Z
modules.
LD_LIBRARY_PATH looks to be of the form:
/share/apps/rc/software/CUDA/11.6.0/nvvm/lib64:/share/apps/rc/software/CUDA/11.6.0/extras/CUPTI/lib64:/share/apps/rc/software/CUDA/11.6.0/lib
Here is my (hopefully?) complete understanding. When we write these docs, let's follow through the entire process with a specific example. We'll also need to include the detail of the --nv flag (and its successor in more recent Singularity versions, >3.5).
Pre-built containerized CUDA applications need the following:

1) libcuda.so [always required]
2) libcudart.so [may be included]

The driver will always be required for a containerized application, because the container can't assume anything about the hardware on the system. Some containers may also need the locally installed version of the CUDA toolkit or cuDNN library.
The Cheaha path for Nvidia drivers is /cm/local/apps/cuda/libs/current/lib64. Each of the above libraries has some path on Cheaha which must be made available to the container in two ways. The paths may be discovered using tr ':' '\n' <<< "$LD_LIBRARY_PATH". Generally, look for anything with cuda in the path. Put all of these paths into a shell variable, separated by the : character, as in the following specific example.
CUDA_LIB_PATHS=/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.4/toolkit/11.4.2/targets/x86_64-linux/lib
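The discovery step can be sketched end to end. This is only a sketch: the LD_LIBRARY_PATH value below is an illustrative stand-in for what a loaded CUDA module would set, and the grep/paste combination is just one way to collect the matching entries.

```shell
# Illustrative host LD_LIBRARY_PATH; on Cheaha the real value comes from
# the loaded CUDA module.
LD_LIBRARY_PATH="/share/apps/rc/software/CUDA/11.6.0/nvvm/lib64:/usr/lib64:/share/apps/rc/software/CUDA/11.6.0/lib"

# Split on ':', keep entries containing "cuda" (case-insensitive),
# then rejoin them into a single ':'-separated variable.
CUDA_LIB_PATHS=$(tr ':' '\n' <<< "$LD_LIBRARY_PATH" | grep -i cuda | paste -sd ':' -)

echo "$CUDA_LIB_PATHS"
# -> /share/apps/rc/software/CUDA/11.6.0/nvvm/lib64:/share/apps/rc/software/CUDA/11.6.0/lib
```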
LD_LIBRARY_PATH
Prepend the variable to the container's LD_LIBRARY_PATH variable like the following. Note that any variable starting with SINGULARITYENV_ will overwrite the corresponding variable in the container. We want to prepend so our host paths are found before the guest paths, which may contain incorrect information. The Parabricks container, for example, has the default installation location for drivers as part of its LD_LIBRARY_PATH variable, which is incorrect for Cheaha.
export SINGULARITYENV_LD_LIBRARY_PATH="$CUDA_LIB_PATHS:\$LD_LIBRARY_PATH"
Note the leading backslash on \$LD_LIBRARY_PATH. This ensures the $ is preserved until it is inside the container, so that $LD_LIBRARY_PATH is not expanded on the host machine, but rather inside the container. The overall effect is prepending to the container's LD_LIBRARY_PATH variable.
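The escaping behavior can be checked on the host without any container image; this sketch only demonstrates that the literal text $LD_LIBRARY_PATH survives into the variable's value for Singularity to expand later.

```shell
# Driver path from above.
CUDA_LIB_PATHS=/cm/local/apps/cuda/libs/current/lib64

# The \$ prevents host-side expansion, so the value carries the literal
# text "$LD_LIBRARY_PATH" for Singularity to expand inside the container.
export SINGULARITYENV_LD_LIBRARY_PATH="$CUDA_LIB_PATHS:\$LD_LIBRARY_PATH"

echo "$SINGULARITYENV_LD_LIBRARY_PATH"
# -> /cm/local/apps/cuda/libs/current/lib64:$LD_LIBRARY_PATH
```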
Overall, this ensures that applications within the container know where to look for shared libraries, including drivers, CUDA toolkit and cuDNN.
Bind each of the paths to the container. Multiple paths can be provided to the same --bind argument using a comma-separated list. To simplify, transform the :-separated list in CUDA_LIB_PATHS into a ,-separated list using sed, like the following. We also append ::ro to each entry to ensure they are mounted read-only. The format for binds is source:destination:option. If destination is left blank, it is set equal to source. We want this because we have prepended these exact paths to LD_LIBRARY_PATH within the container.
CUDA_LIB_CSL=$(echo "$CUDA_LIB_PATHS" | sed -e 's/:/::ro,/g' -e 's/$/::ro/')
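As a quick sanity check of the transform (note that a second expression is needed to append ::ro to the final entry, which has no trailing : separator):

```shell
# Example ':'-separated input from above.
CUDA_LIB_PATHS=/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.4/toolkit/11.4.2/targets/x86_64-linux/lib

# Replace every ':' separator with '::ro,', then tag the last entry too.
CUDA_LIB_CSL=$(echo "$CUDA_LIB_PATHS" | sed -e 's/:/::ro,/g' -e 's/$/::ro/')

echo "$CUDA_LIB_CSL"
# -> /cm/local/apps/cuda/libs/current/lib64::ro,/cm/shared/apps/cuda11.4/toolkit/11.4.2/targets/x86_64-linux/lib::ro
```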
We must bind the paths to the container using the --bind argument. When using singularity run or singularity exec, include the following argument as well. This ensures the paths are available to and reachable by applications within the container.
--bind $CUDA_LIB_CSL
Prepending the driver, CUDA toolkit, and cuDNN paths to the container's LD_LIBRARY_PATH ensures applications know where to look for shared libraries. Binding those paths to the container ensures they are available to applications within the container. Both steps are necessary for containerized CUDA applications to function on Cheaha.
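Putting both steps together, a full invocation might look like the following sketch. The image name myapp.sif and the guard around the run command are hypothetical additions; the paths are the Cheaha examples from this thread.

```shell
# CUDA-related host paths (driver first, then toolkit).
CUDA_LIB_PATHS=/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.4/toolkit/11.4.2/targets/x86_64-linux/lib

# Step 1: prepend host paths to the container's LD_LIBRARY_PATH.
export SINGULARITYENV_LD_LIBRARY_PATH="$CUDA_LIB_PATHS:\$LD_LIBRARY_PATH"

# Step 2: build the read-only, comma-separated bind list.
CUDA_LIB_CSL=$(echo "$CUDA_LIB_PATHS" | sed -e 's/:/::ro,/g' -e 's/$/::ro/')

# Step 3: run with the GPU flag and the binds (placeholder image name).
if command -v singularity >/dev/null 2>&1; then
    singularity run --nv --bind "$CUDA_LIB_CSL" myapp.sif
fi
```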
We will want to put this prominently in relation to Singularity, Containers, and GPUs.
Specific locations could be tricky, but this one is an important pitfall to cover.
One specific location could be https://docs.rc.uab.edu/workflow_solutions/getting_containers/#containers-on-cheaha since it's more Cheaha-specific than Cloud or K8s.
What would you like to see added?
We need to document a generic example of how to pass the host environment LD_LIBRARY_PATH and append it within a container-defined LD_LIBRARY_PATH. Can be added to this page https://docs.rc.uab.edu/workflow_solutions/getting_containers/#containers-on-cheaha.
For instance, executing the Parabricks container requires a CUDA library. In order to see host GPUs inside the container, we need to append the CUDA LD_LIBRARY_PATH to the containerized LD_LIBRARY_PATH.
The CUDA LD_LIBRARY_PATH on Cheaha is "/cm/local/apps/cuda/libs/current/lib64". Before appending it, the environment inside the container is:
After appending the CUDA LD_LIBRARY_PATH to the containerized LD_LIBRARY_PATH, we can see that the initial containerized definition is preserved by the append:
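One way to produce these before/after listings is to print LD_LIBRARY_PATH from inside the container. This is a sketch, guarded so it is a no-op on systems without the runtime; the image name is the one from this thread.

```shell
# Set the override first; its host-side value can be checked regardless.
export SINGULARITYENV_LD_LIBRARY_PATH="/cm/local/apps/cuda/libs/current/lib64:\$LD_LIBRARY_PATH"

# Print the container's view of LD_LIBRARY_PATH, if Singularity exists.
if command -v singularity >/dev/null 2>&1; then
    # For the image-defined ("before") value, run without the override:
    #   env -u SINGULARITYENV_LD_LIBRARY_PATH singularity exec parabricks.sif printenv LD_LIBRARY_PATH
    # With the override in place, the "after" value:
    singularity exec parabricks.sif printenv LD_LIBRARY_PATH
fi
```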
In order to execute the container, we need to bind the path of the CUDA library as well as set the LD_LIBRARY_PATH:
$ singularity run --nv -B /cm/local/apps/cuda/libs/current/lib64 parabricks.sif /bin/pbrun fq2bam ...
If required, link the Parabricks page: https://docs.rc.uab.edu/education/case_studies/#parabricks-testing-on-pascalnodes-and-amperenodes-on-cheaha.
For reference, see this GitHub issue on how to append LD_LIBRARY_PATH to the container's LD_LIBRARY_PATH: https://github.com/apptainer/singularity/issues/5781