uabrc / uabrc.github.io

UAB Research Computing Documentation
https://docs.rc.uab.edu

Pass the host environment LD_LIBRARY_PATH within a Singularity container #572

Open · Premas opened this issue 1 year ago

Premas commented 1 year ago

What would you like to see added?

We need to document a generic example of how to pass the host environment LD_LIBRARY_PATH and append it to a container-defined LD_LIBRARY_PATH. This could be added to this page: https://docs.rc.uab.edu/workflow_solutions/getting_containers/#containers-on-cheaha.

For instance, executing the Parabricks container requires the host CUDA libraries. In order to see the host GPUs inside the container, we need to append the CUDA library path to the containerized LD_LIBRARY_PATH.

The CUDA library path on Cheaha is "/cm/local/apps/cuda/libs/current/lib64". Before appending it, the environment inside the container is:

$ singularity exec parabricks.sif printenv LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs

After appending the CUDA library path to the containerized LD_LIBRARY_PATH, we can see that the container's original definition is preserved:

$ export SINGULARITYENV_LD_LIBRARY_PATH="/cm/local/apps/cuda/libs/current/lib64:\$LD_LIBRARY_PATH"
$ singularity exec parabricks.sif printenv LD_LIBRARY_PATH
/cm/local/apps/cuda/libs/current/lib64:$LD_LIBRARY_PATH:/.singularity.d/libs

In order to execute the container, we need to bind the CUDA library path as well as set the LD_LIBRARY_PATH:

$ singularity run --nv -B /cm/local/apps/cuda/libs/current/lib64 parabricks.sif /bin/pbrun fq2bam ...
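For a batch job, the same two steps could be wrapped in a Slurm script. This is only a sketch: the Slurm directives (partition, GPU count, time, memory) are illustrative, and the fq2bam arguments are elided as above.

#!/bin/bash
# Slurm directives below are illustrative only; adjust partition and resources as needed.
#SBATCH --job-name=pbrun-fq2bam
#SBATCH --partition=pascalnodes
#SBATCH --gres=gpu:1
#SBATCH --time=12:00:00
#SBATCH --mem=64G

# Prepend the host CUDA driver path while preserving the container's own entries.
export SINGULARITYENV_LD_LIBRARY_PATH="/cm/local/apps/cuda/libs/current/lib64:\$LD_LIBRARY_PATH"

# Bind the driver path and enable GPU support with --nv.
singularity run --nv -B /cm/local/apps/cuda/libs/current/lib64 parabricks.sif /bin/pbrun fq2bam ...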

If required, link the Parabricks page: https://docs.rc.uab.edu/education/case_studies/#parabricks-testing-on-pascalnodes-and-amperenodes-on-cheaha.

For reference, see this GitHub post on how to append LD_LIBRARY_PATH to the container's LD_LIBRARY_PATH: https://github.com/apptainer/singularity/issues/5781

wwarriner commented 1 year ago

Important!

Some of the info below may be obsolete, as it is based on the old cudaX.Y modules. We will need to update this for the new CUDA/X.Y.Z modules.

LD_LIBRARY_PATH looks to be of the form:

/share/apps/rc/software/CUDA/11.6.0/nvvm/lib64:/share/apps/rc/software/CUDA/11.6.0/extras/CUPTI/lib64:/share/apps/rc/software/CUDA/11.6.0/lib
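To see the current value on a compute node, something like the following should work (a sketch; the exact module version is assumed from the path above):

# Load a CUDA module following the new CUDA/X.Y.Z naming (version assumed).
module load CUDA/11.6.0

# Print each LD_LIBRARY_PATH entry on its own line, keeping CUDA-related ones.
tr ':' '\n' <<< "$LD_LIBRARY_PATH" | grep -i cuda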

Old info

Here is my (hopefully?) complete understanding. When we write these docs, let's follow through the entire process with a specific example. We'll also need to include the detail of the --nv flag (and its successor in more recent Singularity versions, >3.5).

Overview

Pre-built containerized CUDA applications need the following:

  1. Driver (??) libraries (includes libcuda.so) [always required]
  2. CUDA Toolkit libraries (includes libcudart.so) [may be included]
  3. cuDNN library (for deep learning) [may be included]

The driver will always be required for a containerized application, because the container can't assume anything about the hardware on the system. Some containers may also need the locally installed version of CUDA toolkit or cuDNN library.
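One way to check whether a particular container already bundles the toolkit or cuDNN is to list the unresolved shared libraries of its main binary. A minimal sketch, assuming ldd is present in the image and /usr/local/bin/app is a hypothetical path to the application binary:

# Show shared-library dependencies the container cannot resolve on its own.
# Anything CUDA-related that is "not found" must be supplied from the host.
singularity exec parabricks.sif ldd /usr/local/bin/app | grep "not found"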

Getting all necessary paths

The Cheaha path for Nvidia drivers is /cm/local/apps/cuda/libs/current/lib64. Each of the above libraries has some path on Cheaha which must be made available to the container in two ways: by prepending to LD_LIBRARY_PATH and by binding, both described below. The paths may be discovered using tr ':' '\n' <<< "$LD_LIBRARY_PATH". Generally, look for anything with cuda in the path. Put all of these paths into a shell variable separated by the : character, as in the following specific example.

CUDA_LIB_PATHS=/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.4/toolkit/11.4.2/targets/x86_64-linux/lib
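Alternatively, the same list could be assembled programmatically from the host environment. A sketch, assuming every relevant entry contains cuda in its path (true for the modules above):

# Collect all CUDA-related host library paths into one colon-separated list.
CUDA_LIB_PATHS=$(tr ':' '\n' <<< "$LD_LIBRARY_PATH" | grep -i cuda | paste -s -d ':' -)
echo "$CUDA_LIB_PATHS"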

Prepending LD_LIBRARY_PATH

Prepend the variable to the container's LD_LIBRARY_PATH variable like the following. Note that any variable starting with SINGULARITYENV_ will overwrite the container's variable. We want to prepend so our host paths are found before the guest paths, which may contain incorrect information. The Parabricks container, for example, has the default installation location for drivers as part of its LD_LIBRARY_PATH variable, which is incorrect for Cheaha.

export SINGULARITYENV_LD_LIBRARY_PATH="$CUDA_LIB_PATHS:\$LD_LIBRARY_PATH"

Note the leading backslash on \$LD_LIBRARY_PATH. This ensures the $ is preserved until it is inside the container, so that \$LD_LIBRARY_PATH is not expanded on the host machine but rather inside the container. The overall effect is prepending to the container's LD_LIBRARY_PATH variable.

Overall, this ensures that applications within the container know where to look for shared libraries, including drivers, CUDA toolkit and cuDNN.
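As a quick sanity check, the resulting value can be printed from inside the container (reusing the parabricks.sif example from earlier in this issue):

# The host CUDA paths should now appear at the front of the container's variable.
singularity exec parabricks.sif printenv LD_LIBRARY_PATH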

Binding library paths

Bind each of the paths to the container. Multiple paths can be provided to the same --bind using a comma-separated list. To simplify, transform the :-separated list in CUDA_LIB_PATHS into a ,-separated list using sed, like the following. We also append ::ro to each entry to ensure they are interpreted as read-only. The format for binds is source:destination:options. If destination is left blank, it is set equal to source. We want this because we have prepended these exact paths to LD_LIBRARY_PATH within the container.

CUDA_LIB_CSL="$(echo "$CUDA_LIB_PATHS" | sed -e 's/:/::ro,/g')::ro"
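For the example CUDA_LIB_PATHS above, the transformed value would look roughly like:

$ echo "$CUDA_LIB_CSL"
/cm/local/apps/cuda/libs/current/lib64::ro,/cm/shared/apps/cuda11.4/toolkit/11.4.2/targets/x86_64-linux/lib::ro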

We must bind the paths to the container using the --bind argument. When using singularity run or singularity exec, include the following argument as well. This ensures the paths are available to and reachable by applications within the container.

--bind $CUDA_LIB_CSL
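Putting the steps together, a minimal end-to-end sketch (paths and image name carried over from the examples above; adjust for your own container and site):

# 1. Host library paths needed by the containerized CUDA application.
CUDA_LIB_PATHS=/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.4/toolkit/11.4.2/targets/x86_64-linux/lib

# 2. Prepend them to the container's LD_LIBRARY_PATH (note the escaped \$).
export SINGULARITYENV_LD_LIBRARY_PATH="$CUDA_LIB_PATHS:\$LD_LIBRARY_PATH"

# 3. Turn the colon-separated list into a read-only bind specification.
CUDA_LIB_CSL="$(echo "$CUDA_LIB_PATHS" | sed -e 's/:/::ro,/g')::ro"

# 4. Run with GPU support and the host library paths bound into the container.
singularity run --nv --bind "$CUDA_LIB_CSL" parabricks.sif /bin/pbrun fq2bam ...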

Conclusion

Prepending the driver, CUDA toolkit, and cuDNN paths to the container's LD_LIBRARY_PATH ensures applications know where to look for shared libraries. Binding those paths to the container ensures they are available to applications within the container. Both steps are necessary for containerized CUDA applications to function on Cheaha.

wwarriner commented 10 months ago

We will want to put this prominently in relation to Singularity, Containers, and GPUs.

Specific locations could be tricky, but this one is an important pitfall to cover.

mdefende commented 10 months ago

One specific location could be https://docs.rc.uab.edu/workflow_solutions/getting_containers/#containers-on-cheaha since it's more Cheaha-specific than Cloud or K8s.