lmstudio-ai / venvstacks

Virtual environment stacks for Python
http://venvstacks.lmstudio.ai/
MIT License

Allow shared object loading across layers #38

Open ncoghlan opened 1 month ago

ncoghlan commented 1 month ago

Environment stacks on platforms other than Windows currently don't correctly support shared object (aka dynamic library) loading across different layers (Windows is different due to its reliance on os.add_dll_directory even within a single virtual environment).

It should be possible to resolve this limitation by:

  1. Adding a new share/venv/dynlib folder within each non-Windows environment layer which contains symlinks to all of the shared objects found under the site-packages directory that aren't specifically marked as being Python extension modules (see the sketch after this list)
  2. Replacing the symlink to the underlying Python implementation in each non-Windows layered environment with a wrapper script that sets the shared object loading environment variables appropriately, and then uses exec -a to invoke the underlying base Python runtime while still having sys.executable refer to the wrapper script inside the virtual environment
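
A minimal sketch of what step 1 might look like during the layer build (the share/venv/dynlib location comes from the proposal above, while the helper name, the relative-symlink choice, and the use of pre-filtered shared object paths are illustrative assumptions):

import os
from pathlib import Path

def link_shared_objects(layer_path: Path, shared_objects: list[Path]) -> None:
    # Hypothetical helper: symlink each shared object found under
    # site-packages into the proposed share/venv/dynlib folder, so a single
    # directory can be added to the loader search path for the whole layer.
    dynlib_dir = layer_path / "share" / "venv" / "dynlib"
    dynlib_dir.mkdir(parents=True, exist_ok=True)
    for so_path in shared_objects:
        link_path = dynlib_dir / so_path.name
        # Relative symlink targets keep the layer relocatable on disk.
        target = os.path.relpath(so_path, start=dynlib_dir)
        if not link_path.is_symlink():
            link_path.symlink_to(target)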

Additional implementation notes:


Background

Consider the following virtual environment with pytorch installed from PyPI:

(dynlib_example) ~/devel/dynlib_example$ pip list | grep torch
torch                    2.5.0
torchaudio               2.5.0
torchvision              0.20.0

The libtorch.so shared library within that environment includes relative load paths for several potential NVIDIA dependencies:

(dynlib_example) ~/devel/dynlib_example$ readelf -d lib/python3.12/site-packages/torch/lib/libtorch.so  | grep 'R.*PATH'
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../../nvidia/cublas/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cudnn/lib:$ORIGIN/../../nvidia/cufft/lib:$ORIGIN/../../nvidia/curand/lib:$ORIGIN/../../nvidia/cusolver/lib:$ORIGIN/../../nvidia/cusparse/lib:$ORIGIN/../../nvidia/nccl/lib:$ORIGIN/../../nvidia/nvtx/lib:$ORIGIN]

This works because those nvidia libraries are installed into the same virtual environment:

(dynlib_example) ~/devel/dynlib_example$ ls lib/python3.12/site-packages/nvidia
cublas      cuda_nvrtc    cudnn  curand    cusparse     nccl       nvtx
cuda_cupti  cuda_runtime  cufft  cusolver  __init__.py  nvjitlink

In the context of venvstacks, this means that pytorch and the NVIDIA libraries must be installed as part of the same layer definition. Attempting to move the NVIDIA libraries lower in the stack (either to the base runtime layer, or to a separate framework layer if #18 is implemented) won't work, since the dynamic library loading will fail at import time.

This is a reasonably common pattern, and one of the main reasons folks point out that the Python environment layering pattern implemented by venvstacks doesn't work in the general case. Whereas Python extension module DLLs on Windows are able to make themselves dynamically discoverable with os.add_dll_directory, POSIX shared objects rely more heavily on relative paths that are fixed at module build time (and hence are only correct when the library and its dependencies are installed into the same target environment) and on the LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on macOS) setting, which needs to be configured prior to application startup (it can't be manipulated at runtime the way the Windows DLL search path can be).
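
To make the contrast concrete, this is roughly what the Windows-only escape hatch looks like, and why there's no direct POSIX equivalent (the directory path below is purely illustrative):

import os

# Windows only: a package can extend the DLL search path from inside the
# running process to make its bundled DLLs discoverable at import time.
bundled_dlls = r"C:\example\bundled\dlls"  # illustrative path only
if hasattr(os, "add_dll_directory") and os.path.isdir(bundled_dlls):
    os.add_dll_directory(bundled_dlls)

# There's no POSIX counterpart to the call above: LD_LIBRARY_PATH (Linux) and
# DYLD_LIBRARY_PATH (macOS) are read by the dynamic loader at process startup,
# so they have to be exported before the interpreter is launched.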

If you're aware of the problem, it can be managed. If you're not already aware of the possibility, though, running into it can be utterly baffling to debug: all you have to work with is a cryptic shared object loading failure when Python attempts to import an extension module with a dynamically linked dependency that can't be resolved.

Finding shared objects to symlink

Simply searching for and symlinking all .so objects in a layered environment would result in a lot of pointless symlinks to Python binary extension modules that are only loaded directly after the interpreter finds them via sys.path.

https://github.com/lmstudio-ai/venvstacks/blob/main/misc/find_shared_libs.py proposes a better algorithm for that, which filters out the shared objects that specifically look like Python extension modules:

(dynlib_example) ~/devel/dynlib_example$ find . -name '*.so' | wc -l
61
(dynlib_example) ~/devel/dynlib_example$ ../venvstacks/misc/find_shared_libs.py . | wc -l
32
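
The linked script is the actual implementation; the sketch below is only an assumption about the general shape of that filtering, keeping any .so file whose name doesn't carry a tagged CPython extension suffix:

import importlib.machinery
from pathlib import Path

# Tagged suffixes such as ".cpython-312-x86_64-linux-gnu.so" and ".abi3.so"
# unambiguously mark Python extension modules; a bare ".so" does not.
_TAGGED_SUFFIXES = tuple(
    s for s in importlib.machinery.EXTENSION_SUFFIXES if s != ".so"
)

def iter_shared_libs(env_root: str):
    # Keep every *.so file under the environment except the ones that are
    # clearly Python extension modules rather than plain shared libraries.
    for so_path in Path(env_root).rglob("*.so"):
        if so_path.name.endswith(_TAGGED_SUFFIXES):
            continue
        yield so_path

Note that ambiguously named files like torchvision's _C.so and image.so survive this kind of filter, which is part of what motivates the exclusion mechanism discussed below.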

For this example environment:

(dynlib_example) ~/devel/dynlib_example$ ../venvstacks/misc/find_shared_libs.py .
lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-ff651d7f.so
lib/python3.12/site-packages/torchaudio/lib/_torchaudio_sox.so
lib/python3.12/site-packages/torchaudio/lib/_torchaudio.so
lib/python3.12/site-packages/torchaudio/lib/pybind11_prefixctc.so
lib/python3.12/site-packages/torchaudio/lib/libtorchaudio_sox.so
lib/python3.12/site-packages/torchaudio/lib/libctc_prefix_decoder.so
lib/python3.12/site-packages/torchaudio/lib/libtorchaudio.so
lib/python3.12/site-packages/pillow.libs/libopenjp2-05423b53.so
lib/python3.12/site-packages/torio/lib/_torio_ffmpeg6.so
lib/python3.12/site-packages/torio/lib/_torio_ffmpeg5.so
lib/python3.12/site-packages/torio/lib/_torio_ffmpeg4.so
lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg4.so
lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so
lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg5.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libnvperf_target.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libpcsamplingutil.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libcheckpoint.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libnvperf_host.so
lib/python3.12/site-packages/triton/_C/libproton.so
lib/python3.12/site-packages/triton/_C/libtriton.so
lib/python3.12/site-packages/torchvision/image.so
lib/python3.12/site-packages/torchvision/_C.so
lib/python3.12/site-packages/torch/lib/libc10_cuda.so
lib/python3.12/site-packages/torch/lib/libtorch_cuda.so
lib/python3.12/site-packages/torch/lib/libtorch_python.so
lib/python3.12/site-packages/torch/lib/libtorch.so
lib/python3.12/site-packages/torch/lib/libcaffe2_nvrtc.so
lib/python3.12/site-packages/torch/lib/libtorch_cuda_linalg.so
lib/python3.12/site-packages/torch/lib/libshm.so
lib/python3.12/site-packages/torch/lib/libtorch_cpu.so
lib/python3.12/site-packages/torch/lib/libc10.so
lib/python3.12/site-packages/torch/lib/libtorch_global_deps.so

The torchvision case highlights the need for a library symlink exclusion mechanism in the layer specification syntax: the _C.so file is loaded via an explicit library loading call (relative to the Python file), so it shouldn't be symlinked into the dynamic library loading location. The generically named image.so shared library also illustrates that it may be necessary to resolve shared object naming conflicts between packages installed into the same layer (the initial proposal is to have naming conflicts trigger a fatal build error for that environment, with the exclusion mechanism then being used to pick which one gets linked).
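
As a sketch of the proposed fail-fast handling of naming conflicts (the exception type, function signature, and name-based exclusions here are illustrative assumptions, not existing venvstacks API):

from collections import defaultdict
from pathlib import Path

class DynlibNameConflictError(Exception):
    # Hypothetical error: two packages in the same layer would contribute
    # shared objects with the same base name to the dynlib folder.
    pass

def check_dynlib_name_conflicts(shared_objects: list[Path], excluded_names: set[str]) -> None:
    by_name = defaultdict(list)
    for so_path in shared_objects:
        if so_path.name in excluded_names:
            continue  # skipped via the proposed layer spec exclusion mechanism
        by_name[so_path.name].append(so_path)
    conflicts = {name: paths for name, paths in by_name.items() if len(paths) > 1}
    if conflicts:
        # Per the initial proposal, name clashes abort the layer build so the
        # user can resolve them with explicit exclusions.
        raise DynlibNameConflictError(conflicts)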

Wrapping the Python runtime invocation

Both Linux and macOS should support the -a option to exec that allows execution of the correct Python binary while having sys.executable point at the wrapper script:

(dynlib_example) ~/devel/dynlib_example$ cat pyexec.sh
#!/bin/sh
exec -a "$PWD/pyexec.sh" bin/python3 "$@"
(dynlib_example) ~/devel/dynlib_example$ ./pyexec.sh -c "import sys; print(sys.executable)"
/home/acoghlan/devel/dynlib_example/pyexec.sh

The real script will do the full required "get the absolute path to this running script" dance rather than using $PWD, but this short snippet still illustrates the general approach. It ensures that invoked Python subprocesses still get the library path environment variable adjustments even if the parent process environment isn't passed to the subprocess (and to avoid an ever-growing environment variable, the adjustments will need to check that the directory of interest isn't already present).

For Linux, the search path environment variable to adjust is LD_LIBRARY_PATH, while on macOS it is DYLD_LIBRARY_PATH.
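
The duplicate-avoidance check could look something like the following (written in Python for illustration; the real wrapper would need to do the equivalent in shell before exec'ing the base runtime, and the function name is an assumption):

import os
import sys

def prepend_dynlib_dir(env: dict, dynlib_dir: str) -> dict:
    # Pick the loader search path variable for the current platform.
    var_name = "DYLD_LIBRARY_PATH" if sys.platform == "darwin" else "LD_LIBRARY_PATH"
    entries = [p for p in env.get(var_name, "").split(os.pathsep) if p]
    # Only prepend when the layer's dynlib directory isn't already present,
    # so nested wrapper invocations don't grow the variable indefinitely.
    if dynlib_dir not in entries:
        entries.insert(0, dynlib_dir)
    updated = dict(env)
    updated[var_name] = os.pathsep.join(entries)
    return updated

Prepending (rather than appending) keeps the layer's own dynlib directory ahead of anything inherited from the parent environment.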