Open tcaduser opened 2 years ago
Here is the debug repair run:
DEBUG:auditwheel.repair:Grafting: /usr/local/lib/libsqlite3.so.0.8.6 -> devsim.libs/libsqlite3-10f9d6e9.so.0.8.6
DEBUG:auditwheel.repair:Grafting: /lib64/libquadmath.so.0.0.0 -> devsim.libs/libquadmath-96973f99.so.0.0.0
DEBUG:auditwheel.repair:Preserved rpath entry $ORIGIN
DEBUG:auditwheel.repair:Preserved rpath entry $ORIGIN/../lib
DEBUG:auditwheel.repair:rpath entry $ORIGIN/../../../../lib points outside the wheel -- discarding it.
Is there any way to change this behavior?
For what it's worth, this is my workaround for preventing the tool from removing preset rpath values.
from auditwheel.main import main
import auditwheel.repair
import sys

if hasattr(auditwheel.repair, '_is_valid_rpath') and callable(getattr(auditwheel.repair, '_is_valid_rpath')):
    auditwheel.repair._is_valid_rpath = lambda x, y, z: True
else:
    raise RuntimeError("Something is wrong with auditwheel customization")

if __name__ == '__main__':
    sys.exit(main())
Please provide more details on your layout; discarding entries pointing outside the wheel, as noted in the message, is sane behavior.
Hi @mayeut,
In an Anaconda environment, the application works correctly, because the python binary encodes an RPATH which can find the needed blas/lapack library:
find $CONDA_PREFIX -iname devsim_py3.so ; find $CONDA_PREFIX -iname libmkl_rt.so ; objdump -x `which python` | grep RPATH
/home/juan/miniconda3/envs/devsim_test/lib/python3.10/site-packages/devsim/devsim_py3.so
/home/juan/miniconda3/envs/devsim_test/lib/libmkl_rt.so
RPATH $ORIGIN/../lib
In venv environments, the python binary from the Ubuntu 22.04 python 3 installation does not have an RPATH:
$ find $VIRTUAL_ENV -iname devsim_py3.so ; find $VIRTUAL_ENV -iname 'libmkl_rt*so*' ; objdump -x `which python` | grep RPATH
/home/juan/myapp/lib/python3.10/site-packages/devsim/devsim_py3.so
/home/juan/myapp/lib/libmkl_rt.so.2
Therefore the only RPATH that can be relied upon to find the math library is that of the shared library:
$ objdump -x `find $VIRTUAL_ENV -iname devsim_py3.so` | grep RPATH
RPATH $ORIGIN:$ORIGIN/../lib:$ORIGIN/../../../../lib:$ORIGIN/../devsim.libs
where the 3rd RPATH entry can be used to find the math library. Note that $ORIGIN/../../../../lib is a valid location within a Python venv environment or an Anaconda Python environment.
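To make the path arithmetic concrete, here is a small sketch (using the example venv path from the session above) of how the dynamic loader would expand the $ORIGIN/../../../../lib entry:

```python
from pathlib import PurePosixPath

# Example extension-module location inside a venv (taken from the session
# above); the Anaconda layout resolves the same way.
so_path = PurePosixPath("/home/juan/myapp/lib/python3.10/site-packages/devsim/devsim_py3.so")
origin = so_path.parent  # what the loader substitutes for $ORIGIN

# Expand the "$ORIGIN/../../../../lib" rpath entry by collapsing ".." parts.
# (PurePosixPath never touches the filesystem, so collapse them by hand.)
parts = []
for part in (origin / "../../../../lib").parts:
    if part == "..":
        parts.pop()
    else:
        parts.append(part)

print(PurePosixPath(*parts))  # -> /home/juan/myapp/lib, where libmkl_rt.so.2 lives
```

Four levels up from the `devsim` package directory lands on the environment root, so the trailing `lib` reaches the environment's library directory in both layouts.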
Note that:
- libmkl*so* is already provided by the Python environment's installation.
- The mkl package is at least 700 MB, and I don't want Anaconda users of my application to have 2 installations of mkl (one from Anaconda numpy and one from pip).

I suggest that not discarding entries pointing inside a valid Python environment may be sane behavior.
Unfortunately, in my experience, Ubuntu does not have a proper RPATH on their venv-installed python interpreter. This also happens on macOS, where the Homebrew-provided python3 installation also does not encode an RPATH into the interpreter.
I suggest that not discarding entries pointing inside a valid Python environment may be sane behavior.
IMHO, it's still sane to discard. You're basing your RPATH trick on undefined behavior (one observed under specific conditions).
There's an overlap between what's described here and this comment: https://github.com/pypa/auditwheel/issues/391#issuecomment-1272349511
For PyPI distribution (I don't know enough about conda to give specifics here), you can use importlib.metadata to locate the mkl shared object you want to load if intel doesn't provide this in its MKL distribution package.
MKL is several hundred megabytes in size, and I don't want to force it on users if they want to use another ABI-compliant implementation of the BLAS/LAPACK library.
Many python environments put "RPATH=$ORIGIN/../lib" on the Python executable. I think my approach is a valid workaround to the fact that Ubuntu Python and Homebrew Python do not do that, while Anaconda Python does.
The fact that numpy is using a similar trick for their libraries indicates there may be some validity to my approach.
$ objdump -x $CONDA_PREFIX/lib/python3.9/site-packages/numpy/linalg/lapack_lite.cpython-39-x86_64-linux-gnu.so | grep RPATH
RPATH $ORIGIN/../../../..
If there is no official support, I will continue to override the auditwheel behavior in order to get the manylinux certification I need.
One additional issue is that the MKL library is dispatched at runtime based on the specific processor: libmkl_rt.so dispatches to any of a large number of shared libraries, depending on the user's specific CPU. So it is not possible to package just one shared library out of the entire MKL package.
conda uses its own packaging system & has been designed to handle native dependencies natively, so, taking numpy as an example since you introduced it, you'll see that the conda package & the PyPI package are not the same:
conda install numpy
objdump -x conda310/lib/python3.10/site-packages/numpy/linalg/lapack_lite.cpython-310-x86_64-linux-gnu.so | grep RPATH
RPATH $ORIGIN/../../../..
conda remove numpy
pip install numpy==1.23.4
objdump -x conda310/lib/python3.10/site-packages/numpy/linalg/lapack_lite.cpython-310-x86_64-linux-gnu.so | grep RPATH
RPATH $ORIGIN/../../numpy.libs
The conda specific package links against conda provided mkl with numpy having a well defined RPATH in the conda ecosystem. When using the PyPI package, numpy doesn't have an RPATH that crosses the wheel boundaries and provides its own implementation of math libraries through OpenBLAS (there's a discussion around this to provide the OpenBLAS dependency as a wheel reusable by other packages rather than each package embedding its own copy).
For conda, you might want to look at conda-forge to provide your package in a way that's specific to conda (including RPATH), which will also get rid of the sqlite/libquadmath dependencies bundled in your PyPI distribution.
For the PyPI distribution, patching your __init__.py to look up those libraries is probably the best way to go, something along the lines of (I only looked at MKL because I can pip install mkl):
import sys
#TODO:
#https://stackoverflow.com/questions/6677424/how-do-i-import-variable-packages-in-python-like-using-variable-variables-i
#imported = getattr(__import__(package, fromlist=[name]), name)

if sys.version_info[0] == 3:
    from ctypes import cdll
    import importlib.metadata as importlib_metadata
    try:
        # this one will work if mkl is installed in the system site package or user site package from PyPI
        # the user probably might not want to install 700 MB in every virtual environment.
        # The RPATH trick is useless in this case. It simply does not work:
        #   python -m pip install --user mkl
        #   python -m venv --system-site-packages devsim-venv
        #   source devsim-venv/bin/activate
        #   pip install devsim
        files = importlib_metadata.files('mkl')
        for file in files:
            if file.name.startswith("libmkl_rt.so."):
                cdll.LoadLibrary(str(file.locate()))
    except importlib_metadata.PackageNotFoundError:
        # This fallback is for conda should you choose not to publish a specific conda package
        # or want the PyPI distribution to work in a conda environment (where mkl won't be found by importlib_metadata).
        from pathlib import Path
        lib_dir = Path(__file__).resolve(strict=True).parent.parent.parent.parent
        try:
            mkl_path = next(lib_dir.glob("libmkl_rt.so.*"))
            cdll.LoadLibrary(str(mkl_path))
        except StopIteration:
            pass
    from .devsim_py3 import *
    from .devsim_py3 import __version__
else:
    raise ImportError('module not available for Python %d.%d please contact technical support' % sys.version_info[0:2])
In my opinion, auditwheel needs a way to whitelist RPATH values. It really should be up to the developer to do this without hacking auditwheel itself.
I would be willing to create a pull request to provide this whitelisting behavior. However, I am getting the impression that it would get immediately rejected.
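A narrower version of the earlier monkeypatch could preserve only whitelisted entries instead of accepting everything. The sketch below is hypothetical: the stand-in validator, its toy rule, and the (rpath, lib_dir, wheel_base) signature are assumptions inferred from the lambda above, not auditwheel's actual internals.

```python
# Stand-in for auditwheel.repair._is_valid_rpath; the real internal function
# and its signature are assumptions here, inferred from the lambda earlier.
def _is_valid_rpath(rpath, lib_dir, wheel_base):
    # Toy rule: $ORIGIN-relative entries that don't climb far out of the wheel.
    return rpath.startswith("$ORIGIN") and rpath.count("..") <= 1

# Entries the developer explicitly wants to preserve even though they point
# outside the wheel (hypothetical whitelist).
ALLOWED_RPATHS = {"$ORIGIN/../../../../lib"}

def whitelisting_is_valid_rpath(rpath, lib_dir, wheel_base):
    """Keep the normal validation, but additionally accept whitelisted entries."""
    return rpath in ALLOWED_RPATHS or _is_valid_rpath(rpath, lib_dir, wheel_base)

print(whitelisting_is_valid_rpath("$ORIGIN/../../../../lib", "", ""))  # True (whitelisted)
print(whitelisting_is_valid_rpath("$ORIGIN/../devsim.libs", "", ""))   # True (passes toy rule)
print(whitelisting_is_valid_rpath("/usr/lib", "", ""))                 # False (still rejected)
```

The wrapper pattern keeps normal validation intact for every entry not explicitly whitelisted, which is the behavior a pull request along these lines would presumably aim for.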
Thanks for your suggested solution, but it is not really applicable to the usage model of my software, as the modules are being loaded directly by the C++ extension using dlopen on macOS and Linux. I was previously iterating through paths based on $CONDA_PREFIX and $VIRTUAL_LIB, but this should not be necessary, especially because this approach is unmaintainable.
I was iterating through potential paths, but it is preferable to use dlopen and have the user set the appropriate paths using the approach I describe here:
https://github.com/devsim/devsim/commit/96ff966ac47919cba01786050a4a60a04fbc51ec#diff-d975bf659606195d2165918f93e1cf680ef68ea3c9cab994f033705fea8238b2
If users decide not to use a conda or virtual environment, they are free to use the DEVSIM_MATH_LIBS environment variable to set a :-delimited list of relative or absolute paths to their non-standard installation.
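The devsim loader itself is C++, but the search it performs over DEVSIM_MATH_LIBS can be sketched in Python with ctypes. This is illustrative only; load_first_math_lib and the example library names are hypothetical, not devsim's actual code:

```python
import ctypes
import os

def load_first_math_lib(env_var="DEVSIM_MATH_LIBS"):
    """Try each ':'-delimited candidate in turn, the way a dlopen() loop
    over user-supplied paths would (hypothetical helper for illustration)."""
    for candidate in os.environ.get(env_var, "").split(":"):
        if not candidate:
            continue
        try:
            return ctypes.CDLL(candidate)
        except OSError:
            continue  # candidate missing or unloadable; try the next one
    return None

# A bad path first, then a name the system loader can find (glibc's libm on Linux).
os.environ["DEVSIM_MATH_LIBS"] = "/nonexistent/libmkl_rt.so.2:libm.so.6"
print(load_first_math_lib() is not None)
```

Bare names like libm.so.6 fall through to the normal dynamic-loader search, so the variable can mix absolute paths with soname-style entries.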
With respect to the quadmath and sqlite suggestions:
- sqlite is a standard part of the Python library, and so I am satisfied if auditwheel decides to provide a small copy of the shared library.
- quadmath is somewhat of a standard system library. I am satisfied if auditwheel decides to provide a small copy in the shared library.
- auditwheel was content not to provide a copy of the zlib shared library as it is a stable library.

Thanks for providing an explanation of the numpy behavior. It would really make sense for numpy to use an approach similar to mine and reference an available math library using a similar RPATH.
I am not interested in providing a conda package for my software. pip package management is great, and auditwheel allows me to create a portable implementation that can be used across many versions of Python 3.x, without having to recompile for multiple python versions or linux systems.
I would be willing to create a pull request to provide this whitelisting behavior. However, I am getting the impression that it would get immediately rejected.
If other maintainers disagree with me, I might be convinced otherwise. cc @lkollar
Thanks for your suggested solution, but it is not really applicable to the usage model of my software, as the modules are being loaded directly by the C++ extension using dlopen on macOS and Linux.
The code shared works as-is for MKL when the patched RPATH is removed from devsim.
You can also set DEVSIM_MATH_LIBS if not already set by the user:
import sys
#TODO:
#https://stackoverflow.com/questions/6677424/how-do-i-import-variable-packages-in-python-like-using-variable-variables-i
#imported = getattr(__import__(package, fromlist=[name]), name)

if sys.version_info[0] == 3:
    import os
    if "DEVSIM_MATH_LIBS" not in os.environ:
        import importlib.metadata as importlib_metadata
        libs = []
        try:
            # this one will work if mkl is installed in the system site package or user site package from PyPI
            # the user probably might not want to install 700 MB in every virtual environment.
            # The RPATH trick is useless in this case. It simply does not work:
            #   python -m pip install --user mkl
            #   python -m venv --system-site-packages devsim-venv
            #   source devsim-venv/bin/activate
            #   pip install devsim
            files = importlib_metadata.files('mkl')
            for file in files:
                if file.name.startswith("libmkl_rt.so."):
                    libs.append(str(file.locate()))
                    break
        except importlib_metadata.PackageNotFoundError:
            # This fallback is for conda should you choose not to publish a specific conda package
            # or want the PyPI distribution to work in a conda environment (where mkl won't be found by importlib_metadata).
            from pathlib import Path
            lib_dir = Path(__file__).resolve(strict=True).parent.parent.parent.parent
            try:
                mkl_path = next(lib_dir.glob("libmkl_rt.so.*"))
                libs.append(str(mkl_path))
            except StopIteration:
                pass
        libs.extend(("libopenblas.so", "liblapack.so", "libblas.so"))
        os.environ["DEVSIM_MATH_LIBS"] = ":".join(libs)
    from .devsim_py3 import *
    from .devsim_py3 import __version__
else:
    raise ImportError('module not available for Python %d.%d please contact technical support' % sys.version_info[0:2])
I agree with @mayeut. Hard-coding the relative RPATH will not work reliably in all environments (what happens with --user or --target installations, etc.?). This kind of behaviour can also cause issues with other extensions which also dynamically load libmkl, as only one version will get loaded and that might not be ABI compatible with both libraries. This is why auditwheel grafts dependencies into the wheel and ensures that no other library will share that dependency. Circumventing this inside auditwheel adds complexity, and goes against the manylinux spec.
Grafting a 700MB library might not be desirable, but alternatives like @mayeut's example should be able to help.
The grafting is not happening, since I am using dlopen. Even if the grafting were to occur, the front mkl shared object is pretty small. I would think what would happen is that all of the other platform-specific shared libraries would be missing, since they are dispatched individually by the front-end object.
$ ls -lh libmkl_rt.so.2
-rwxr-x--x 1 juan juan 14M Nov 28 19:38 libmkl_rt.so.2
$ ldd libmkl_rt.so.2
linux-vdso.so.1 (0x00007fff6d9ff000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe16aafe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe16a8d6000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe16b604000)
$ du -ms *.so* | sort -nr
71 libmkl_core.so.2
64 libmkl_avx512.so.2
63 libmkl_intel_thread.so.2
51 libmkl_avx.so.2
49 libmkl_mc3.so.2
48 libmkl_avx2.so.2
47 libmkl_mc.so.2
41 libmkl_def.so.2
39 libmkl_tbb_thread.so.2
37 libmkl_pgi_thread.so.2
30 libmkl_gnu_thread.so.2
28 libmkl_sequential.so.2
21 libmkl_intel_lp64.so.2
17 libmkl_intel_ilp64.so.2
17 libmkl_gf_lp64.so.2
16 libmkl_vml_avx.so.2
15 libmkl_vml_mc.so.2
15 libmkl_vml_avx2.so.2
14 libmkl_vml_mc3.so.2
14 libmkl_vml_mc2.so.2
14 libmkl_vml_avx512.so.2
14 libmkl_rt.so.2
14 libmkl_gf_ilp64.so.2
9 libomptarget.sycl.wrap.so
9 libomptarget.rtl.x86_64.so
9 libomptarget.rtl.opencl.so
9 libomptarget.rtl.level0.so
9 libmkl_vml_def.so.2
8 libmkl_vml_cmpt.so.2
8 libmkl_scalapack_lp64.so.2
8 libmkl_scalapack_ilp64.so.2
3 libtbb.so.12.7
3 libtbb.so.12
3 libtbb.so
3 libiomp5.so
...
The mkl functions are a stable ABI for BLAS/LAPACK that has been around for several decades. It is Intel's fault if they do something to break it.
I will continue with my hack, since there is no solution to the problem. I am unsure why, after notifying users that they need to be using a conda or venv environment, it is not possible to preserve the proper RPATH in the wheel.
Hello,
Using the manylinux2014 image to build my application with auditwheel, the repair is transforming the RPATH from:
to
Where it relocates zlib and sqlite into the devsim.libs directory.
Is there any way to prevent the tool from removing the trailing path? It is necessary to have this in my search path for a dlopen'ed math library. Without this, my application cannot find the library, since $VIRTUAL_ENV/bin/python did not set its own RPATH.