pypa / auditwheel

Auditing and relabeling cross-distribution Linux wheels.

issue with auditwheel repair #279

Open hroest opened 5 years ago

hroest commented 5 years ago

Issue

We have found that auditwheel repair breaks some of our shared objects. After repairing a wheel with

 auditwheel repair "$whl" -w wheelhouse/

and installing the resulting wheel, the python package is broken:

/tmp/# $PYBIN/bin/python -c "import pyopenms"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/python/cp37-cp37m/lib/python3.7/site-packages/pyopenms/__init__.py", line 65, in <module>
    raise e
  File "/opt/python/cp37-cp37m/lib/python3.7/site-packages/pyopenms/__init__.py", line 35, in <module>
    from .all_modules import *
  File "/opt/python/cp37-cp37m/lib/python3.7/site-packages/pyopenms/all_modules.py", line 1, in <module>
    from .pyopenms_1 import *
ImportError: /opt/python/cp37-cp37m/lib/python3.7/site-packages/pyopenms/.libs/libQt5Core-82077f0c.so.5.9.4: undefined symbol: stderr, version 6
# echo $PYBIN
/opt/python/cp37-cp37m/

Workaround

Our current workaround is to unpack the wheel and manually replace the libQt5Core library in the .libs folder with the original from "/qt/lib/libQt5Core.so.5.9.4" (the one used during the build), which works for now. Comparing the two files:

# ls -tlrh /qt/lib/libQt5Core.so.5.9.4
-rwxr-xr-x 1 root root 5.9M Nov 27 18:00 /qt/lib/libQt5Core.so.5.9.4
# ls -tlrh /tmp/libQt5Core-82077f0c.so.5.9.4 
-rwxr-xr-x 1 root root 6.2M Nov 30 16:25 /tmp/libQt5Core-82077f0c.so.5.9.4

Curiously, the library is actually larger after processing with auditwheel.

Reproducing the issue

You can reproduce the issue using the following script: https://github.com/OpenMS/OpenMS/blob/feature/py_release/src/pyOpenMS/dist-scripts/create-manylinux.sh (note that our build takes a while).

Debug information

Image

We use the following image:

Digest: sha256:c4161c855cfe3f6e50c38a8e5a5eae839431542ea89fd00475a378ffcd8c9875
Status: Image is up to date for quay.io/pypa/manylinux1_x86_64:latest

# docker images | grep manylin | grep quay
quay.io/pypa/manylinux1_x86_64      latest              80957eca29a3        11 days ago         878MB

Show

# auditwheel show wheelhouse_tmp/pyopenms-2.4.0-cp37-cp37m-linux_x86_64.whl

pyopenms-2.4.0-cp37-cp37m-linux_x86_64.whl is consistent with the
following platform tag: "linux_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libpthread.so.0 with versions
{'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_3.0'}, libm.so.6
with versions {'GLIBC_2.2.5'}, libQt5Core.so.5 with versions
{'Qt_5.9', 'Qt_5'}, libc.so.6 with versions {'GLIBC_2.2.5'},
libstdc++.so.6 with versions {'GLIBCXX_3.4.6', 'GLIBCXX_3.4',
'CXXABI_1.3', 'CXXABI_1.3.1'}

The following external shared libraries are required by the wheel:
{
    "libOpenMS.so": "/openms-build-cp37-cp37m/lib/libOpenMS.so",
    "libOpenSwathAlgo.so": "/openms-build-cp37-cp37m/lib/libOpenSwathAlgo.so",
    "libQt5Core.so.5": "/qt/lib/libQt5Core.so.5.9.4",
    "libQt5Network.so.5": "/qt/lib/libQt5Network.so.5.9.4",
    "libSuperHirn.so": "/openms-build-cp37-cp37m/lib/libSuperHirn.so",
    "libc.so.6": "/lib64/libc-2.5.so",
    "libdl.so.2": "/lib64/libdl-2.5.so",
    "libgcc_s.so.1": "/lib64/libgcc_s-4.1.2-20080825.so.1",
    "libglib-2.0.so.0": "/lib64/libglib-2.0.so.0.1200.3",
    "libgthread-2.0.so.0": "/lib64/libgthread-2.0.so.0.1200.3",
    "libm.so.6": "/lib64/libm-2.5.so",
    "libpthread.so.0": "/lib64/libpthread-2.5.so",
    "librt.so.1": "/lib64/librt-2.5.so",
    "libstdc++.so.6": "/usr/lib64/libstdc++.so.6.0.8",
    "libz.so.1": "/contrib-build/lib/libz.so.1.2.11"
}

In order to achieve the tag platform tag "manylinux1_x86_64" the
following shared library dependencies will need to be eliminated:

libOpenMS.so, libOpenSwathAlgo.so, libQt5Core.so.5,
libQt5Network.so.5, libSuperHirn.so, libz.so.1

njsmith commented 5 years ago

Sounds like an auditwheel issue, so CC @ehashman.

To confirm:

After running auditwheel, you get a manylinux wheel that's broken. When you try to import it, it says: libQt5Core-82077f0c.so.5.9.4: undefined symbol: stderr, version 6

Then, you unpack the broken wheel, and do:

cp -f /qt/lib/libQt5Core.so.5.9.4 .../pyopenms/.libs/libQt5Core-82077f0c.so.5.9.4

And after this, it now can import regularly?

IIRC the only change we make to the .so files when vendoring them is to use patchelf to change the soname. (Elana, do I remember correctly?) I haven't heard of any bugs here recently, but patchelf is kind of black magic so it's plausible...

For context: .so files have two names. There's the actual name of the file on the disk. And then embedded inside the file, there's a string saying what the file thinks its name is supposed to be. Normally, these two names are supposed to be the same.

When we use cp to rename the file, that changes the name on disk. But it doesn't change the string embedded inside the file. When auditwheel copies the file, it tries to change both. So it sounds like something is going wrong with the change-the-embedded-file-name step.
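As a rough illustration of the copy-and-rename step described above, the sketch below prints the kind of commands involved rather than running them. The helper names (tagged_name, vendor_plan) and the example paths are hypothetical. The sequence is an assumption based on this description plus patchelf's --set-soname and --replace-needed options; it is not taken from auditwheel's source.

```shell
#!/bin/sh
# Hypothetical helper: insert a short tag before the first "." of a
# library filename, e.g. libfoo.so.1 + abc123 -> libfoo-abc123.so.1
tagged_name() {
    name=$1
    tag=$2
    printf '%s-%s.%s\n' "${name%%.*}" "$tag" "${name#*.}"
}

# Hypothetical dry run: print (do not execute) commands that would copy
# a library under a new name and make the embedded name agree with it.
#   $1 = path of the original library   $2 = its current soname
#   $3 = tag to embed in the new name   $4 = directory for the copy
#   $5 = extension module that links against the library
vendor_plan() {
    src=$1
    old_soname=$2
    tag=$3
    dest_dir=$4
    consumer=$5
    new_name=$(tagged_name "$(basename "$src")" "$tag")
    # 1. Copy under the new on-disk name.
    echo "cp $src $dest_dir/$new_name"
    # 2. Rewrite the name embedded in the copy.
    echo "patchelf --set-soname $new_name $dest_dir/$new_name"
    # 3. Point the consumer at the new name instead of the old one.
    echo "patchelf --replace-needed $old_soname $new_name $consumer"
}
```

For example, `vendor_plan /qt/lib/libQt5Core.so.5.9.4 libQt5Core.so.5 82077f0c pyopenms/.libs ext.so` (where ext.so is a placeholder for the extension module) prints three commands that use the new name libQt5Core-82077f0c.so.5.9.4. Running the printed patchelf commands by hand on a scratch copy is one way to see whether the breakage comes from this step.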

The reason this matters is: well, if you're only using one package, it doesn't matter. The embedded name string doesn't do anything; only the filename is used. But if you have multiple packages that each have their own copies of Qt, then the loader will sometimes be like "oh hey, this package is asking for the system libQt5Core, but I don't need to go find that, because I know I already loaded a libQt5Core earlier! I'll just use the one I already loaded". This check uses the name embedded in the file. So if we don't fix up the embedded name, it can cause obscure crashes later, only in certain complex configurations, when another package entirely accidentally ends up using your package's copy of libQt5Core.
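To see the two names side by side for a given library, one option is to read the SONAME entry from its dynamic section with readelf. This is a minimal sketch: the function names extract_soname and show_names are made up for illustration, and it assumes binutils' readelf is installed. `patchelf --print-soname FILE` should report the same embedded name.

```shell
#!/bin/sh
# Pull the embedded name out of `readelf -d` output read on stdin.
# The relevant line looks like:
#   0x000000000000000e (SONAME)   Library soname: [libfoo.so.1]
extract_soname() {
    sed -n 's/.*(SONAME).*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: print the on-disk name and the embedded name
# of one shared library so they can be compared by eye.
show_names() {
    file=$1
    echo "on disk:  $(basename "$file")"
    echo "embedded: $(readelf -d "$file" | extract_soname)"
}

# Usage (paths taken from this thread):
#   show_names /qt/lib/libQt5Core.so.5.9.4
#   show_names /tmp/libQt5Core-82077f0c.so.5.9.4
```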

ehashman commented 5 years ago

Thanks for the cc, @njsmith. Is it possible to move this issue to the auditwheel repo?

njsmith commented 5 years ago

I have no idea. I've seen github offering to let me move issues recently, but that bit of UI seems to be missing on this issue. It's mysterious to me.

ehashman commented 5 years ago

I'm not a manylinux admin so I can't transfer it :( @dstufft can you please move this issue to auditwheel?

hroest commented 5 years ago

> And after this, it now can import regularly?

yes, that is correct.

> IIRC the only change we make to the .so files when vendoring them is to use patchelf to change the soname. (Elana, do I remember correctly?) I haven't heard of any bugs here recently, but patchelf is kind of black magic so it's plausible...

I agree with the black magic part, but if I look at the file sizes it seems that more is going on than just changing the soname:

# ls -tlrh /qt/lib/libQt5Core.so.5.9.4
-rwxr-xr-x 1 root root 5.9M Nov 27 18:00 /qt/lib/libQt5Core.so.5.9.4
# ls -tlrh /tmp/libQt5Core-82077f0c.so.5.9.4 
-rwxr-xr-x 1 root root 6.2M Nov 30 16:25 /tmp/libQt5Core-82077f0c.so.5.9.4

So there is an additional 300 KB or so in the file after patching, which suggests that quite a lot was changed.

> When we use cp to rename the file, that changes the name on disk. But it doesn't change the string embedded inside the file. When auditwheel copies the file, it tries to change both. So it sounds like something is going wrong with the change-the-embedded-file-name step.

Is there a way for me to see the exact command that auditwheel is issuing so that I can try and reproduce what it is doing?

njsmith commented 5 years ago

> there is an additional 300kb in the file after patching it, so there must be quite a bit of things that got changed.

This doesn't necessarily indicate much. ELF binaries aren't designed to make in-place editing easy, so sometimes the easiest way for patchelf to do its work is to copy a large chunk while just modifying a small part.

njsmith commented 5 years ago

If you want to dig into this, readelf is a useful tool.
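One way to use readelf here is to run the same query on the original and the patched library and diff the output. The helper name compare_with is made up for this sketch. The readelf options in the comments (-d, -V, --dyn-syms, -W) are standard binutils options, and the paths are the ones quoted earlier in this thread.

```shell
#!/bin/sh
# Run the same inspection command on two files and diff the results.
# Returns diff's exit status: 0 if identical, 1 if they differ.
compare_with() {
    cmd=$1
    a=$2
    b=$3
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    # $cmd is left unquoted on purpose so "readelf -d" splits into words.
    $cmd "$a" > "$tmp_a"
    $cmd "$b" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return $rc
}

# Usage:
#   orig=/qt/lib/libQt5Core.so.5.9.4
#   patched=/tmp/libQt5Core-82077f0c.so.5.9.4
#   compare_with "readelf -d" "$orig" "$patched"   # dynamic section
#   compare_with "readelf -V" "$orig" "$patched"   # symbol version info
#   readelf --dyn-syms -W "$patched" | grep -w stderr
```

Differences in addresses and offsets are expected after patching. Since the import error mentions `stderr, version 6`, differences in the version information or in the `stderr` symbol entry are probably the more interesting ones to look for.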

lkollar commented 1 year ago

I wonder if this was due to a bug in patchelf. @hroest can you still reproduce this?