pypa / auditwheel

Auditing and relabeling cross-distribution Linux wheels.
Other
443 stars 145 forks source link

add "--page-size=65536" to patchelf invocation #251

Closed mattip closed 4 years ago

mattip commented 4 years ago

Aarch64 systems are configured with either 4k or 64k pages. A system with 64k pages cannot load a shared object that is built with 4k pages. It seems the manylinux arm64 docker images use 4k pages, while RHEL8/AArch6 use 64k pages, so manylinux2014 arm64 wheels build in the docker image cannot be used across systems. In numpy/numpy#16677, @vstinner suggested using the patchelf --page-size=65536 argument to change the section alignment. Would this be enough to fix the problem, or do we need to change the docker images to add -Wl,-z,max-page-size=0x10000 by default to the compiler?

adang1345 commented 4 years ago

I encountered the same page alignment issues in my project at https://github.com/intersystems/iknow. For me, forcing auditwheel to use patchelf --page-size=65536 was enough to fix the problem and resulted in binaries that I could load on RHEL7, which uses 64KiB pages. Currently, as a workaround, I am executing the following before invoking auditwheel in a manylinux2014_aarch64 container.

mv /usr/local/bin/patchelf /usr/local/bin/_patchelf
printf '#!/usr/bin/env bash\n\n_patchelf --page-size 65536 "$@"\n' >> /usr/local/bin/patchelf
chmod +x /usr/local/bin/patchelf

This workaround inserts the --page-size=65536 argument whenever auditwheel invokes patchelf.

mithrandi commented 4 years ago

https://github.com/numpy/numpy/issues/16677#issuecomment-649809124 suggests that patchelf cannot fully fix this. (And from what I can tell, the --page-size argument is not intended for attempting to change the page size). However this change should still be made so that patchelf does not break correctly-built binaries.

alex commented 4 years ago

Well. https://github.com/pyca/bcrypt/commit/8d35d8acb82b4600a367e692d1231a705552900c is what we did for bcrypt. It sounds like auditwheel can't solve this, and maybe we need to throw this into the CFLAGS off the docker image?

mattip commented 4 years ago

we need to throw this into the CFLAGS of the docker image

This sounds right to me. So this should be an issue/PR against the manylinux2014 ARM image to add export CFLAGS="$CFLAGS -Wl,-z,max-page-size=0x10000", correct? Can we move this issue to that repo?

alex commented 4 years ago

That seems correct to me, but I'm far from an expert.

On Wed, Aug 19, 2020 at 11:06 AM Matti Picus notifications@github.com wrote:

we need to throw this into the CFLAGS of the docker image

This sounds right to me. So this should be an issue/PR against the manylinux2014 ARM image to add export CFLAGS="$CFLAGS -Wl,-z,max-page-size=0x10000", correct? Can we move this issue to that repo?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pypa/auditwheel/issues/251#issuecomment-676483763, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBHGFI6SFZVUABBUVOLSBPS7VANCNFSM4OHDNAMA .

-- All that is necessary for evil to succeed is for good people to do nothing.

mithrandi commented 4 years ago

auditwheel would ideally still check and complain if the alignment is wrong.

reaperhulk commented 4 years ago

Throwing it in the CFLAGS of the docker image is a fragile fix since many projects may unknowingly clobber existing CFLAGS with their own. Python appends the flags it was built with to module compilation in addition to whatever CFLAGS are passed, so if we compile Python itself on the manylinux2014 images with -Wl,-z,max-page-size=0x10000 then it will reliably be passed for everything...unless I'm missing something 😄

alex commented 4 years ago

I think https://github.com/pypa/manylinux/blob/manylinux2014/docker/build_scripts/build_utils.sh#L31 is roughly where that change would happen.

njsmith commented 4 years ago

You could also potentially hack the compiler on the manylinux2014 images to enforce that (maybe by modifying the linker scripts?).

vstinner commented 4 years ago

Rather than hacking the compiler, would it be possible to write a script to validate that produced binaries have the expected minimum alignment?

mattip commented 4 years ago

So there are (at least) two issues: creation and validation. Both are important.

vstinner commented 4 years ago

Copy of https://github.com/numpy/numpy/issues/16677#issuecomment-676693957 comment:

So the issue is coming from Manylinux2014_aarch64 and its version of patchelf being slightly too old.

https://github.com/NixOS/patchelf/commit/0470d6921b5a3fe8e92e356c8e11d120dbbb06c0 This patch was submitted June 20th, and fixes the issues with AArch64. ManyLinux2014 is pulling the 0.11 tag release which is 3 weeks older. Updating ManyLinux2014 should fix this.

Originally posted by @geoffreyblake in https://github.com/numpy/numpy/issues/16677#issuecomment-676693957

and @mattip reply https://github.com/numpy/numpy/issues/16677#issuecomment-676735511 :

Thanks. Let's continue the discussion on the manylinux issue, since I think we have localized the problem to patchelf breaking binaries that are properly compiled with 64k pages, and patchelf is part of manylinux.

frenzymadness commented 4 years ago

I've tested the latest numpy wheels for Python 3.6 and 3.8 on aarch64 machine and everything seems to work well.

LOAD sections also seem to be aligned correctly.

# readelf -l venv/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-aarch64-linux-gnu.so 

Elf file type is DYN (Shared object file)
Entry point 0x298a0
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x000000000038ab08 0x000000000038ab08  R E    0x10000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_EH_FRAME   0x0000000000330830 0x0000000000330830 0x0000000000330830
                 0x000000000000bd44 0x000000000000bd44  R      0x4
  LOAD           0x000000000038e600 0x000000000039e600 0x000000000039e600
                 0x000000000001e194 0x000000000003e7b8  RW     0x10000
  TLS            0x000000000038e600 0x000000000039e600 0x000000000039e600
                 0x0000000000000000 0x0000000000000140  R      0x8
  GNU_RELRO      0x000000000038e600 0x000000000039e600 0x000000000039e600
                 0x0000000000001a00 0x0000000000001a00  R      0x1
  LOAD           0x0000000000400000 0x00000000003e0000 0x00000000003e0000
                 0x00000000000039e8 0x00000000000039e8  RW     0x10000
  NOTE           0x00000000004039c0 0x00000000003e39c0 0x00000000003e39c0
                 0x0000000000000024 0x0000000000000024  R      0x4
  DYNAMIC        0x0000000000410000 0x00000000003f0000 0x00000000003f0000
                 0x0000000000000220 0x0000000000000220  RW     0x8
  LOAD           0x0000000000410000 0x00000000003f0000 0x00000000003f0000
                 0x0000000000002e70 0x0000000000002e70  RW     0x10000

I think we can close this now.

mattip commented 4 years ago

Closing. Please reopen or open a new issue as needed.