mesonbuild / meson-python

Meson PEP 517 Python build backend
https://mesonbuild.com/meson-python/
MIT License

binutils 2.43.50: Segmentation fault in test_local_lib #698

Open hroncok opened 1 week ago

hroncok commented 1 week ago

Hello.

I am trying to build and test meson-python with Python 3.14 in Fedora.

I see a strange Segmentation fault in test_local_lib. I can reproduce it on Fedora Rawhide (42), but not on Fedora 39.

To reproduce:

$ podman run --rm -ti fedora:rawhide /usr/bin/bash  # or docker
# dnf install uv git-core cmake python3.14-devel gcc patchelf gdb
...
# git clone https://github.com/mesonbuild/meson-python.git
...
# cd meson-python
# uv venv --python=python3.14 venv  # or regular venv
# . venv/bin/activate
# uv pip install ninja .[test]
Using Python 3.14.0a1 environment at venv
   Built meson-python @ file:///meson-python
Resolved 15 packages in 1.90s
   Built coverage==7.6.4
Prepared 11 packages in 1.06s
Installed 15 packages in 41ms
 + build==1.2.2.post1
 + coverage==7.6.4
 + cython==3.0.11
 + iniconfig==2.0.0
 + meson==1.6.0
 + meson-python==0.18.0.dev0 (from file:///meson-python)
 + ninja==1.11.1.1
 + packaging==24.1
 + pluggy==1.5.0
 + pyproject-hooks==1.2.0
 + pyproject-metadata==0.9.0
 + pytest==8.3.3
 + pytest-cov==5.0.0
 + pytest-mock==3.14.0
 + wheel==0.44.0
# python -m pytest -k test_local_lib
...
============================= test session starts ==============================
platform linux -- Python 3.14.0a1, pytest-8.3.3, pluggy-1.5.0
rootdir: /meson-python
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0
collected 123 items / 122 deselected / 1 selected                              

tests/test_wheel.py F                                                    [100%]

=================================== FAILURES ===================================
________________________________ test_local_lib ________________________________

venv = <tests.conftest.VEnv object at 0x7fb577566f90>
wheel_link_against_local_lib = PosixPath('/tmp/pytest-of-root/pytest-5/test0/mesonpy-test-5tupkd1z/link_against_local_lib-1.0.0-cp314-cp314-linux_x86_64.whl')

    @pytest.mark.skipif(sys.platform not in {'linux', 'darwin'}, reason='Not supported on this platform')
    def test_local_lib(venv, wheel_link_against_local_lib):
        venv.pip('install', wheel_link_against_local_lib)
>       output = venv.python('-c', 'import example; print(example.example_sum(1, 2))')

tests/test_wheel.py:160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/conftest.py:114: in python
    return subprocess.check_output([self.executable, *args]).decode()
/usr/lib64/python3.14/subprocess.py:472: in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))'],)
kwargs = {'stdout': -1}
process = <Popen: returncode: -11 args: ['/tmp/pytest-of-root/pytest-5/mesonpy-test-ve...>
stdout = b'', stderr = None, retcode = -11

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))']' died with <Signals.SIGSEGV: 11>.

/usr/lib64/python3.14/subprocess.py:577: CalledProcessError
---------------------------- Captured stdout setup -----------------------------
Initialized empty Git repository in /meson-python/tests/packages/link-against-local-lib/.git/
+ meson setup /meson-python/tests/packages/link-against-local-lib /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7 -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini
The Meson build system
Version: 1.6.0
Source dir: /meson-python/tests/packages/link-against-local-lib
Build dir: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7
Build type: native build
Project name: link-against-local-lib
Project version: 1.0.0
C compiler for the host machine: cc (gcc 14.2.1 "cc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-4)")
C linker for the host machine: cc ld.bfd 2.43.50.20241014
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python found: YES (/meson-python/venv/bin/python)
Found pkg-config: YES (/usr/bin/pkg-config) 2.3.0
Run-time dependency python found: YES 3.14
WARNING: Please do not define rpath with a linker argument, use install_rpath
or build_rpath properties instead.
This will become a hard error in a future Meson release.

Build targets in project: 2

link-against-local-lib 1.0.0

  User defined options
    Native files: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini
    b_ndebug    : if-release
    b_vscrt     : md
    buildtype   : release

Found ninja-1.11.1.git.kitware.jobserver-1 at /meson-python/venv/bin/ninja
+ /meson-python/venv/bin/ninja
[1/5] Compiling C object lib/libexample.so.p/examplelib.c.o
[2/5] Linking target lib/libexample.so
[3/5] Compiling C object example.cpython-314-x86_64-linux-gnu.so.p/examplemod.c.o
[4/5] Generating symbol file lib/libexample.so.p/libexample.so.symbols
[5/5] Linking target example.cpython-314-x86_64-linux-gnu.so
[1/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/lib/libexample.so
[2/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/example.cpython-314-x86_64-linux-gnu.so
=========================== short test summary info ============================
FAILED tests/test_wheel.py::test_local_lib - subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/meso...
====================== 1 failed, 122 deselected in 3.01s =======================

# /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14 -c 'import example; print(example.example_sum(1, 2))'
Segmentation fault (core dumped)
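
A note on reading the failure above: the `retcode = -11` in the pytest traceback and the shell's "Segmentation fault" are the same event. On POSIX, `subprocess` reports a child killed by a signal as the negated signal number. A minimal sketch of that mapping (the helper name `describe_returncode` is made up for illustration):

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code in human-readable form."""
    if returncode < 0:
        # POSIX convention used by subprocess: -N means "killed by signal N".
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


print(describe_returncode(-11))
```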

# gdb /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14
(gdb) run -c 'import example; print(example.example_sum(1, 2))'
Program received signal SIGSEGV, Segmentation fault.
0x00007f05fbbe1294 in ?? ()
(gdb) bt
#0  0x00007f05fbbe1294 in ?? ()
#1  0x00007f05fbbfc310 in call_init (l=0x56104552bed0, argc=3, 
    argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:60
#2  call_init (l=0x56104552bed0, argc=3, argv=0x7fffbdb0b0f8, 
    env=0x7fffbdb0b118) at dl-init.c:26
#3  0x00007f05fbbfc42d in _dl_init (main_map=0x56104552bed0, argc=3, 
    argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:121
#4  0x00007f05fbbf9562 in __GI__dl_catch_exception (
    exception=exception@entry=0x0, 
    operate=operate@entry=0x7f05fbc030a0 <call_dl_init>, 
    args=args@entry=0x7fffbdb09fc0) at dl-catch.c:215
#5  0x00007f05fbc03039 in dl_open_worker (a=a@entry=0x7fffbdb09fc0)
    at dl-open.c:785
#6  0x00007f05fbbf94c3 in __GI__dl_catch_exception (
    exception=exception@entry=0x7fffbdb09fa0, 
    operate=operate@entry=0x7f05fbc02fb0 <dl_open_worker>, 
    args=args@entry=0x7fffbdb09fc0) at dl-catch.c:241
#7  0x00007f05fbc03424 in _dl_open (
    file=0x7f05fb0d94f0 "/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/lib64/python3.14/site-packages/example.cpython-314-x86_64-linux-gnu.so", 
    mode=<optimized out>, 
    caller_dlopen=0x7f05fb869e21 <_imp_create_dynamic+929>, 
    nsid=<optimized out>, argc=3, argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118)
    at dl-open.c:860
#8  0x00007f05fb47a9b4 in dlopen_doit () from /lib64/libc.so.6
#9  0x00007f05fbbf94c3 in __GI__dl_catch_exception (
    exception=exception@entry=0x7fffbdb0a1b0, 
    operate=0x7f05fb47a950 <dlopen_doit>, args=0x7fffbdb0a270)
    at dl-catch.c:241
#10 0x00007f05fbbf9619 in _dl_catch_error (objname=0x7fffbdb0a218, 
    errstring=0x7fffbdb0a220, mallocedp=0x7fffbdb0a217, 
    operate=<optimized out>, args=<optimized out>) at dl-catch.c:260
#11 0x00007f05fb47a4a3 in _dlerror_run () from /lib64/libc.so.6
#12 0x00007f05fb47aa6f in dlopen@GLIBC_2.2.5 () from /lib64/libc.so.6
#13 0x00007f05fb869e21 in _imp_create_dynamic ()
   from /lib64/libpython3.14.so.1.0
#14 0x00007f05fb77dccb in cfunction_vectorcall_FASTCALL ()
   from /lib64/libpython3.14.so.1.0
#15 0x00007f05fb75b05a in _PyEval_EvalFrameDefault ()
   from /lib64/libpython3.14.so.1.0
#16 0x00007f05fb77aec2 in object_vacall () from /lib64/libpython3.14.so.1.0
#17 0x00007f05fb7b441e in PyObject_CallMethodObjArgs ()
   from /lib64/libpython3.14.so.1.0
#18 0x00007f05fb7b35bd in PyImport_ImportModuleLevelObject ()
   from /lib64/libpython3.14.so.1.0
#19 0x00007f05fb75d7c9 in _PyEval_EvalFrameDefault ()
   from /lib64/libpython3.14.so.1.0
#20 0x00007f05fb82d3bb in PyEval_EvalCode () from /lib64/libpython3.14.so.1.0
#21 0x00007f05fb852050 in run_eval_code_obj () from /lib64/libpython3.14.so.1.0
#22 0x00007f05fb84af83 in run_mod () from /lib64/libpython3.14.so.1.0
#23 0x00007f05fb83d8ee in _PyRun_StringFlagsWithName.constprop.0 ()
   from /lib64/libpython3.14.so.1.0
#24 0x00007f05fb83d798 in _PyRun_SimpleStringFlagsWithName ()
   from /lib64/libpython3.14.so.1.0
#25 0x00007f05fb8647e4 in Py_RunMain () from /lib64/libpython3.14.so.1.0
#26 0x00007f05fb81c7ec in Py_BytesMain () from /lib64/libpython3.14.so.1.0
#27 0x00007f05fb4120c8 in __libc_start_call_main () from /lib64/libc.so.6
#28 0x00007f05fb41218b in __libc_start_main_impl () from /lib64/libc.so.6
#29 0x0000561011e4f095 in _start ()

dnicolodi commented 1 week ago

Is this the only test that fails? As far as I can tell, Python segfaults when executing the extension module initialization function. The module is extremely simple (https://github.com/mesonbuild/meson-python/blob/main/tests/packages/link-against-local-lib/examplemod.c), so I don't see how this is possible other than through a bug in CPython. The other possibility is a bug in the packaging, for example if the Python headers and the installed Python disagree about the shape of the PyModuleDef structure, or something along those lines.

hroncok commented 1 week ago

Is this the only test that fails?

Yes, apart from one other test that fails with https://github.com/pytest-dev/pytest-mock/issues/468

hroncok commented 1 week ago

Huh, I can even reproduce this with Python 3.13.0 and 3.12.7. Possibly this is a problem in binutils etc.

hroncok commented 1 week ago

I can reproduce the crash with binutils 2.43.50 but not with binutils 2.43.1.

I'll take that to Fedora's binutils maintainer.

Should I keep this open or close it?

hroncok commented 1 week ago

https://bugzilla.redhat.com/show_bug.cgi?id=2321588

rgommers commented 1 week ago

Should I keep this open or close it?

Looks unrelated to meson-python, so it'd make sense to close this. If you prefer to keep it open for some days until you receive a reply on the binutils bug report, that seems fine as well.

hroncok commented 1 week ago

For the record, the binutils folks say this is a problem in patchelf. They are also quite determined that patchelf cannot be supported and would rather see meson-python utilize the final -Wl,-rpath=… option when building the extension module.

rgommers commented 1 week ago

Reopening to keep it visible, since it doesn't sound like a fix in either binutils or patchelf is in the works just yet.

They are also quite determined that patchelf cannot be supported and would rather see meson-python utilize the final -Wl,-rpath=… option when building the extension module.

Using the final -Wl,-rpath doesn't seem possible, since meson-python isn't actually building the extension module - meson is. And the package author (who could add an rpath argument to the package itself) doesn't know where and how meson-python will vendor the shared library into the wheel.

If the problem is RPATH rewriting though, this isn't just going to show up in this test case (which is a little niche and for a scenario that possibly is unused in the real world so far - not sure). auditwheel is doing the same when it vendors external shared libraries into wheels distributed on PyPI. That isn't going to show up in bug reports very soon only because auditwheel is usually run in a manylinux container that doesn't have a recent binutils. But that's an important use case for patchelf. And Nix will need it as well I'm sure.
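
For anyone who wants to check what the RPATH rewriting actually left in a binary, one option is to look at the `RPATH`/`RUNPATH` entries printed by `readelf -d`. The sketch below only parses text in that format; the `SAMPLE` string is a hand-written illustration, not output captured from the failing build, and `parse_rpaths` is a hypothetical helper name.

```python
import re

# Matches the RPATH / RUNPATH lines of the dynamic section as printed by `readelf -d`.
_DYN_PATH_RE = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*?)\]")


def parse_rpaths(readelf_output: str) -> dict[str, list[str]]:
    """Collect RPATH/RUNPATH entries from `readelf -d` style text."""
    found: dict[str, list[str]] = {}
    for line in readelf_output.splitlines():
        match = _DYN_PATH_RE.search(line)
        if match:
            tag, value = match.groups()
            # Entries are colon-separated, like PATH.
            found.setdefault(tag, []).extend(p for p in value.split(":") if p)
    return found


# Illustrative input only (assumed paths, not taken from this issue).
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libexample.so]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/libs:/opt/lib]
"""

print(parse_rpaths(SAMPLE))
```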

It's still a little unclear to me what triggers the bug exactly, but it seems like this has to be fixed either in patchelf or in binutils.

I just read through the whole thread at https://bugzilla.redhat.com/show_bug.cgi?id=2321588. A few comments:

rgommers commented 1 week ago

Also, thanks for trying to sort this out @hroncok! Doesn't look like an easy conversation.

hroncok commented 1 week ago

-Wl,-rpath=/final/install/location

technically, this path is relative, so we don't have that exact problem. If meson-python could "tell" meson to use a particular path, that should work, no?

eli-schwartz commented 1 week ago

There appears to be a kind of weirdly layered confusion going on here across multiple issue trackers.

"meson-python" happens to use patchelf, which uncovered a bug in the (uncoordinated) interaction between binutils snapshots (?) used distro-wide in Fedora, and patchelf, a program widely used in various contexts. As noted in the Fedora ticket, binutils has broken the PyPy build as well.

This bug doesn't need pip to replicate it, I'm sure. You could use python -m build, available on PyPI as "build", instead of pip install. It will create a wheel for you. And "build" assumes developer intent already, which means no passing --verbose to pip.

Patchelf is needed by literally anyone building wheels with C libraries for upload to PyPI and usage by basically all Linux users on any distro. Fedora and its derivatives are actually quite popular for this due to GCC Toolset, which allows you to use new GCC versions with older glibc... So having this broken on Fedora specifically seems a bit unfortunate!

Why is it needed, you ask? Well, it's needed because uploading to PyPI is a subcategory of building standalone binaries, so there's a program whose sole purpose is to modify your wheels, copy system libraries into the wheel, use patchelf to retarget everything to use relative rpaths that are part of the wheel layout, and upload the now standalone python modules.
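
The retargeting step described above can be sketched roughly as follows. This is not auditwheel's or meson-python's actual implementation: the function name, the wheel layout and the `pkg.libs` directory are assumptions for illustration, and the sketch only builds `patchelf --set-rpath` command lines without running them.

```python
import posixpath


def plan_rpath_rewrite(elf_files, vendored_libdir):
    """Return one patchelf command per ELF file, pointing its rpath at vendored_libdir.

    All paths are relative to the unpacked wheel root. Nothing is executed here.
    """
    commands = []
    for elf_file in elf_files:
        module_dir = posixpath.dirname(elf_file) or "."
        rel = posixpath.relpath(vendored_libdir, start=module_dir)
        # $ORIGIN is expanded by the dynamic loader to the directory of the ELF file.
        rpath = "$ORIGIN" if rel == "." else f"$ORIGIN/{rel}"
        commands.append(["patchelf", "--set-rpath", rpath, elf_file])
    return commands


# Hypothetical wheel layout: one extension module, libraries vendored next to the package.
for cmd in plan_rpath_rewrite(["pkg/_ext.so"], "pkg.libs"):
    print(" ".join(cmd))
```

The point is that the rpath can only be computed once the final wheel layout is known, which is after linking.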

This is a serious use case and complaining that meson-python should just not do that when using its own libraries is missing the point (even though as a meson maintainer, not a meson-python maintainer, I am sympathetic to this argument). Less blame, more investigating whether patchelf and binutils can get along, please.

eli-schwartz commented 1 week ago

The problem is fundamentally about (a) the way Python wheels are standardized without a libdir location for shared libraries, and (b) the need to make Python wheels portable and installable into venvs that don't have a predefined absolute install path on the user's OS. This makes relocating shared libraries a necessary step, and it's common to tooling for Python wheels, Nix packages, Conda packages, etc. It looks to me like saying "use -Wl,-rpath=/final/install/location" misses that key point.

@rgommers, note that this actually isn't about being relocatable. It's about changing the install layout at all. Being relocatable just means you need an rpath string using $ORIGIN (this is a dynamic loader variable) and the knowledge at the time of linking, what relocatable layout you need. You can then just inject the string value in LDFLAGS.
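
To illustrate why an `$ORIGIN`-based rpath is relocatable, here is a simplified sketch of the substitution the dynamic loader performs. The function name and the venv paths are made up, and the real loader does not normalize the path as `normpath` does here; the resulting directory is the same.

```python
import posixpath


def expand_origin(rpath_entry: str, object_path: str) -> str:
    """Expand $ORIGIN in one rpath entry, relative to the object that carries it."""
    origin = posixpath.dirname(posixpath.abspath(object_path))
    expanded = rpath_entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    # Normalization is for readability only.
    return posixpath.normpath(expanded)


# The same rpath string follows the module wherever the venv happens to live.
for venv in ("/home/a/venv", "/opt/b/venv"):
    module = f"{venv}/lib/python3.14/site-packages/pkg/example.so"
    print(expand_origin("$ORIGIN/../libs", module))
```

What the string cannot express is a layout that is only decided after linking, which is the case discussed in this thread.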

It doesn't help because you will potentially still have other unwanted rpath entries, and you can't handle library dependencies that aren't part of meson.build -- that's why auditwheel uses patchelf too, isn't it?

And actually injecting LDFLAGS is difficult to do robustly since if you do it in the environment it will be ignored when the user specified a native file, and if you do it via a native file you scribble all over the user LDFLAGS and the user native files.

hroncok commented 6 days ago

Less blame, more investigating whether patchelf and binutils can get along, please.

I am not blaming anybody here. I am merely trying to solve this problem.

I am well aware that even if meson-python stops using patchelf, we will have this problem with auditwheel etc.

rgommers commented 6 days ago

@rgommers, note that this actually isn't about being relocatable. It's about changing the install layout at all.

You're right. I never encountered any other reasons for changing the install layout, so in my mind the two meant roughly the same thing.

Being relocatable just means you need an rpath string using $ORIGIN

Yes indeed. I just wrote docs for using shared libraries in gh-700, and for internal ones it starts with explaining how to use $ORIGIN. Being able to do so is relatively rare though, since shared libraries that are only meant for being included in a Python wheel are quite uncommon. The more typical case is something like this:

c-or-cpp-lib/
  meson.build  # contains shared_library() or library()
  python-bindings/
    meson.build  # contains extension module linking against shared library
  other-lang-bindings/
    ...

In such cases, especially if the Python bindings are maintained by people other than those maintaining the C/C++ core, it may not be acceptable to mess with how the C/C++ is compiled specifically to make Python wheel builds nicer. The failing test case at hand here is representative of that: the shared library goes to libdir, and meson-python is left to do the "vendoring" work a la auditwheel.

In gh-700 I'm also adding more test cases, including for the $ORIGIN case. One that is still missing is for an external shared library + auditwheel - that may be useful as well.