teemtee / tmt

Test Management Tool
MIT License
79 stars 118 forks source link

coredump when mrack closes its business [py3.12] #2501

Closed lukaszachy closed 2 months ago

lukaszachy commented 8 months ago

I'm running latest tmt on Fedora-39 (tmt-1.29.0-1.fc39) where python-3.12 is used. Originally I had typo in the image name, but to make it more obvious it is an invalid one: tmt run prov -h beaker --image rhel-9999

There is no coredump with Fedora-38 (tmt-1.29.0-1.fc38) with python-3.11.

.....
plan failed

The exception was caused by 1 earlier exceptions

Cause number 1:

    Failed to create, response:
    ({'response': '\tNo distro tree matches Recipe\n\t<distroRequires><and><distro_name op="=" value="rhel-9999"/><distro_variant op="=" value="BaseOS"/><distro_arch op="=" value="x86_64"/></and></distroRequires>'}, {'name': 'rhel-9999-x86_64', 'distro': 'rhel-9999', 'os': 'rhel-9999', 'group': 'linux', 'meta_distro': False, 'arch': 'x86_64', 'variant': 'BaseOS', 'ks_meta': None, 'retention_tag': 'audit', 'product': '[internal]', 'whiteboard': 'tmt-112-FKVqCBUR', 'priority': 'Normal', 'tasks': [{'name': '/distribution/dummy', 'role': 'STANDALONE'}], 'ks_append': [], 'hostRequires': None, 'distro_tags': [], 'host_id': 'rhel-9999-x86_64'})
Segmentation fault (core dumped)

And the core dump:

$ coredumpctl gdb
Core was generated by `/usr/bin/python3 -sP /usr/bin/tmt run prov -h beaker --image rhel-9999'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1247e6fa11 in teedataobject_clear (tdo=tdo@entry=0x7f1236c55840) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Modules/itertoolsmodule.c:827
827     teedataobject_safe_decref(tmp, state->teedataobject_type);

(gdb) py-bt
Traceback (most recent call first):
  Garbage-collecting

(gdb) bt
#0  0x00007f1247e6fa11 in teedataobject_clear (tdo=tdo@entry=0x7f1236c55840) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Modules/itertoolsmodule.c:827
#1  0x00007f1247e6f959 in teedataobject_dealloc (tdo=0x7f1236c55840) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Modules/itertoolsmodule.c:836
#2  0x00007f1247f037ce in Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Include/object.h:706
#3  tee_clear (to=to@entry=0x7f12362d3780) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Modules/itertoolsmodule.c:1041
#4  0x00007f1247f0376c in tee_dealloc (to=0x7f12362d3780) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Modules/itertoolsmodule.c:1050
#5  0x00007f1247df2c46 in dict_dealloc (mp=0x7f12362e8040) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Objects/dictobject.c:2367
#6  0x00007f1247e365ea in _Py_Dealloc (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (30, 0), '_tok_end': (30, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7f12362d3780>}) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Objects/object.c:2608
#7  Py_DECREF (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (30, 0), '_tok_end': (30, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7f12362d3780>}) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Include/object.h:706
#8  Py_XDECREF (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (30, 0), '_tok_end': (30, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7f12362d3780>}) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Include/object.h:799
#9  Py_XDECREF (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (30, 0), '_tok_end': (30, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7f12362d3780>}) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Include/object.h:796
#10 subtype_dealloc (
    self=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=(30, 0), _tok_end=(30, 0), _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file='/etc/beaker/client.conf', _tokens=<itertools._tee at remote 0x7f12362d3780>) at remote 0x7f12365f9d10>) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Objects/typeobject.c:2020
#11 0x00007f1247ec00dd in Py_DECREF (op=<optimized out>) at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Include/object.h:706
#12 _PyObject_ClearManagedDict (
    obj=<BeakerProvider(poll_sleep=60, distros=<dict_values at remote 0x7f1237e649a0>, reserve_duration=86400, hub=<HubProxy(_conf=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=(28, 0), _tok_end=(28, 0), _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file='/usr/lib/python3.12/site-packages/bkr/common/default.conf', _tokens=<itertools._tee at remote 0x7f12362eb780>) at remote 0x7f12362cf570>, _hub=<ServerProxy(_ServerProxy__host='REDACTED', _ServerProxy__handler='/client/', _ServerProxy__transport=<SafeCookieTransport(retry_count=5, retry_timeout=30, context=<SSLContext at remote 0x7f12366afa50>, timeout=120, proxy_config={'proxy': None, 'proxy_user': None, 'proxy_password': None}, no_proxy=[''], verbose=None, cookie_request=<Request(_full_url='REDACTED', fragment=None, type='https', host='REDACTED', selector='/', headers={}, unredirected_hdrs={'Cookie': 'beaker_auth_token=REDACTED...(truncated))
    at /usr/src/debug/python3.12-3.12.0-1.fc39.x86_64/Objects/dictobject.c:5611
--Type <RET> for more, q to quit, c to continue without paging--
lukaszachy commented 8 months ago

Sources: tmt - https://github.com/teemtee/tmt/blob/main/tmt/steps/provision/mrack.py mrack library - https://github.com/neoave/mrack/blob/main/src/mrack/ beaker client - https://github.com/beaker-project/beaker/tree/python-3/Client/src/bkr

lukaszachy commented 8 months ago

I've got advice to run with python3-debug, so...

$ /usr/bin/python3.12d -X dev -sP /usr/bin/tmt run prov -h beaker --image rhel-9999

python3.12d: /builddir/build/BUILD/Python-3.12.0/Include/cpython/tupleobject.h:23: PyTuple_GET_SIZE: Assertion `PyTuple_Check(op)' failed.
Fatal Python error: Aborted
lukaszachy commented 8 months ago

and gdb says (Note: different tb)

Core was generated by `python3.12d -X dev -sP /usr/bin/tmt run prov -h beaker --image rhel-9999'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;                                                                                                                                                                                                           
(gdb) py-bt
Traceback (most recent call first):
  <built-in method exec_dynamic of module object at remote 0x7f243a4ee150>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1296, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  <built-in method exec_dynamic of module object at remote 0x7f243a4ee150>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1296, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  <built-in method __import__ of module object at remote 0x7f243a4d64b0>
  <built-in method exec_dynamic of module object at remote 0x7f243a4ee150>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1296, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  <built-in method exec_dynamic of module object at remote 0x7f243a4ee150>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1296, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  <built-in method __import__ of module object at remote 0x7f243a4d64b0>
  <built-in method exec_dynamic of module object at remote 0x7f243a4ee150>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1296, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "/usr/lib64/python3.12/site-packages/gssapi/raw/__init__.py", line 50, in <module>
    from gssapi.raw.creds import *  # noqa
  <built-in method exec of module object at remote 0x7f243a4d64b0>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  <built-in method __import__ of module object at remote 0x7f243a4d64b0>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1304, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "/usr/lib64/python3.12/site-packages/gssapi/__init__.py", line 31, in <module>
    from gssapi.raw.types import NameType, RequirementFlag, AddressType  # noqa
  <built-in method exec of module object at remote 0x7f243a4d64b0>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "/usr/lib/python3.12/site-packages/bkr/common/hub.py", line 12, in <module>
    import gssapi
  <built-in method exec of module object at remote 0x7f243a4d64b0>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
--Type <RET> for more, q to quit, c to continue without paging--c
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "/usr/lib/python3.12/site-packages/bkr/client/command.py", line 15, in <module>
    from bkr.common.hub import HubProxy
  <built-in method exec of module object at remote 0x7f243a4d64b0>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "/usr/lib/python3.12/site-packages/bkr/client/__init__.py", line 20, in <module>
    from bkr.client.command import Command
  <built-in method exec of module object at remote 0x7f243a4d64b0>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "/usr/lib/python3.12/site-packages/mrack/providers/beaker.py", line 27, in <module>
    from bkr.client import BeakerJob, BeakerRecipe, BeakerRecipeSet
  <built-in method exec of module object at remote 0x7f243a4d64b0>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994, in exec_module
  File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "/usr/lib/python3.12/site-packages/tmt/steps/provision/mrack.py", line 290, in import_and_load_mrack_deps
    from mrack.providers.beaker import PROVISIONER_KEY as BEAKER
  File "/usr/lib/python3.12/site-packages/tmt/steps/provision/mrack.py", line 694, in go
    import_and_load_mrack_deps(self.workdir, self.name, self._logger)
  File "/usr/lib/python3.12/site-packages/tmt/steps/provision/__init__.py", line 1895, in go
    phase.go()
  File "/usr/lib/python3.12/site-packages/tmt/base.py", line 2207, in go
    step.go()
  File "/usr/lib/python3.12/site-packages/tmt/base.py", line ?, in go
    (failed to get frame line number)
  File "/usr/lib/python3.12/site-packages/tmt/cli.py", line 393, in finito
    click_context.obj.run.go()
  File "/usr/lib/python3.12/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.12/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1626, in _process_result
    value = ctx.invoke(self._result_callback, value, **ctx.params)
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1689, in invoke
    return _process_result(rv)
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.12/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.12/site-packages/tmt/__main__.py", line 15, in run_cli
    tmt.cli.main()
  File "/usr/bin/tmt", line 8, in <module>
    sys.exit(run_cli())
lukaszachy commented 8 months ago

With @hroncok help investigation continued: The failed assert smelled like https://github.com/cython/cython/issues/5665

The python3-cython-3.0.5-1.fc40.x86_64 (present in fedora-rawhide) should fix that so I rpmbuild --rebuild ./python-gssapi-1.7.3-6.fc39.src.rpm and run the reproducer again.

Still fails but a bit different (with python3-gssapi from f-rawhide the traceback was same as on f-39).

# /usr/bin/python3.12d -X dev -sP /usr/bin/tmt run prov -h beaker --image rhel-9999
python3.12d: /builddir/build/BUILD/Python-3.12.0/Include/cpython/tupleobject.h:23: PyTuple_GET_SIZE: Assertion `PyTuple_Check(op)' failed.
Fatal Python error: Aborted

Current thread 0x00007fb9f7378740 (most recent call first):
  Garbage-collecting
  <no Python frame>
Aborted (core dumped)

And coredumpctl gdb;

Core was generated by `/usr/bin/python3.12d -X dev -sP /usr/bin/tmt run prov -h beaker --image rhel-99'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;                                                                        
(gdb) py-bt
Traceback (most recent call first):
  Garbage-collecting
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007fb9f68adbd3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007fb9f685b8ee in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#3  0x00007fb9f6e586c7 in faulthandler_fatal_error (signum=6) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Modules/faulthandler.c:330
#4  <signal handler called>
#5  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#6  0x00007fb9f68adbd3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#7  0x00007fb9f685b8ee in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#8  0x00007fb9f68438ff in __GI_abort () at abort.c:79
#9  0x00007fb9f684381b in __assert_fail_base (fmt=0x7fb9f69c3bb8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fb9f6f139d8 "et->ht_module", 
    file=file@entry=0x7fb9f700f540 "/builddir/build/BUILD/Python-3.12.0/Include/internal/pycore_typeobject.h", line=line@entry=105, 
    function=function@entry=0x7fb9f6f86240 <__PRETTY_FUNCTION__.60.lto_priv.10> "_PyType_GetModuleState") at assert.c:94
#10 0x00007fb9f6853c67 in __assert_fail (assertion=0x7fb9f6f139d8 "et->ht_module", 
    file=0x7fb9f700f540 "/builddir/build/BUILD/Python-3.12.0/Include/internal/pycore_typeobject.h", line=105, 
    function=0x7fb9f6f86240 <__PRETTY_FUNCTION__.60.lto_priv.10> "_PyType_GetModuleState") at assert.c:103
#11 0x00007fb9f6eace2d in _PyType_GetModuleState (type=0x5566748bb2e0)
    at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Include/internal/pycore_typeobject.h:105
#12 0x00007fb9f6ead303 in get_module_state_by_cls (cls=0x5566748bb2e0) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Modules/itertoolsmodule.c:52
#13 0x00007fb9f6eb0e28 in teedataobject_clear (tdo=0x5566751a11a0) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Modules/itertoolsmodule.c:826
#14 0x00007fb9f6eb0e85 in teedataobject_dealloc (tdo=0x5566751a11a0) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Modules/itertoolsmodule.c:836
#15 0x00007fb9f6c7778b in _Py_Dealloc (op=<itertools._tee_dataobject at remote 0x5566751a11a0>)
    at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Objects/object.c:2625
#16 0x00007fb9f6eac90d in Py_DECREF (filename=0x7fb9f702ea70 "/builddir/build/BUILD/Python-3.12.0/Modules/itertoolsmodule.c", lineno=1041, 
    op=<itertools._tee_dataobject at remote 0x5566751a11a0>) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Include/object.h:691
#17 0x00007fb9f6eb15f3 in tee_clear (to=0x7fb9f338da30) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Modules/itertoolsmodule.c:1041
#18 0x00007fb9f6eb1632 in tee_dealloc (to=0x7fb9f338da30) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Modules/itertoolsmodule.c:1050
#19 0x00007fb9f6c7778b in _Py_Dealloc (op=<itertools._tee at remote 0x7fb9f338da30>) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Objects/object.c:2625
#20 0x00007fb9f6c50b78 in Py_DECREF (filename=0x7fb9f6ff4748 "/builddir/build/BUILD/Python-3.12.0/Include/object.h", lineno=799, 
    op=<itertools._tee at remote 0x7fb9f338da30>) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Include/object.h:691
#21 0x00007fb9f6c50bcd in Py_XDECREF (op=<itertools._tee at remote 0x7fb9f338da30>) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Include/object.h:799
#22 0x00007fb9f6c581b0 in dict_dealloc (mp=0x7fb9f368b590) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Objects/dictobject.c:2367
#23 0x00007fb9f6c7778b in _Py_Dealloc (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': <unknown at remote 0x7fb9f33985f0>, '_tok_end': <unknown at remote 0x7fb9f3398730>, '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': <unknown at remote 0x7fb9f3c54f40>, '_tokens': <itertools._tee at remote 0x7fb9f338da30>}) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Objects/object.c:2625
#24 0x00007fb9f6c8fd19 in Py_DECREF (filename=0x7fb9f6ff4748 "/builddir/build/BUILD/Python-3.12.0/Include/object.h", lineno=799, 
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': <unknown at remote 0x7fb9f33985f0>, '_tok_end': <unknown at remote 0x7fb9f3398730>, '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': <unknown at remote 0x7fb9f3c54f40>, '_tokens': <itertools._tee at remote 0x7fb9f338da30>}) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Include/object.h:691
#25 0x00007fb9f6c8fd6e in Py_XDECREF (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': <unknown at remote 0x7fb9f33985f0>, '_tok_end': <unknown at remote 0x7fb9f3398730>, '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': <unknown at remote 0x7fb9f3c54f40>, '_tokens': <itertools._tee at remote 0x7fb9f338da30>}) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Include/object.h:799
#26 0x00007fb9f6c96668 in subtype_dealloc (
    self=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=<unknown at remote 0x7fb9f33985f0>, _tok_end=<unknown at remote 0x7fb9f3398730>, _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file=<unknown at remote 0x7fb9f3c54f40>, _tokens=<itertools._tee at remote 0x7fb9f338da30>) at remote 0x7fb9f3388a00>) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Objects/typeobject.c:2020
#27 0x00007fb9f6c7778b in _Py_Dealloc (
    op=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=<unknown at remote 0x7fb9f33985f0>, _tok_end=<unknown at remote 0x7fb9f3398730>, _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file=<unknown at remote 0x7fb9f3c54f40>, _tokens=<itertools._tee at remote 0x7fb9f338da30>) at remote 0x7fb9f3388a00>) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Objects/object.c:2625
#28 0x00007fb9f6c50b78 in Py_DECREF (filename=0x7fb9f700d650 "/builddir/build/BUILD/Python-3.12.0/Objects/dictobject.c", lineno=5611, 
    op=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=<unknown at remote 0x7fb9f33985f0>, _tok_end=<unknown at remote 0x7fb9f3398730>, _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file=<unknown at remote 0x7fb9f3c54f40>, _tokens=<itertools._tee at remote 0x7fb9f338da30>) at remote 0x7fb9f3388a00>) at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Include/object.h:691
#29 0x00007fb9f6c5f840 in _PyObject_ClearManagedDict (
    obj=<BeakerProvider(poll_sleep=60, distros=<dict_values at remote 0x7fb9f3682350>, reserve_duration=86400, hub=<HubProxy(_conf=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=(28, 0), _tok_end=(28, 0), _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file='/usr/lib/python3.12/site-packages/bkr/common/default.conf', _tokens=<itertools._tee at remote 0x7fb9f338e7b0>) at remote 0x7fb9f3389020>, _hub=<ServerProxy(_ServerProxy__host='REDACTED', _ServerProxy__handler='/client/', _ServerProxy__transport=<SafeCookieTransport(retry_count=5, retry_timeout=30, context=None, timeout=120, proxy_config={'proxy': None, 'proxy_user': None, 'proxy_password': None}, no_proxy=[''], verbose=None, cookie_request=<Request(_full_url='REDACTED', fragment=None, type='https', host='REDACTED', selector='/', headers={}, unredirected_hdrs={'Cookie': 'beaker_auth_token=REDACTED...(truncated))
    at /usr/src/debug/python3.12-3.12.0-2.fc40.x86_64/Objects/dictobject.c:5611
#30 0x00007fb9f6c960a1 in subtype_clear (
    self=<BeakerProvider(poll_sleep=60, distros=<dict_values at remote 0x7fb9f3682350>, reserve_duration=86400, hub=<HubProxy(_conf=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=(28, 0), _tok_end=(28, 0), _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file='/usr/lib/python3.12/site-packages/bkr/common/default.conf', _tokens=<itertools._tee at remote 0x7fb9f338e7b0>) at remote 0x7fb9f3389020>, _hub=<ServerProxy(_ServerProxy__host='REDACTED', _ServerProxy__handler='/client/', _ServerProxy__transport=<SafeCookieTransport(retry_count=5, retry_timeout=30, context=N--Type <RET> for more, q to quit, c to continue without paging--
lukaszachy commented 8 months ago

Related Fedora update PR - https://src.fedoraproject.org/rpms/python-gssapi/pull-request/7

lukaszachy commented 8 months ago

@dav-pascual, Would you mind trying mrack on Fedora-39/rawhide to see if this is a problem in the mrack itself?

lukaszachy commented 8 months ago

I've updated the title as segfault happens at the very end even for successful run.

    finish
        summary: 0 tasks completed
total: 1 test passed
Segmentation fault (core dumped)
lukaszachy commented 8 months ago

Same error as when an image is invalid, so we can play with much faster reproducer in the description.

$ python3.12d -X dev -sP /usr/bin/tmt run prov -h beaker --image fedora-37
....
Executing <Task finished name='Task-67' coro=<BeakerAPI.inspect() done, defined at /usr/lib/python3.12/site-packages/tmt/steps/provision/mrack.py:494> result={'id': '8583010', 'result': 'Pass', 'rid': '15038024', 'status': 'Reserved', ...} created at /usr/lib64/python3.12/asyncio/runners.py:100> took 0.552 seconds
        summary: 1 guest provisioned
python3.12d: /builddir/build/BUILD/Python-3.12.0/Include/internal/pycore_typeobject.h:105: _PyType_GetModuleState: Assertion `et->ht_module' failed.
Fatal Python error: Aborted
dav-pascual commented 8 months ago

@lukaszachy Tested successfully to provision and run a dummy test on fedora-rawhide via mrack (I used a different tool, not tmt though)

lukaszachy commented 8 months ago

Thank you David. So the problem is in the way tmt does things with mrack :(... I kind of hoped to move the hot potato your way :D

dav-pascual commented 8 months ago

I'm not familiar with tmt yet, but it seems that downstream tool to test IdM projects is not affected. Not sure either what version of mrack this is using, but I doubt that's the problem :slightly_smiling_face:

dav-pascual commented 8 months ago

You can always try mrack directly https://github.com/neoave/mrack It can be installed via pip and needs the test working dir prepared (tmt does that before calling the tool)

lukaszachy commented 6 months ago

@psss Any idea how urgent this is? Eg how many users do we have for tmt + mrack on fedora 39+

psss commented 6 months ago

Probably not that urgent, I haven't heard anyone else to complain. But, still, it would be good to get this fixed rather soon I'd say. It's an annoying problem. @dav-pascual, wouldn't you have some time to investigate? The reproducer is quite simple:

dnf install -y tmt+provision-beaker
git clone https://github.com/teemtee/tmt
cd tmt
tmt --context how=beaker run -vvv plan --name beaker/sanity/basic
dav-pascual commented 6 months ago

@psss I'll try to make time to look at this

psss commented 6 months ago

Great, thank you!

lukaszachy commented 5 months ago

@psss even simpler reproducer (see description). The segfault happens during cleanup in finish so there is no need to run anything in beaker at all. Simple query for image results with coredump: tmt run prov -h beaker --image rhel-9999

psss commented 5 months ago

Let's try to investigate/fix that in 1.32.

lukaszachy commented 5 months ago

Another finding (but I don't know what to do next):

Since the itertools.tee appeared in the gdb:

ore was generated by `/usr/bin/python3 -sP /usr/bin/tmt run prov -h beaker --image rhel-9999'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb9b0c72e01 in teedataobject_clear (tdo=tdo@entry=0x7fb91b263840)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Modules/itertoolsmodule.c:836
836     teedataobject_safe_decref(tmp, state->teedataobject_type);
(gdb) py-bt
Traceback (most recent call first):
  Garbage-collecting
(gdb) bt
#0  0x00007fb9b0c72e01 in teedataobject_clear (tdo=tdo@entry=0x7fb91b263840)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Modules/itertoolsmodule.c:836
#1  0x00007fb9b0c72d49 in teedataobject_dealloc (tdo=0x7fb91b263840)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Modules/itertoolsmodule.c:845
#2  0x00007fb9b0d06a0e in Py_DECREF (op=<optimized out>)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Include/object.h:706
#3  tee_clear (to=to@entry=0x7fb91aa580c0)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Modules/itertoolsmodule.c:1050
#4  0x00007fb9b0d069ac in tee_dealloc (to=0x7fb91aa580c0)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Modules/itertoolsmodule.c:1059
#5  0x00007fb9b0bf5336 in dict_dealloc (mp=0x7fb91aa4ea80)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Objects/dictobject.c:2367
#6  0x00007fb9b0c391f2 in _Py_Dealloc (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (26, 0), '_tok_end': (26, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7fb91aa580c0>})
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Objects/object.c:2608
#7  Py_DECREF (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (26, 0), '_tok_end': (26, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7fb91aa580c0>})
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Include/object.h:706
#8  Py_XDECREF (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (26, 0), '_tok_end': (26, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7fb91aa580c0>})
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Include/object.h:799
#9  Py_XDECREF (
    op={'_tok_number': 0, '_tok_value': '', '_tok_begin': (26, 0), '_tok_end': (26, 0), '_tok_line': '', '_tok_name': 'ENDMARKER', '_config_file_suffix': 'conf', '_debug': False, '_open_file': '/etc/beaker/client.conf', '_tokens': <itertools._tee at remote 0x7fb91aa580c0>})
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Include/object.h:796
#10 subtype_dealloc (
    self=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=(26, 0), _tok_end=(26, 0), _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file='/etc/beaker/client.conf', _tokens=<itertools._tee at remote 0x7fb91aa580c0>) at remote 0x7fb91c4b5900>)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Objects/typeobject.c:2020
#11 0x00007fb9b0cc33fd in Py_DECREF (op=<optimized out>)
    at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Include/object.h:706
#12 _PyObject_ClearManagedDict (
    obj=<BeakerProvider(poll_sleep=60, distros=<dict_values at remote 0x7fb91b221f30>, reserve_duration=86400, hub=<HubProxy(_conf=<PyConfigParser(_tok_number=0, _tok_value='', _tok_begin=(28, 0), _tok_end=(28, 0), _tok_line='', _tok_name='ENDMARKER', _config_file_suffix='conf', _debug=False, _open_file='/usr/lib/python3.12/site-packages/bkr/common/default.conf', _tokens=<itertools._tee at remote 0x7fb91b0d7c40>) at remote 0x7fb91b2b22b0>, _hub=<ServerProxy(_ServerProxy__host='beaker.VALUE', _ServerProxy__handler='/client/', _ServerProxy__transport=<SafeCookieTransport(retry_count=5, retry_timeout=30, context=None, timeout=120, proxy_config={'proxy': None, 'proxy_user': None, 'proxy_password': None}, no_proxy=[''], verbose=None, cookie_request=<Request(_full_url='https://beaker.engineering.redhat.com/', fragment=None, type='https', host='beaker.VALUE', selector='/', headers={}, unredirected_hdrs={'Cookie': 'beaker_auth_token=...(truncated)) at /usr/src/debug/python3.12-3.12.1-2.fc39.x86_64/Objects/dictobject.c:5611

I've tracked it down to the https://github.com/beaker-project/beaker/blob/3781e40ef660f4666fbe9507c15198d3dedb61b3/Common/bkr/common/pyconfig.py#L220

and once I changed that to

    def _get_STRING(self):
        """
        Return a STRING token value.
        """
        # remove apostrophes or quotation marks
        result = self._tok_value[1:-1]
        return result

There is no segmentaion fault for the reproducer (I don't use any %s in config, so functional-wise no difference for me

lukaszachy commented 5 months ago

In the docs is mentioned that "Once a tee() has been created, the original iterable should not be used anywhere else;" and such advise isn't followed here

thrix commented 5 months ago

Next steps: ask for help of some Python maintainers

lukaszachy commented 5 months ago

In the case it helps: 'segfault' does not happen when beaker-client is used directly (bkr command...). 'mrack' comand itself doesn't crash neither.
We have to do something incompatible in way how "tmt->mrack->beaker-client' is used.

lukaszachy commented 5 months ago

Can be reproduced in fedora container without valid kerberos credentials as well:

$ podman run --rm -ti fedora
[root@18ffcdc13e61 /]#  dnf install -y tmt-provision-beaker
[root@18ffcdc13e61 /]# cat > /etc/beaker/client.conf
HUB_URL = XXXX
AUTH_METHOD = "krbv"
KRB_REALM = YYY
[root@18ffcdc13e61 /]# tmt run prov -h beaker --image rhel-9999
<redacted>
                Major (458752): No credentials were supplied, or the credentials were unavailable or inaccessible, Minor (2529639053): No Kerberos credentials available: Function not implemented
Segmentation fault (core dumped)
hroncok commented 5 months ago
[root@11adbdcc32a3 /]# cat /etc/beaker/client.conf
HUB_URL = XXXX
AUTH_METHOD = "krbv"
KRB_REALM = YYY

root@11adbdcc32a3 /]# tmt run prov -h beaker --image rhel-9999
/var/tmp/tmt/run-001

/default/plan
    provision
        queued provision.provision task #1: default-0

        provision.provision task #1: default-0
        how: beaker
        image: rhel-9999
        fail: 'XXXX'

plan failed

The exception was caused by 1 earlier exceptions

Cause number 1:

    provision step failed

    The exception was caused by 1 earlier exceptions

    Cause number 1:

        'XXXX'

[root@11adbdcc32a3 /]# rpm -q python3
python3-3.12.0-1.fc39.x86_64

[root@11adbdcc32a3 /]# dnf update
[root@11adbdcc32a3 /]# rpm -q python3
python3-3.12.1-2.fc39.x86_64

[root@11adbdcc32a3 /]# dnf --enablerepo=updates-testing update
[root@11adbdcc32a3 /]# rpm -q python3
python3-3.12.2-1.fc39.x86_64

No crash with either version.

hroncok commented 5 months ago

But setting a hub URL to a real (internal) value makes it crash.

vstinner commented 3 months ago

I can reproduce the issue with https://github.com/teemtee/tmt/issues/2501#issuecomment-1954532963 reproducer.

[root@e67b66cee0cb /]# dnf install -y python3-debug
[root@e67b66cee0cb /]# cd
[root@e67b66cee0cb ~]# curl -O https://raw.githubusercontent.com/python/cpython/3.12/Tools/gdb/libpython.py
[root@e67b66cee0cb ~]# export DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/
[root@e67b66cee0cb ~]# gdb -args python3-debug /usr/bin/tmt run prov -h beaker --image rhel-9999

(gdb) set debuginfod enabled on
(gdb) source libpython.py
(gdb) run

(...)

python3-debug: /builddir/build/BUILD/Python-3.12.2/Include/cpython/tupleobject.h:23: PyTuple_GET_SIZE: Assertion `PyTuple_Check(op)' failed.

(gdb) where

(gdb) where
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f22215288a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007f22214d68ee in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f22214be8ff in __GI_abort () at abort.c:79
#4  0x00007f22214be81b in __assert_fail_base (fmt=0x7f222163daf8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7f2221b80176 "PyTuple_Check(op)", file=file@entry=0x7f2221c844f8 "/builddir/build/BUILD/Python-3.12.2/Include/cpython/tupleobject.h", 
    line=line@entry=23, function=function@entry=0x7f2221b9deb0 <__PRETTY_FUNCTION__.12.lto_priv.7> "PyTuple_GET_SIZE") at assert.c:92
#5  0x00007f22214cec57 in __assert_fail (assertion=0x7f2221b80176 "PyTuple_Check(op)", 
    file=0x7f2221c844f8 "/builddir/build/BUILD/Python-3.12.2/Include/cpython/tupleobject.h", line=23, 
    function=0x7f2221b9deb0 <__PRETTY_FUNCTION__.12.lto_priv.7> "PyTuple_GET_SIZE") at assert.c:101
#6  0x00007f22218780d2 in PyTuple_GET_SIZE (op={}) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Include/cpython/tupleobject.h:23
#7  0x00007f2221878d1d in method_vectorcall (method=<method at remote 0x7f221ce90410>, args=0x7f221d9d85a8, nargsf=2, kwnames={})
    at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/classobject.c:66
#8  0x00007f221ccb44b7 in __Pyx_Py3MetaclassPrepare (metaclass=metaclass@entry=<type at remote 0x7f22100df720>, 
    bases=bases@entry=(<EnumType(_generate_next_value_=<staticmethod at remote 0x7f222103b890>, __module__='enum', __doc__='\n    Enum where members are also (and must be) ints\n    ', _new_member_=<built-in method __new__ of type object at remote 0x7f2221dced00>, _use_args_=True, _member_names_=[], _member_map_={}, _value2member_map_={}, _unhashable_values_=[], _member_type_=<type at remote 0x7f2221dced00>, _value_repr_=<wrapper_descriptor at remote 0x7f222129ed50>, __dict__=<getset_descriptor at remote 0x7f2221054890>, __format__=<method_descriptor at remote 0x7f22212a01d0>, __str__=<wrapper_descriptor at remote 0x7f222129ed50>, __repr__=<function at remote 0x7f222104ec90>, __new__=<function at remote 0x7f222104e990>) at remote 0x55d5e80d9030>,), name='RequirementFlag', qualname='RequirementFlag', mkw=mkw@entry={}, 
    modname='gssapi.raw.types', doc=0x0) at gssapi/raw/types.c:13392
#9  0x00007f221ccb58d5 in __pyx_pymod_exec_types (__pyx_pyinit_module=<optimized out>) at gssapi/raw/types.c:10426

(gdb) py-bt
Traceback (most recent call first):
  <built-in method exec_dynamic of module object at remote 0x7f222130a390>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1297, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  <built-in method exec_dynamic of module object at remote 0x7f222130a390>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1297, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  <built-in method __import__ of module object at remote 0x7f22212f26f0>
  <built-in method exec_dynamic of module object at remote 0x7f222130a390>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1297, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  <built-in method exec_dynamic of module object at remote 0x7f222130a390>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1297, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  <built-in method __import__ of module object at remote 0x7f22212f26f0>
  <built-in method exec_dynamic of module object at remote 0x7f222130a390>
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1297, in exec_module
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "/usr/lib64/python3.12/site-packages/gssapi/raw/__init__.py", line 50, in <module>
    from gssapi.raw.creds import *  # noqa
  (...)

(gdb) frame 6
#6  0x00007fb1531fa0d2 in PyTuple_GET_SIZE (op={}) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Include/cpython/tupleobject.h:23
23      PyTupleObject *tuple = _PyTuple_CAST(op);

(gdb) frame 7
#7  0x00007fb1531fad1d in method_vectorcall (method=<method at remote 0x7fb14e8b44d0>, args=0x7fb14fdd85a8, nargsf=2, kwnames={})
    at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/classobject.c:66
(gdb) list
61          result = _PyObject_VectorcallTstate(tstate, func, newargs,
62                                              nargs, kwnames);
63          newargs[0] = tmp;
64      }
65      else {
66          Py_ssize_t nkwargs = (kwnames == NULL) ? 0 : PyTuple_GET_SIZE(kwnames); <== HERE
67          Py_ssize_t totalargs = nargs + nkwargs;
68          if (totalargs == 0) {
69              return _PyObject_VectorcallTstate(tstate, func, &self, 1, NULL);
70          }

(gdb) frame 8
#8  0x00007fb14e63b4b7 in __Pyx_Py3MetaclassPrepare (metaclass=metaclass@entry=<type at remote 0x7fb1480df6f0>, 
    bases=bases@entry=(<EnumType(_generate_next_value_=<staticmethod at remote 0x7fb152b2a1c0>, __module__='enum', __doc__='\n    Enum where members are also (and must be) ints\n    ', _new_member_=<built-in method __new__ of type object at remote 0x7fb153750d00>, _use_args_=True, _member_names_=[], _member_map_={}, _value2member_map_={}, _unhashable_values_=[], _member_type_=<type at remote 0x7fb153750d00>, _value_repr_=<wrapper_descriptor at remote 0x7fb152c22d50>, __dict__=<getset_descriptor at remote 0x7fb1529d4890>, __format__=<method_descriptor at remote 0x7fb152c241d0>, __str__=<wrapper_descriptor at remote 0x7fb152c22d50>, __repr__=<function at remote 0x7fb1529cec90>, __new__=<function at remote 0x7fb1529ce990>) at remote 0x5563182d0030>,), name='RequirementFlag', qualname='RequirementFlag', mkw=mkw@entry={}, 
    modname='gssapi.raw.types', doc=0x0) at gssapi/raw/types.c:13392
(gdb) list
13387       PyObject *ns;
13388       if (metaclass) {
13389           PyObject *prep = __Pyx_PyObject_GetAttrStrNoError(metaclass, __pyx_n_s_prepare);
13390           if (prep) {
13391               PyObject *pargs[3] = {NULL, name, bases};
13392               ns = __Pyx_PyObject_FastCallDict(prep, pargs+1, 2 | __Pyx_PY_VECTORCALL_ARGUMENTS_OFFSET, mkw); <==== HERE ===
13393               Py_DECREF(prep);
13394           } else {
13395               if (unlikely(PyErr_Occurred()))
13396                   return NULL;

The bug occurs on from gssapi.raw.creds import * # noqa: when the C extension gssapi is imported, Cython creates types in the module exec function (__pyx_pymod_exec_types()).

Problem: __Pyx_Py3MetaclassPrepare() pass a dictionary as kwnames to method_vectorcall(), whereas kwnames must be a tuple. Later, an assertion fails since we get the wrong type.

__Pyx_Py3MetaclassPrepare() pass its mkw argument to method_vectorcall() as kwnames:

__pyx_pymod_exec_types() creates it as a dictionary with:

__pyx_t_4 = __Pyx_PyDict_NewPresized(0);
vstinner commented 3 months ago

I can reproduce the bug just with:

[root@e67b66cee0cb ~]# python3-debug -c 'import gssapi.raw.types'
python3-debug: /builddir/build/BUILD/Python-3.12.2/Include/cpython/tupleobject.h:23: PyTuple_GET_SIZE: Assertion `PyTuple_Check(op)' failed.
Aborted (core dumped)
hroncok commented 3 months ago

That is a different problem, which only makes debugging the problem at hand harder.

hroncok commented 3 months ago

A Fedora 39 scratch build of python-gssapi with fixed Cython: https://koji.fedoraproject.org/koji/taskinfo?taskID=115466028 (running)

vstinner commented 3 months ago

Steps to reproduce the issue:

curl -O https://kojipkgs.fedoraproject.org//work/tasks/6058/115466058/python3-gssapi-1.7.3-8.fc39.x86_64.rpm
curl -O https://raw.githubusercontent.com/python/cpython/3.12/Tools/gdb/libpython.py

dnf install -y python3-debug gdb tmt-provision-beaker

rpm -U python3-gssapi-1.7.3-8.fc39.x86_64.rpm

cat > /etc/beaker/client.conf <<EOF
HUB_URL = "https://beaker.engineering.redhat.com"
AUTH_METHOD = "krbv"
KRB_REALM = "IPA.REDHAT.COM"
EOF

echo 'set -e -x' > bug.sh
echo 'python3-debug -X dev /usr/bin/tmt run prov -h beaker --image rhel-9999' >>bug.sh
chmod +x ./bug.sh

echo "Now run: ./bug.sh"

The script upgrades gssapi to avoid the Cython issue described in https://github.com/teemtee/tmt/issues/2501#issuecomment-2020630154

vstinner commented 3 months ago

The problem occurs at Python exit:


I modified bug.sh to use gdb:

set -e -x
export DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/
gdb -x gdb-cmds -args python3-debug -X dev /usr/bin/tmt run prov -h beaker --image rhel-9999

using file gdb-cmds:

source libpython.py
set debuginfod enabled on
run

Reproduce the issue:

[root@8668f51f18e2 ~]# ./bug.sh 
(...)
plan failed
The exception was caused by 1 earlier exceptions
Cause number 1:
    (...)
        Beaker needs Kerberos ticket to authenticate to BeakerHub. Run 'kinit $USER' command to obtain Kerberos credentials.
        (...)
                Major (458752): No credentials were supplied, or the credentials were unavailable or inaccessible, Minor (2529639053): No Kerberos credentials available: Function not implemented

python3-debug: /builddir/build/BUILD/Python-3.12.2/Include/internal/pycore_typeobject.h:105: _PyType_GetModuleState: Assertion `et->ht_module' failed.

Thread 1 "python3-debug" received signal SIGABRT, Aborted.
(...)

(gdb) frame 6
#6  0x00007f48d5add273 in _PyType_GetModuleState (type=0x55aa057b6360)
    at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Include/internal/pycore_typeobject.h:105
105         assert(et->ht_module);                                                                                                                          

(gdb) p et->ht_module
$1 = 0x0

(gdb) p et->ht_type->tp_name
$2 = 0x7f48d514ee80 "itertools._tee_dataobject"

So the problem is that at some point, PyHeapType.ht_module member of the stdlib itertools._tee_dataobject type is set to NULL.

I put a hardware watchpoint on the this member:

(gdb) b itertoolsmodule_exec
(gdb) run
Start it from the beginning? (y or n) y

Breakpoint 1, itertoolsmodule_exec (mod=<module at remote 0x7f45c3cc4230>) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/itertoolsmodule.c:4717
4717        itertools_state *state = get_module_state(mod);
(gdb) n

<... execute "next" until the end of the function ...>

(gdb) p state->teedataobject_type
$4 = (PyTypeObject *) 0x565110d8af30
(gdb) p ((PyHeapTypeObject*)state->teedataobject_type)->ht_module
$5 = <module at remote 0x7f45c3cc4230>
(gdb) p ((PyHeapTypeObject*)0x565110d8af30)->ht_module
$7 = <module at remote 0x7f45c3cc4230>

(gdb) watch ((PyHeapTypeObject*)0x565110d8af30)->ht_module
Hardware watchpoint 2: ((PyHeapTypeObject*)0x565110d8af30)->ht_module

(gdb) continue

(...)

Thread 1 "python3-debug" hit Hardware watchpoint 2: ((PyHeapTypeObject*)0x565110d8af30)->ht_module

Old value = <module at remote 0x7f45c3cc4230>
New value = 0x0
0x00007f45c43ef02d in type_clear (type=0x565110d8af30) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/typeobject.c:5297
Downloading source file /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/typeobject.c
5297        Py_CLEAR(((PyHeapTypeObject *)type)->ht_module);      

(gdb) where
#0  0x00007f45c43ef02d in type_clear (type=0x565110d8af30) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/typeobject.c:5297
#1  0x00007f45c45a7391 in delete_garbage (tstate=0x7f45c49eda18 <_PyRuntime+475672>, gcstate=0x7f45c4990158 <_PyRuntime+92504>, 
    collectable=0x7fff40ec98c0, old=0x7f45c49901a0 <_PyRuntime+92576>) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/gcmodule.c:1029

(...)

#5  0x00007f45c455f77d in Py_FinalizeEx () at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Python/pylifecycle.c:1889
#6  0x00007f45c4561e29 in Py_Exit (sts=2) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Python/pylifecycle.c:3056
(...)
#9  0x00007f45c456c200 in PyErr_PrintEx (set_sys_last_vars=1) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Python/pythonrun.c:845
(...)
#18 0x00007f45c45a4c1d in Py_BytesMain (argc=10, argv=0x7fff40ec9f58) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/main.c:763
#19 0x0000565110a9517d in main (argc=10, argv=0x7fff40ec9f58) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Programs/python.c:15

(gdb) continue
Continuing.

python3-debug: /builddir/build/BUILD/Python-3.12.2/Include/internal/pycore_typeobject.h:105: _PyType_GetModuleState: Assertion `et->ht_module' failed.

Thread 1 "python3-debug" received signal SIGABRT, Aborted.

(gdb) where
(...)
#5  0x00007f45c3fa3c57 in __assert_fail (assertion=0x7f45c4667b87 "et->ht_module", 
    file=0x7f45c4764400 "/builddir/build/BUILD/Python-3.12.2/Include/internal/pycore_typeobject.h", line=105, 
    function=0x7f45c46da660 <__PRETTY_FUNCTION__.60.lto_priv.10> "_PyType_GetModuleState") at assert.c:101

#6  0x00007f45c4600273 in _PyType_GetModuleState (type=0x565110d8b360)
    at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Include/internal/pycore_typeobject.h:105
#7  0x00007f45c4600749 in get_module_state_by_cls (cls=0x565110d8b360) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/itertoolsmodule.c:52
#8  0x00007f45c4604345 in teedataobject_clear (tdo=0x5651117a1390) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/itertoolsmodule.c:835
#9  0x00007f45c46043a2 in teedataobject_dealloc (tdo=0x5651117a1390) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/itertoolsmodule.c:845

#10 0x00007f45c43c8260 in _Py_Dealloc (op=<itertools._tee_dataobject at remote 0x5651117a1390>)
    at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/object.c:2625
#11 0x00007f45c45ffd53 in Py_DECREF (filename=0x7f45c4783bd8 "/builddir/build/BUILD/Python-3.12.2/Modules/itertoolsmodule.c", lineno=1050, 
    op=<itertools._tee_dataobject at remote 0x5651117a1390>) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Include/object.h:691
#12 0x00007f45c4604b10 in tee_clear (to=0x7f45bb438d70) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/itertoolsmodule.c:1050
#13 0x00007f45c4604b4f in tee_dealloc (to=0x7f45bb438d70) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Modules/itertoolsmodule.c:1059
#14 0x00007f45c43c8260 in _Py_Dealloc (op=<itertools._tee at remote 0x7f45bb438d70>)
    at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/object.c:2625
#15 0x00007f45c43a1628 in Py_DECREF (filename=0x7f45c4749248 "/builddir/build/BUILD/Python-3.12.2/Include/object.h", lineno=799, 
    op=<itertools._tee at remote 0x7f45bb438d70>) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Include/object.h:691
#16 0x00007f45c43a167d in Py_XDECREF (op=<itertools._tee at remote 0x7f45bb438d70>) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Include/object.h:799
#17 0x00007f45c43a8c72 in dict_dealloc (mp=0x7f45bb3c4a10) at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Objects/dictobject.c:2367
(...)
#49 0x00007f45c455f77d in Py_FinalizeEx () at /usr/src/debug/python3.12-3.12.2-2.fc39.x86_64/Python/pylifecycle.c:1889
(...)
vstinner commented 3 months ago

The itertools._tee_dataobject object causing the crash is coming from PyConfigParser of bkr/common/pyconfig.py. Raw traceback read by tracemalloc:

(('/usr/lib/python3.12/site-packages/bkr/common/pyconfig.py', 220), # most recent
 ('/usr/lib/python3.12/site-packages/bkr/common/pyconfig.py', 259),
 ('/usr/lib/python3.12/site-packages/bkr/common/pyconfig.py', 162),
 ('/usr/lib/python3.12/site-packages/bkr/common/pyconfig.py', 123),
 ('/usr/lib/python3.12/site-packages/bkr/common/pyconfig.py', 115),
 ('/usr/lib/python3.12/site-packages/mrack/providers/beaker.py', 121),
 ('/usr/lib/python3.12/site-packages/mrack/transformers/beaker.py', 39),
 ('/usr/lib/python3.12/site-packages/mrack/transformers/transformer.py', 98),
 ('/usr/lib/python3.12/site-packages/tmt/steps/provision/mrack.py', 501),
 ('/usr/lib64/python3.12/asyncio/events.py', 88))

The problem is that the itertools._tee_dataobject type and a itertools._tee_dataobject instance are deleted in the same garbage collection, but the type is cleared before its instance.

vstinner commented 3 months ago

One way to work around the issue would be to change how Beaker manages tee() objects in PyConfigParser in bkr/common/pyconfig.py.

Another work around would be to copy the iterator as a list, rather than using tee().


In short, the Python crash occurs in 3 steps:

  1. itertoolsmodule_clear() is called: itertools_state is cleared, it stores strong references to itertools types
  2. type_clear() is called on itertools._tee_dataobject type => set ht_module to NULL.
  3. teedataobject_clear() is called on an itertools._tee_dataobject instance => assertion fails since ht_module is NULL.

The 3 steps occur during Python finalization, in finalize_modules(), step (2) and (3) occur during the same Garbage Collection.

The Python finalization does not guarantee that instances are cleared after types, types are objects, and objects are deallocated in a random order by the garbage collection.

vstinner commented 3 months ago

This issue is a duplicate of https://github.com/python/cpython/issues/115874 which has been fixed by https://github.com/python/cpython/commit/1c0c6c91afcb765f30bbb4e3d80a50e31d7adfcc in the Python 3.12 dev branch... last week :-)

hroncok commented 3 months ago

Fedora 39 scratchbuild with the commit applied as a patch: https://koji.fedoraproject.org/koji/taskinfo?taskID=115502997

vstinner commented 3 months ago

I confirm that the scratch build does fix the crash!

Fedora 39 scratchbuild with the commit applied as a patch: https://koji.fedoraproject.org/koji/taskinfo?taskID=115502997

With the current Python version, I can reproduce the crash.

With this scratch build, I can no longer reproduce the crash.

psss commented 3 months ago

Nice! @vstinner, @hroncok, thanks a lot for the investigation!

lukaszachy commented 3 months ago

Thank you very much for the help @vstinner and @hroncok . I've tried the scratch build and reproducer no longer segfaults and the proper usage (tmt provision finish) exits correctly as well.

# tmt run prov -h beaker  finish
/var/tmp/tmt/run-010

/default/plan
    provision
        queued provision.provision task #1: default-0

        provision.provision task #1: default-0
        how: beaker
        guest: has been requested
        job id: 9080030
        status: Reserved  

        summary: 1 guest provisioned
    finish

        summary: 0 tasks completed

Lets wait on Fedora to have updated Python in the repos and then we can close the issue. I'd leave this open in the case more people start using f-39 and newer in the mean time.

lukaszachy commented 3 months ago

Note: gssapi can stay broken - just updated python3.12 is necessary to stop segfaults (outside of python3-debug, of course).

psss commented 2 months ago

Issue reproduced with python3-3.12.2-2.fc39.x86_64 and verified that no Segmentation fault happens with python3-3.12.3-2.fc39.x86_64 which is now waiting in updates testing.

psss commented 2 months ago

No action needed on our side, will get resolved once the bodhi update makes it into the stable repo.