python / cpython

The Python programming language
https://www.python.org
Other
62.89k stars 30.13k forks source link

AIX: test_importlib regression (ENV change) #87831

Closed aixtools closed 3 years ago

aixtools commented 3 years ago
BPO 43665
Nosy @vstinner, @aixtools
Superseder
  • bpo-40092: Crash in _PyThreadState_DeleteExcept() at fork in the process child
  • Files
  • OpenPGP_0x722BFDB61F396FC2.asc
  • OpenPGP_0x722BFDB61F396FC2.asc
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['tests', '3.9', '3.10'] title = 'AIX: test_importlib regression (ENV change)' updated_at = user = 'https://github.com/aixtools' ``` bugs.python.org fields: ```python activity = actor = 'David.Edelsohn' assignee = 'none' closed = True closed_date = closer = 'vstinner' components = ['Tests'] creation = creator = 'Michael.Felt' dependencies = [] files = ['49922', '49923'] hgrepos = [] issue_num = 43665 keywords = [] message_count = 6.0 messages = ['389797', '389798', '389911', '389921', '389923', '389936'] nosy_count = 3.0 nosy_names = ['vstinner', 'David.Edelsohn', 'Michael.Felt'] pr_nums = [] priority = 'normal' resolution = 'duplicate' stage = 'resolved' status = 'closed' superseder = '40092' type = None url = 'https://bugs.python.org/issue43665' versions = ['Python 3.9', 'Python 3.10'] ```

    aixtools commented 3 years ago

    Since bpo-43517 test_importlib 'fails' (bot status) with ENV_CHANGED.

    The core dump is caused by SIGTRAP. I need help to learn how to stop the core dump from being cleaned up so I can load it into dbx and hopefully understand/learn with sub-test is actually having issues.

    e.g., see https://buildbot.python.org/all/#/builders/438/builds/1031/steps/5/logs/stdio for current bot exit status

    Thx for assistance.

    vstinner commented 3 years ago

    A core dump is a very bad sign of health.

    Can you please try to bisect which test is causing the segfault using bisect_cmd? Try the command:

    ./python -m test.bisect_cmd -o bisect --fail-env-changed test_importlib

    The command should takes 10-60 minutes, and you should see the number of tests decreasing. I hope that at the end, the command will identify a single method triggering the crash.

    At the end, the create "bisect" file contains the test methods causing the bug.

    Then you can re-run tests with:

    ./python -m test --fail-env-changed test_importlib --matchfile=bisect

    The core dump is caused by SIGTRAP. I need help to learn how to stop the core dump from being cleaned up so I can load it into dbx and hopefully understand/learn with sub-test is actually having issues.

    You can re-run the test without libregrtest by running it directly. Example:

    ./python -m test.test_importlib test_spec.Frozen_ModuleSpecMethodsTests.test_reload -v

    It should create a core dump in the current directory, or you run the command in your favorite debugger.

    My notes on debugging a Python crash: https://pythondev.readthedocs.io/debug_tools.html#core-dump

    aixtools commented 3 years ago

    On 30/03/2021 09:40, STINNER Victor wrote:

    STINNER Victor \vstinner@python.org\ added the comment:

    A core dump is a very bad sign of health.

    Can you please try to bisect which test is causing the segfault using bisect_cmd? Try the command:

    ./python -m test.bisect_cmd -o bisect --fail-env-changed test_importlib

    The command should takes 10-60 minutes, and you should see the number of tests decreasing. I hope that at the end, the command will identify a single method triggering the crash.

    At the end, the create "bisect" file contains the test methods causing the bug.

    Then you can re-run tests with:

    ./python -m test --fail-env-changed test_importlib --matchfile=bisect I like this!:

    Fortunately, it did not take 10  minutes:

    Writing 1 tests into bisect

    Tests (1):

    Output written into bisect Bisection completed in 34 iterations and 0:00:28

    > The core dump is caused by SIGTRAP. I need help to learn how to stop the core dump from being cleaned up so I can load it into dbx and hopefully understand/learn with sub-test is actually having issues. You can re-run the test without libregrtest by running it directly. Example:

    ./python -m test.test_importlib test_spec.Frozen_ModuleSpecMethodsTests.test_reload -v

    While - to use your example - using the normal verbose test syntax I do see the test you reference:

    but, when I run your example (and many variations) I always get something such as:

    So, still - closer - but not quite there with:

    +++++

    ./python -m test.test_importlib test_spec.Frozen_ModuleSpecMethodsTests.test_reload -v test_spec (unittest.loader._FailedTest) ... ERROR

    \====================================================================== ERROR: testspec (unittest.loader._FailedTest) ---------------------------------------------------------------------- AttributeError: module '\_main__' has no attribute 'test_spec'

    ---------------------------------------------------------------------- Ran 1 test in 0.000s

    FAILED (errors=1) +++++

    It should create a core dump in the current directory, or you run the command in your favorite debugger.

    My notes on debugging a Python crash: https://pythondev.readthedocs.io/debug_tools.html#core-dump

    ---------- nosy: +vstinner


    Python tracker \report@bugs.python.org\ \https://bugs.python.org/issue43665\


    vstinner commented 3 years ago

    test.test_importlib.test_threaded_import.ThreadedImportTests.test_multiprocessing_pool_circular_import

    This test comes from bpo-41567 and it simply runs a script. You can run the script directly:

    ./python -X dev Lib/test/test_importlib/partial/pool_in_threads.py ; echo $?

    aixtools commented 3 years ago

    On 31/03/2021 18:46, STINNER Victor wrote:

    STINNER Victor \vstinner@python.org\ added the comment:

    test.test_importlib.test_threaded_import.ThreadedImportTests.test_multiprocessing_pool_circular_import

    This test comes from bpo-41567 and it simply runs a script. You can run the script directly:

    ./python -X dev Lib/test/test_importlib/partial/pool_in_threads.py ; echo $?

    Status is zero -0 but I get a core dump!

    aixtools@cpython2:[/home/aixtools/py3a-10.0]./python -X dev Lib/test/test_importlib/partial/pool_in_threads.py ; echo $? 0 aixtools@cpython2:[/home/aixtools/py3a-10.0]ls -ltr core -rw-r--r--    1 aixtools staff     129435720 Mar 31 16:58 core

    +++++++++++++++ Looking at the core dump - part 1. +++++++++++++++

    The address (0xd118ff80) hints at a shared library (pthread?)

    I'll try to find it - but maybe you can just remind me - the CLI arguments to make Python more verbose about what it is doing. Note: all addresses beginning with 0x1 (as in 0x101ca238) are program code addresses. 0x3 (0x30163ad8) imply a shared memory segment being created and used by the application.

    Perhaps more relevant aree all these lines with what appear to be extreme argcount values, e.g., from the bottom:

    _PyEval_Vector(tstate = 0x30187038, con = 0x20107b40, locals = 0x30111768, args = 0x301219b8, argcount = 806560272, kwnames = 0x301314d8), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = (nil), stack = (nil), nargsf = 570570720, kwnames = (nil)), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = (nil), f = 0x40000000, throwflag =

    -257396708), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x22023cf0, con = 0x2000120c, locals = 0x22023960, args = 0x220239f0, argcount = 3508213100, kwnames = 0x22023cf0), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x220239f0, stack = (nil), nargsf = 570572016, kwnames = 0x220239f0), line 361 in "call.c" method_vectorcall(method = 0x10103bdc, args = (nil), nargsf = 0, kwnames = 0x20045994), line 119 in "abstract.h" PyVectorcall_Call(callable = 0xdddddddd, tuple = 0xdddddddd, kwargs =

    0xdddddddd), line 255 in "call.c" _PyObject_Call(tstate = (nil), callable = (nil), args = (nil), kwargs = (nil)), line 298 in "call.c" thread_run(boot_raw = (nil)), line 1076 in "_threadmodule.c"

    pythread_wrapper(arg = (nil)), line 240 in "thread_pthread.h"

    +++++++++++++++++++++++++++++++++++++++++++ Looking at the core dump - the complete 'where' results: +++++++++++++++++++++++++++++++++++++++++++

    aixtools@cpython2:[/home/aixtools/py3a-10.0]dbx ./python core Type 'help' for help. [using memory image in core] reading symbolic information ...

    Trace/BPT trap in _internal_error at 0xd118ff80 ($t1) 0xd118ff80 (_internal_error+0x80) 80410014            lwz r2,0x14(r1) (dbx) where _internal_error(??, ??, ??) at 0xd118ff80 pth_usched._usched_dispatch_front._event_notify_locked@AF34_24(??, ??, ??, ??, ??) at 0xd119fc2c _event_notify(??, ??, ??) at 0xd119ee38 _cond_broadcast(??, ??, ??) at 0xd11af588 pthread_cond_signal@AF29_12(??, ??) at 0xd11b1568 pthread_cond_signal(??) at 0xd11afc88 PyThread_release_lock(lock = 0x10119354), line 692 in "thread_pthread.h" release_sentinel(wr_raw = 0x10118f14), line 1289 in "_threadmodule.c" PyThreadState_Clear(tstate = 0x101ca238), line 873 in "pystate.c" _PyThreadState_DeleteExcept(runtime = 0xd5c234d0, tstate = 0x22021d30), line 987 in "pystate.c" _PyEval_ReInitThreads(tstate = 0x30163ad8), line 506 in "ceval.c" PyOS_AfterFork_Child(), line 600 in "posixmodule.c" os_fork_impl(module = 0x3033a030), line 6659 in "posixmodule.c" cfunction_vectorcall_NOARGS(func = 0x30337330, args = 0x302ec7d0, nargsf = 570564304, kwnames = 0x30337328), line 485 in "methodobject.c" _PyEval_EvalFrameDefault(tstate = 0x302e058c, f = 0x302e847a, throwflag = 0), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x302f2350, con = 0x20107b40, locals = 0x302e8478, args = 0x302e79b8, argcount = 808016112, kwnames = 0x302cd898), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x00000008, stack = 0x3004d6e8, nargsf = 570564752, kwnames = 0x302f6030), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = 0x101c9554, f = 0x20107b40, throwflag = 570565024), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x00000018, con = 0x200084a0, locals = 0x220221f0, args = 0x88000284, argcount = 269576140, kwnames = 0x2007a390), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x2007a4e0, stack = 0x20107b40, nargsf = 570565168, kwnames = 0x20045994), line 361 in "call.c" _PyObject_FastCallDictTstate(tstate = 0x100a7e84, callable = 0x302ed780, args = 0x220222b0, nargsf = 537155988, kwargs = 0x100a8a54), line 142 in "call.c" _PyObject_Call_Prepend(tstate = 0x1001fa24, callable = 0x302ce5d0, obj = 0x22022320, args = 0x20045994, kwargs = 0x100feffc), line 431 in "call.c" slot_tp_init(self = 0x302e06f0, args = 0x30122e76, kwds = 0x30122e70), line 7097 in "typeobject.c" unnamed block in type_call(type = 0x10105284, args = 0x20107b40, kwds = 0x220223e0), line 1058 in "typeobject.c" type_call(type = 0x10105284, args = 0x20107b40, kwds = 0x220223e0), line 1058 in "typeobject.c" _PyObject_MakeTpCall(tstate = 0x101194f4, callable = 0x20107b40, args = 0x22022460, nargs = 537155988, keywords = 0x10167b60), line 215 in "call.c" _PyEval_EvalFrameDefault(tstate = 0x3030e870, f = 0x3013007e, throwflag = 806551608), line 1429 in "abstract.h" _PyEval_Vector(tstate = (nil), con = 0x303384a0, locals = 0x30130038, args = 0x3012e0d8, argcount = 806421984, kwnames = 0x301385c8), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x1004067c, stack = 0x20107b40, nargsf = 570566176, kwnames = 0x00000001), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = 0x100feffc, f = 0x3003ad12, throwflag = 805547256), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x30299248, con = 0x00000002, locals = 0x3003acf8, args = 0x3023b570, argcount = 807246344, kwnames = 0x3023e5c8), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x302d2ca0, stack = 0x3012dd18, nargsf = 570566624, kwnames = 0x300426a0), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = 0x10146304, f = 0x20107b40, throwflag = 0), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x30324334, con = 0x20107b40, locals = 0x30205cb8, args = 0x30215fd0, argcount = 807427328, kwnames = 0x3023e550), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x10144378, stack = 0x20107b40, nargsf = 570567072, kwnames = 0x20045994), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = 0x200405e8, f = 0x20107b40, throwflag = 537971048), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x00000052, con = 0x2010c9a8, locals = 0x2010c968, args = 0x301d9c90, argcount = 806702304, kwnames = 0x3023e208), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x00000008, stack = 0x3023c4a8, nargsf = 570567520, kwnames = 0x30237e30), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = 0x101c9554, f = (nil), throwflag =

    807029456), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x300f9e98, con = 0x30237d18, locals = (nil), args = 0x302968d0, argcount = 537134568, kwnames = 0x30124a78), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x2007a4e0, stack = 0x20107b40, nargsf = 570567936, kwnames = 0x20045994), line 361 in "call.c" unnamed block in _PyObject_FastCallDictTstate(tstate = 0x10167bb8, callable = 0x30233d50, args = 0x22022d80, nargsf = 1, kwargs = (nil)), line 153 in "call.c" _PyObject_FastCallDictTstate(tstate = 0x10167bb8, callable = 0x30233d50, args = 0x22022d80, nargsf = 1, kwargs = (nil)), line 153 in "call.c" _PyObject_Call_Prepend(tstate = 0x1001fa24, callable = 0x302968d0, obj = 0x22022df0, args = 0x20045994, kwargs = 0x100feffc), line 431 in "call.c" slot_tp_init(self = 0x3021a8d0, args = 0x20107b40, kwds = 0x22022e70), line 7097 in "typeobject.c" unnamed block in type_call(type = 0x100a8684, args = 0x30206db0, kwds = 0x22022eb0), line 1058 in "typeobject.c" type_call(type = 0x100a8684, args = 0x30206db0, kwds = 0x22022eb0), line 1058 in "typeobject.c" _PyObject_MakeTpCall(tstate = 0x10119354, callable = 0x200c8404, args = 0x22022f30, nargs = 537155988, keywords = 0x10167b60), line 215 in "call.c" _PyEval_EvalFrameDefault(tstate = 0x2007a4e0, f = 0x20107b40, throwflag = 570568752), line 1429 in "abstract.h" _PyEval_Vector(tstate = 0x100aa6fc, con = 0x200084a0, locals = 0x220230a0, args = 0x20045994, argcount = 269575348, kwnames = 0x20036690), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x302bc760, stack = 0x20107b40, nargsf = 570568944, kwnames = 0x10458e60), line 361 in "call.c" method_vectorcall(method = 0x30126e68, args = 0x30239030, nargsf = 570569072, kwnames = 0x20045994), line 119 in "abstract.h" _PyEval_EvalFrameDefault(tstate = (nil), f = 0x3012c898, throwflag = 0), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x30123bb0, con = 0x200084a0, locals = 0x220232d0, args = 0x00000008, argcount = 352, kwnames = 0x20036690), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x3012ddd8, stack = 0x30232f80, nargsf = 570569504, kwnames = 0x20045994), line 361 in "call.c" PyVectorcall_Call(callable = 0x100a7e84, tuple = 0x3012c898, kwargs =

    0x22023370), line 255 in "call.c" _PyObject_Call(tstate = 0x100aa6fc, callable = 0x20107b40, args = (nil), kwargs = 0x20045994), line 298 in "call.c" do_call_core(tstate = 0x302c03b0, trace_info = 0x200084a0, func = 0x22023460, callargs = (nil), kwdict = 0x10167b60), line 5974 in "ceval.c" _PyEval_EvalFrameDefault(tstate = 0x302bb478, f = 0x3005d44c, throwflag = 805688288), line 4324 in "ceval.c" _PyEval_Vector(tstate = 0x30187198, con = 0x3005d428, locals = 0x3005d3e0, args = 0x301219b8, argcount = 806048208, kwnames = 0x301316b8), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = (nil), stack = (nil), nargsf = 570570272, kwnames = (nil)), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = (nil), f = (nil), throwflag = 806556888), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x30187038, con = 0x20107b40, locals = 0x30111768, args = 0x301219b8, argcount = 806560272, kwnames = 0x301314d8), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = (nil), stack = (nil), nargsf = 570570720, kwnames = (nil)), line 361 in "call.c" _PyEval_EvalFrameDefault(tstate = (nil), f = 0x40000000, throwflag =

    -257396708), line 1431 in "abstract.h" _PyEval_Vector(tstate = 0x22023cf0, con = 0x2000120c, locals = 0x22023960, args = 0x220239f0, argcount = 3508213100, kwnames = 0x22023cf0), line 46 in "pycore_ceval.h" _PyFunction_Vectorcall(func = 0x220239f0, stack = (nil), nargsf = 570572016, kwnames = 0x220239f0), line 361 in "call.c" method_vectorcall(method = 0x10103bdc, args = (nil), nargsf = 0, kwnames = 0x20045994), line 119 in "abstract.h" PyVectorcall_Call(callable = 0xdddddddd, tuple = 0xdddddddd, kwargs =

    0xdddddddd), line 255 in "call.c" _PyObject_Call(tstate = (nil), callable = (nil), args = (nil), kwargs = (nil)), line 298 in "call.c" thread_run(boot_raw = (nil)), line 1076 in "_threadmodule.c" pythread_wrapper(arg = (nil)), line 240 in "thread_pthread.h" (dbx)

    ----------


    Python tracker \report@bugs.python.org\ \https://bugs.python.org/issue43665\


    vstinner commented 3 years ago

    Interesting part of the traceback:

    pthread_cond_signal@AF29_12(??, ??) at 0xd11b1568 ... release_sentinel ... _PyThreadState_DeleteExcept ... PyOS_AfterFork_Child(), line 600 in "posixmodule.c"

    It sounds like a variant of bpo-40092: "Crash in _PyThreadState_DeleteExcept() at fork in the process child".

    See also the parent issue: https://bugs.python.org/issue40068#msg365031