python / cpython

The Python programming language
https://www.python.org

Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. #77789

Closed ericsnowcurrently closed 4 years ago

ericsnowcurrently commented 6 years ago
BPO 33608
Nosy @nascheme, @ncoghlan, @vstinner, @phsilva, @pmp-p, @markshannon, @ericsnowcurrently, @serhiy-storchaka, @soltysh, @1st1, @koobs, @zooba, @phmc, @emilyemorehouse, @jdahlin
PRs
  • python/cpython#9334
  • python/cpython#11617
  • python/cpython#12024
  • python/cpython#12062
  • python/cpython#12159
  • python/cpython#12240
  • python/cpython#12243
  • python/cpython#12245
  • python/cpython#12246
  • python/cpython#12247
  • python/cpython#12346
  • python/cpython#12360
  • python/cpython#12665
  • python/cpython#12806
  • python/cpython#12877
  • python/cpython#13714
  • python/cpython#13780
Files
  • managers.patch
  • take_gil.assert.patch
  • fini_crash.c
  • wrap_threadstate.diff
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = 'https://github.com/ericsnowcurrently'
    closed_at =
    created_at =
    labels = ['expert-subinterpreters', '3.9']
    title = 'Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.'
    updated_at =
    user = 'https://github.com/ericsnowcurrently'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'vstinner'
    assignee = 'eric.snow'
    closed = True
    closed_date =
    closer = 'vstinner'
    components = ['Subinterpreters']
    creation =
    creator = 'eric.snow'
    dependencies = []
    files = ['48333', '48334', '48731', '48732']
    hgrepos = []
    issue_num = 33608
    keywords = ['patch']
    message_count = 82.0
    messages = ['317333', '317334', '317344', '317404', '325481', '325548', '336490', '336544', '336596', '336602', '336603', '336677', '336687', '336897', '336948', '336951', '336952', '336975', '337075', '337096', '337110', '337115', '337116', '337117', '337159', '337179', '337218', '337235', '337554', '337557', '337996', '338039', '340057', '340069', '340070', '340071', '340075', '340077', '340079', '340082', '340133', '340139', '340143', '340145', '340666', '340768', '342791', '342792', '344210', '344211', '344240', '344244', '344245', '344355', '344372', '344417', '344429', '344434', '344435', '344438', '344441', '344442', '344443', '344444', '344445', '344446', '357169', '357170', '357179', '358364', '358367', '358748', '364670', '365160', '365163', '365178', '366009', '366013', '366016', '366017', '366019', '366027']
    nosy_count = 16.0
    nosy_names = ['nascheme', 'ncoghlan', 'vstinner', 'phsilva', 'pmpp', 'Mark.Shannon', 'eric.snow', 'serhiy.storchaka', 'maciej.szulik', 'yselivanov', 'koobs', 'steve.dower', 'pconnell', 'emilyemorehouse', 'Johan Dahlin', 'shprotx']
    pr_nums = ['9334', '11617', '11617', '11617', '12024', '12062', '12159', '12240', '12243', '12245', '12246', '12247', '12346', '12360', '12665', '12806', '12877', '13714', '13780']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue33608'
    versions = ['Python 3.9']
    ```

    ericsnowcurrently commented 6 years ago

    In order to keep subinterpreters properly isolated, objects from one interpreter should not be used in C-API calls in another interpreter. That is relatively straightforward except in one case: indicating that the other interpreter doesn't need the object to exist any more (similar to PyBuffer_Release() but more general). I consider the following solutions to be the most viable. Both make use of refcounts to protect cross-interpreter usage (with an incref before sharing).

    1. manually switch interpreters (new private function)
       a. acquire the GIL
       b. if refcount > 1 then decref and release the GIL
       c. switch
       d. new thread (or re-use dedicated one)
       e. decref
       f. kill thread
       g. switch back
       h. release the GIL
    2. change pending call mechanism (see Py_AddPendingCall) to per-interpreter instead of global (add "interp" arg to signature of new private C-API function)
       a. queue a function that decrefs the object
    3. new cross-interpreter-specific private C-API function
       a. queue the object for decref (a la Py_AddPendingCall) in owning interpreter

    I favor #2, since it is more generally applicable. #3 would probably be built on #2 anyway. #1 is relatively inefficient. With #2, Py_AddPendingCall() would become a simple wrapper around the new private function.
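
    To make option 2 concrete, here is a minimal sketch of how the borrowing interpreter could hand the final decref back to the owning interpreter via a per-interpreter pending call. The helper name schedule_cross_interp_decref and the exact signature of the private _Py_AddPendingCall() shown here are assumptions for illustration, not the API that eventually landed.

    #include "Python.h"

    /* Assumed signature of the per-interpreter variant discussed above. */
    int _Py_AddPendingCall(PyInterpreterState *interp,
                           int (*func)(void *), void *arg);

    static int
    _release_borrowed_object(void *arg)
    {
        /* Runs later in the owning interpreter, under that interpreter's GIL. */
        Py_DECREF((PyObject *)arg);
        return 0;
    }

    /* Called from the borrowing interpreter once it no longer needs `obj`,
     * which is owned by `interp`.  The object was incref'ed before sharing,
     * so this merely schedules the matching decref in the owner rather than
     * touching the refcount from the wrong interpreter. */
    static int
    schedule_cross_interp_decref(PyInterpreterState *interp, PyObject *obj)
    {
        return _Py_AddPendingCall(interp, _release_borrowed_object, (void *)obj);
    }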

    ericsnowcurrently commented 6 years ago

    As a lesser (IMHO) alternative, we could also modify Py_DECREF to respect a new "shared" marker of some sort (perhaps relative to bpo-33607), but that would probably still require one of the refcount-based solutions (and add a branch to Py_DECREF).
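
    Purely for illustration, a hypothetical sketch of what such a branch might look like; the "shared" marker check and the owner-side queueing helper are invented for this sketch (with stub bodies so it stands alone) and do not exist in CPython.

    #include "Python.h"

    /* Hypothetical stand-ins so the sketch is self-contained; a real design
     * would record the marker and the owning interpreter on the object or in
     * a cross-interpreter registry. */
    static int _object_is_shared(PyObject *op) { (void)op; return 0; }
    static void _queue_decref_in_owner(PyObject *op) { (void)op; }

    /* The extra branch that a "shared"-aware Py_DECREF would need. */
    static inline void
    _Py_DecrefMaybeShared(PyObject *op)
    {
        if (_object_is_shared(op)) {
            /* The reference is being dropped from a foreign interpreter:
             * queue the decref in the owning interpreter instead. */
            _queue_decref_in_owner(op);
        }
        else {
            Py_DECREF(op);
        }
    }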

    vstinner commented 6 years ago

    "That is relatively straight-forward except in one case: indicating that the other interpreter doesn't need the object to exist any more (similar to PyBuffer_Release() but more general)"

    Why would an interpreter access an object from a different interpreter? Each interpreter should have its own object space, no?

    ncoghlan commented 6 years ago

    Adding a low level callback based mechanism to ask another interpreter to do work seems like a good way to go to me.

    As an example of why that might be needed, consider the case of sharing a buffer exporting object with another subinterpreter: when the memoryview in the subinterpreter is destroyed, it needs to request that the buffer view be released in the source interpreter that actually owns the original object.

    nascheme commented 6 years ago

    I would suggest that sharing of objects between interpreters should be stamped out. Could we have some #ifdef debug checking that would warn or assert so this doesn't happen? I know currently we share a lot of objects. However, in the long term, that does not seem like the correct design. Instead, each interpreter should have its own objects and any passing or sharing should be tightly controlled.

    ericsnowcurrently commented 6 years ago

    @Neil, we're definitely on the same page. In fact, in a world where subinterpreters do not share a GIL, we can't ever use an object in one interpreter that was created in another (due to thread safety on refcounts). The role of "tightly controlling" passing/sharing objects (for a very loose definition of "sharing") falls to the channels described in PEP-554. [1]

    However, there are several circumstances where interpreters may collaborate in ways that involve one holding a reference (but not using it) to an object owned by the other. For instance, see PyBuffer_Release(). [2] This issue is about addressing that situation safely. It is definitely not about safely using objects from other interpreters.

    [1] The low-level implementation, including channels, already exists in Modules/_xxsubinterpretersmodule.c. [2] https://docs.python.org/3/c-api/buffer.html#c.PyBuffer_Release

    ericsnowcurrently commented 5 years ago

    New changeset ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 by Eric Snow in branch 'master': bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617) https://github.com/python/cpython/commit/ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465

    ericsnowcurrently commented 5 years ago

    python/cpython#55826 introduced a slight performance regression which will need to be resolved. If I can't find the problem right away then I'll probably temporarily revert the commit.

    See https://mail.python.org/pipermail/python-dev/2019-February/156451.html.

    ericsnowcurrently commented 5 years ago

    At least, it *might* be a performance regression. Here are the two commits I tried:

    after: ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 (PR GH-11617, from this issue)
    before: 463572c8beb59fd9d6850440af48a5c5f4c0c0c9 (the one right before)

    After building each (a normal build, not debug), I ran the micro-benchmark Raymond referred to ("./python Tools/scripts/var_access_benchmark.py") multiple times. There was enough variability between runs from the same commit that I'm uncertain at this point if there really is any performance regression. For the most part the results between the two commits are really close. Also, the results from the "after" commit fall within the variability I've seen between runs of the "before" commit.

    I'm going to try with the "performance" suite (https://github.com/python/performance) to see if it shows any regression.

    ericsnowcurrently commented 5 years ago

    Here's what I did (more or less):

    $ python3 -m pip install --user perf
    $ python3 -m perf system tune
    $ git clone git@github.com:python/performance.git
    $ git clone git@github.com:python/cpython.git
    $ cd cpython
    $ git checkout ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465
    $ ./configure
    $ make -j8
    $ ./python ../performance/pyperformance run --fast --output speed.after
    $ git checkout HEAD^
    $ make -j8
    $ ./python ../performance/pyperformance run --fast --output speed.before
    $ ./python ../performance/pyperformance compare speed.before speed.after -O table

    Here are my results:

    speed.before
    ============

    Performance version: 0.7.0
    Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
    Number of logical CPUs: 4
    Start date: 2019-02-25 20:39:44.151699
    End date: 2019-02-25 20:48:04.817516

    speed.after
    ===========

    Performance version: 0.7.0
    Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
    Number of logical CPUs: 4
    Start date: 2019-02-25 20:29:56.610702
    End date: 2019-02-25 20:38:11.667461

    | Benchmark | speed.before | speed.after | Change | Significance |
    |---|---|---|---|---|
    | 2to3 | 447 ms | 440 ms | 1.02x faster | Not significant |
    | chaos | 155 ms | 156 ms | 1.01x slower | Not significant |
    | crypto_pyaes | 154 ms | 152 ms | 1.01x faster | Not significant |
    | deltablue | 10.7 ms | 10.5 ms | 1.01x faster | Not significant |
    | django_template | 177 ms | 172 ms | 1.03x faster | Significant (t=3.66) |
    | dulwich_log | 105 ms | 106 ms | 1.01x slower | Not significant |
    | fannkuch | 572 ms | 573 ms | 1.00x slower | Not significant |
    | float | 149 ms | 146 ms | 1.02x faster | Not significant |
    | go | 351 ms | 349 ms | 1.00x faster | Not significant |
    | hexiom | 14.6 ms | 14.4 ms | 1.01x faster | Not significant |
    | html5lib | 126 ms | 122 ms | 1.03x faster | Significant (t=3.46) |
    | json_dumps | 17.6 ms | 17.2 ms | 1.02x faster | Significant (t=2.65) |
    | json_loads | 36.4 us | 36.1 us | 1.01x faster | Not significant |
    | logging_format | 15.2 us | 14.9 us | 1.02x faster | Not significant |
    | logging_silent | 292 ns | 296 ns | 1.01x slower | Not significant |
    | logging_simple | 13.7 us | 13.4 us | 1.02x faster | Not significant |
    | mako | 22.9 ms | 22.5 ms | 1.02x faster | Not significant |
    | meteor_contest | 134 ms | 134 ms | 1.00x faster | Not significant |
    | nbody | 157 ms | 161 ms | 1.03x slower | Significant (t=-3.85) |
    | nqueens | 134 ms | 132 ms | 1.02x faster | Not significant |
    | pathlib | 30.1 ms | 31.0 ms | 1.03x slower | Not significant |
    | pickle | 11.5 us | 11.6 us | 1.01x slower | Not significant |
    | pickle_dict | 29.5 us | 30.5 us | 1.03x slower | Significant (t=-6.37) |
    | pickle_list | 4.54 us | 4.58 us | 1.01x slower | Not significant |
    | pickle_pure_python | 652 us | 651 us | 1.00x faster | Not significant |
    | pidigits | 212 ms | 215 ms | 1.02x slower | Not significant |
    | python_startup | 11.6 ms | 11.5 ms | 1.01x faster | Not significant |
    | python_startup_no_site | 8.07 ms | 7.95 ms | 1.01x faster | Not significant |
    | raytrace | 729 ms | 722 ms | 1.01x faster | Not significant |
    | regex_compile | 249 ms | 247 ms | 1.01x faster | Not significant |
    | regex_dna | 203 ms | 204 ms | 1.00x slower | Not significant |
    | regex_effbot | 3.55 ms | 3.55 ms | 1.00x faster | Not significant |
    | regex_v8 | 28.3 ms | 28.3 ms | 1.00x slower | Not significant |
    | richards | 105 ms | 105 ms | 1.00x faster | Not significant |
    | scimark_fft | 429 ms | 436 ms | 1.01x slower | Not significant |
    | scimark_lu | 238 ms | 237 ms | 1.00x faster | Not significant |
    | scimark_monte_carlo | 144 ms | 139 ms | 1.04x faster | Significant (t=3.61) |
    | scimark_sor | 265 ms | 260 ms | 1.02x faster | Not significant |
    | scimark_sparse_mat_mult | 5.41 ms | 5.25 ms | 1.03x faster | Significant (t=4.26) |
    | spectral_norm | 174 ms | 173 ms | 1.01x faster | Not significant |
    | sqlalchemy_declarative | 216 ms | 214 ms | 1.01x faster | Not significant |
    | sqlalchemy_imperative | 41.6 ms | 41.2 ms | 1.01x faster | Not significant |
    | sqlite_synth | 3.99 us | 3.91 us | 1.02x faster | Significant (t=2.49) |
    | sympy_expand | 559 ms | 559 ms | 1.00x faster | Not significant |
    | sympy_integrate | 25.2 ms | 25.3 ms | 1.00x slower | Not significant |
    | sympy_str | 261 ms | 260 ms | 1.00x faster | Not significant |
    | sympy_sum | 136 ms | 138 ms | 1.01x slower | Not significant |
    | telco | 8.36 ms | 8.32 ms | 1.00x faster | Not significant |
    | tornado_http | 273 ms | 271 ms | 1.01x faster | Not significant |
    | unpack_sequence | 58.8 ns | 60.0 ns | 1.02x slower | Not significant |
    | unpickle | 21.5 us | 21.5 us | 1.00x faster | Not significant |
    | unpickle_list | 5.60 us | 5.55 us | 1.01x faster | Not significant |
    | unpickle_pure_python | 497 us | 481 us | 1.03x faster | Significant (t=5.04) |
    | xml_etree_generate | 141 ms | 140 ms | 1.01x faster | Not significant |
    | xml_etree_iterparse | 131 ms | 133 ms | 1.01x slower | Not significant |
    | xml_etree_parse | 186 ms | 187 ms | 1.00x slower | Not significant |
    | xml_etree_process | 115 ms | 113 ms | 1.01x faster | Not significant |

    ericsnowcurrently commented 5 years ago

    ...so it doesn't appear that my PR introduces a performance regression.

    vstinner commented 5 years ago

    ...so it doesn't appear that my PR introduces a performance regression.

    IMHO there is no performance regression at all. Just noise in the results, which don't come with a standard deviation.

    ericsnowcurrently commented 5 years ago

    FYI, I have a couple of small follow-up changes to land before I close this issue.

    db3l commented 5 years ago

    I suspect changes for this issue may be creating test_io failures on my Windows builders, most notably my x86 Windows 7 builder, where test_io has been failing on both attempts on almost all builds. It fails with a lock failure during interpreter shutdown, and commit ef4ac967 appears to be the most plausible commit out of the set introduced at the first failing build on Feb 24.

    See https://buildbot.python.org/all/#/builders/58/builds/1977 for the first failure. test_io has failed both attempts on all but 3 of the subsequent 16 tests of the 3.x branch.

    It might be worth noting that this builder is slow, so if there are timeouts involved or a race condition of any sort it might be triggering it more readily than other builders. I do, however, see the same failures - albeit less frequently and not usually on both tries - on the Win8 and Win10 builders.

    For what it's worth, one other oddity: test_multiprocessing* failures are relatively common on the Win7 builder during the first round of tests due to exceeding the timeout, but they usually pass on the retry. Over this same time frame, though, they have begun failing (not due to timeout) even on the second attempt, which is unusual. It might be coincidental, but similar failures are also showing up sporadically on the Win8/Win10 builders, where such failures are not common at all.

    ericsnowcurrently commented 5 years ago

    New changeset b05b711a2cef6c6c381e01069dedac372e0b9fb2 by Eric Snow in branch 'master': bpo-33608: Use _Py_AddPendingCall() in _PyCrossInterpreterData_Release(). (gh-12024) https://github.com/python/cpython/commit/b05b711a2cef6c6c381e01069dedac372e0b9fb2

    ericsnowcurrently commented 5 years ago

    New changeset bda918bf65a88560ec453aaba0758a9c0d49b449 by Eric Snow in branch 'master': bpo-33608: Simplify ceval's DISPATCH by hoisting eval_breaker ahead of time. (gh-12062) https://github.com/python/cpython/commit/bda918bf65a88560ec453aaba0758a9c0d49b449

    ericsnowcurrently commented 5 years ago

    I've merged the PR for hoisting eval_breaker before the ceval loop starts. @Raymond, see if that addresses the performance regression you've been seeing.

    I'll close this issue after I've sorted out the buildbot issues David mentioned (on his Windows builders).

    db3l commented 5 years ago

    If I can help with testing or builder access or anything just let me know. It appears that I can pretty reliably trigger the error through just the individual tests (test_daemon_threads_shutdown_std{out,err}_deadlock) in isolation, which is definitely less tedious than a full buildbot run to test a change.

    BTW, while not directly related since it was only just merged, and I won't pretend to have any real understanding of the changes here, I do have a question about PR 12024 ... it appears HEAD_UNLOCK is used twice in _PyInterpreterState_LookUpID. Shouldn't one of those be HEAD_LOCK?
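
    For reference, the intended pairing would presumably look like the sketch below (assuming HEAD_LOCK/HEAD_UNLOCK are the macros guarding the runtime's interpreter list in Python/pystate.c); calling HEAD_UNLOCK in both places would leave the list unguarded while it is being walked. The lookup body here is simplified, not the actual code of _PyInterpreterState_LookUpID().

    #include "Python.h"
    #include <stdint.h>

    /* Simplified sketch of a lock-protected walk over the interpreter list. */
    static PyInterpreterState *
    lookup_interp_by_id(int64_t requested_id)
    {
        PyInterpreterState *interp;

        HEAD_LOCK();                    /* acquire, not a second unlock */
        interp = PyInterpreterState_Head();
        while (interp != NULL) {
            if (PyInterpreterState_GetID(interp) == requested_id) {
                break;
            }
            interp = PyInterpreterState_Next(interp);
        }
        HEAD_UNLOCK();                  /* release once the walk is done */
        return interp;
    }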

    vstinner commented 5 years ago

    I suspect changes for this issue may be creating test_io failures on my windows builders, (...)

    I created bpo-36177 to track this regression.

    vstinner commented 5 years ago

    The commit ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

    Would it be possible to revert it until the bug is properly understood and fixed?

    vstinner commented 5 years ago

    The commit ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

    There is a very similar bug on Windows: bpo-36116.

    I'm now also quite confident that bpo-36177 is also a regression caused by this issue.

    vstinner commented 5 years ago

    New changeset 4d61e6e3b802399be62a521d6fa785698cb670b5 by Victor Stinner in branch 'master': Revert: bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617) (GH-12159) https://github.com/python/cpython/commit/4d61e6e3b802399be62a521d6fa785698cb670b5

    vstinner commented 5 years ago

    Hi Eric, I'm sorry but I had to revert your recent work. It introduced regressions in at least test_io and test_multiprocessing_spawn on Windows and FreeBSD. I simply don't have the bandwidth to investigate this regression yet, whereas we really need the CI to remain stable to catch other regressions (the master branch received multiple new features recently, and some other regressions came in that way ;-)).

    The test_multiprocessing_spawn failure is *very* hard to reproduce on FreeBSD, but I can reproduce it reliably; it just takes time. The issue is a crash producing a coredump. I consider the bug important enough to justify a revert.

    The revert is an opportunity to sit down and track the root cause rather than rushing to push a "temporary" quick fix. This bug seems to be very deep in the Python internals: thread state during Python shutdown. So it will take time to fully understand it and fix it (or redesign your recent changes, I don't know).

    Tell me if you need my help to reproduce the bug. The bug has been seen on FreeBSD but also on Windows.

    -- The Night's Watch

    vstinner commented 5 years ago

    For curious people, Pablo Galindo spent a few hours to investigate https://bugs.python.org/issue36114 and I spent a few more hours to finish the analysis. The bug indirectly crashed my laptop :-) https://twitter.com/VictorStinner/status/1102528982036201478

    That's what I mean by "I don't have the bandwidth": we already spent hours to identify the regression ;-)

    ericsnowcurrently commented 5 years ago

    That's okay, Victor. Thanks for jumping on this. I'll take a look when I get a chance.

    vstinner commented 5 years ago

    That's okay, Victor. Thanks for jumping on this. I'll take a look when I get a chance.

    From what I saw, your first commit was enough to reproduce the crash.

    If I recall correctly, Antoine Pitrou modified the GIL so that threads exit immediately when Py_Finalize() is called. I'm thinking of:

    void
    PyEval_RestoreThread(PyThreadState *tstate)
    {
        ...
        take_gil(tstate);
        /* _Py_Finalizing is protected by the GIL */
        if (_Py_IsFinalizing() && !_Py_CURRENTLY_FINALIZING(tstate)) {
            drop_gil(tstate);
            PyThread_exit_thread();
            Py_UNREACHABLE();
        }
        ...
        PyThreadState_Swap(tstate);
    }

    Problem: this code uses tstate, whereas the crash occurred because tstate pointed to freed memory:

    """ Thread 1 got the crash

    (gdb) p *tstate $3 = { prev = 0xdbdbdbdbdbdbdbdb, next = 0xdbdbdbdbdbdbdbdb, interp = 0xdbdbdbdbdbdbdbdb, ... }

    ...

    Thread 1 (LWP 100696):

    0 0x0000000000368210 in take_gil (tstate=0x8027e2050) at Python/ceval_gil.h:216

    1 0x0000000000368a94 in PyEval_RestoreThread (tstate=0x8027e2050) at Python/ceval.c:281

    ... """

    https://bugs.python.org/issue36114#msg337090

    When this crash occurred, Py_Finalize() already completed in the main thread!

    """
    void _Py_NO_RETURN
    Py_Exit(int sts)
    {
        if (Py_FinalizeEx() < 0) {  /* <==== DONE! */
            sts = 120;
        }
    exit(sts);    /* \<=============== Crash occurred here! \*/

    } """

    Py_Finalize() is supposed to wait for threads before deleting Python thread states:

    """
    int
    Py_FinalizeEx(void)
    {
        ...
    /* The interpreter is still entirely intact at this point, and the
     * exit funcs may be relying on that.  In particular, if some thread
     * or exit func is still waiting to do an import, the import machinery
     * expects Py_IsInitialized() to return true.  So don't say the
     * interpreter is uninitialized until after the exit funcs have run.
     * Note that Threading.py uses an exit func to do a join on all the
     * threads created thru it, so this also protects pending imports in
     * the threads created via Threading.
     */
    call_py_exitfuncs(interp);
    
    ...
    
    /* Remaining threads (e.g. daemon threads) will automatically exit
       after taking the GIL (in PyEval_RestoreThread()). */
    _PyRuntime.finalizing = tstate;
    _PyRuntime.initialized = 0;
    _PyRuntime.core_initialized = 0;
    ...
        /* Delete current thread. After this, many C API calls become crashy. */
        PyThreadState_Swap(NULL);
    
        PyInterpreterState_Delete(interp);
    ...

    } """

    The real problem for years has been *daemon threads*, which... BY DESIGN... remain alive after Py_Finalize() exits! But as I explained, they must exit as soon as they attempt to take the GIL.

    ericsnowcurrently commented 5 years ago

    Thanks for taking a look Victor! That info is helpful. It will likely be a few days before I can sort this out.

    Once I have addressed the problem I'll re-merge. I plan on using the "buildbot-custom" branch to make sure the buildbots are happy with the change before making that PR.

    vstinner commented 5 years ago

    The bug is hard to reproduce even manually. I can test a PR for you once it's ready.

    ericsnowcurrently commented 5 years ago

    New changeset 5be45a6105d656c551adeee7770afdc3b806fbb5 by Eric Snow in branch 'master': bpo-33608: Minor cleanup related to pending calls. (gh-12247) https://github.com/python/cpython/commit/5be45a6105d656c551adeee7770afdc3b806fbb5

    ericsnowcurrently commented 5 years ago

    New changeset 8479a3426eb7d1840473f7788e639954363ed37e by Eric Snow in branch 'master': bpo-33608: Make sure locks in the runtime are properly re-created. (gh-12245) https://github.com/python/cpython/commit/8479a3426eb7d1840473f7788e639954363ed37e

    vstinner commented 5 years ago

    New changeset e3f4070aee6f2d489416fdcafd51d6b04d661919 by Victor Stinner in branch 'master': bpo-33608: Fix PyEval_InitThreads() warning (GH-12346) https://github.com/python/cpython/commit/e3f4070aee6f2d489416fdcafd51d6b04d661919

    ericsnowcurrently commented 5 years ago

    New changeset 842a2f07f2f08a935ef470bfdaeef40f87490cfc by Eric Snow in branch 'master': bpo-33608: Deal with pending calls relative to runtime shutdown. (gh-12246) https://github.com/python/cpython/commit/842a2f07f2f08a935ef470bfdaeef40f87490cfc

    ericsnowcurrently commented 5 years ago

    New changeset f13c5c8b9401a9dc19e95d8b420ee100ac022208 by Eric Snow in branch 'master': bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (gh-12360) https://github.com/python/cpython/commit/f13c5c8b9401a9dc19e95d8b420ee100ac022208

    vstinner commented 5 years ago

    I have bad news for you, Eric: I'm able again to reproduce the crash at commit f13c5c8b9401a9dc19e95d8b420ee100ac022208.

    vstinner@freebsd$ ./python -m test --matchfile=bisect5 test_multiprocessing_spawn --fail-env-changed -F
    Run tests sequentially
    0:00:00 load avg: 0.69 [ 1] test_multiprocessing_spawn
    0:00:06 load avg: 0.80 [ 2] test_multiprocessing_spawn
    0:00:12 load avg: 1.19 [ 3] test_multiprocessing_spawn
    ...
    0:01:55 load avg: 1.48 [ 21] test_multiprocessing_spawn
    0:02:01 load avg: 1.53 [ 22] test_multiprocessing_spawn
    0:02:08 load avg: 1.29 [ 23] test_multiprocessing_spawn
    0:02:17 load avg: 1.51 [ 24] test_multiprocessing_spawn
    0:02:27 load avg: 2.27 [ 25] test_multiprocessing_spawn
    0:02:38 load avg: 3.14 [ 26] test_multiprocessing_spawn
    0:02:48 load avg: 3.51 [ 27] test_multiprocessing_spawn
    Warning -- files was modified by test_multiprocessing_spawn
      Before: []
      After:  ['python.core']
    test_multiprocessing_spawn failed (env changed)

    == Tests result: ENV CHANGED ==

    All 26 tests OK.

    1 test altered the execution environment: test_multiprocessing_spawn

    Total duration: 2 min 59 sec
    Tests result: ENV CHANGED

    Note: Sorry for not testing earlier; after one long week, I didn't manage to catch up with my very long list of unread emails.

    I don't know what should be done. Revert? I don't have the bandwidth to investigate this crash.

    vstinner commented 5 years ago

    I ran "./python -m test --matchfile=bisect5 test_multiprocessing_spawn --fail-env-changed -F" 4 times in parallel: in less than 5 minutes (in fact, I didn't look carefully at the terminal, maybe it was faster), I got 3 core dumps :-(

    ericsnowcurrently commented 5 years ago

    Thanks for checking, Victor. Don't feel bad about your results, nor about not checking sooner. :) We'll get this sorted out.

    For now I'll revert. This is not code that changes very often, so there isn't much benefit to keeping it merged. Testing against a separate branch is just as easy.

    Could you point me at an image for that VM or instructions on how to reproduce it? I hate having to bother you to test my changes! :)

    vstinner commented 5 years ago

    Eric Snow:

    For now I'll revert. This is not code that changes very often, so there isn't much benefit to keeping it merged. Testing against a separate branch is just as easy.

    Again, Python shutdown is *really* fragile. Last time I tried to "enhance" it, I also introduced random regressions and so I had to revert my changes.

    Old info about the crash, should still be relevant: https://bugs.python.org/issue36114#msg337090

    Could you point me at an image for that VM or instructions on how to reproduce it? I hate having to bother you to test my changes! :)

    *In theory*, you should be able to reproduce the crash on any platform. But in my experience, bugs which involve multiple threads are simply "more likely" on FreeBSD, because FreeBSD manages threads very differently than Linux. Sometimes a bug can only be reproduced on one specific FreeBSD computer, but once the root issue has been identified, we start to be able to trigger the crash reliably on other platforms (like Linux).

    My procedure to reproduce the crash on FreeBSD: https://bugs.python.org/issue36114#msg337092

    I'm using FreeBSD 12.0 RELEASE VM hosted on Linux. My FreeBSD is not customized in any way.

    On modern Linux distributions, coredumps are no longer written in the current directory but handled by a system service like ABRT on Fedora. For this reason, Python test runner can "miss" crashes, especially in child processes run by tests (not directly in the process used to run the test).

    To get a coredump in the current directory on Linux, you can use:

    sudo bash -c 'echo "%e.%p" > /proc/sys/kernel/core_pattern'

    Manual test:

    $ ./python -c 'import ctypes; ctypes.string_at(0)'
    Segmentation fault (core dumped)

    vstinner@apu$ git status
    ...
    Untracked files:
        python.18343
    ...

    Say hello to python.18343 coredump!

    Usually, running the command which triggers the crash multiple times in parallel (in different terminals, under screen, etc.) makes the crash more likely, since it stresses the system.

    Sometimes, I run the Python test suite in parallel to stress the system even more.

    The goal of the game is to trigger a race condition which depends on time. Stressing the system helps to "randomize" timings.

    ericsnowcurrently commented 5 years ago

    New changeset b75b1a3504a0cea6fac6ecba44c10b2629577025 by Eric Snow in branch 'master': bpo-33608: Revert "Factor out a private, per-interpreter _Py_AddPendingCall()." (gh-12806) https://github.com/python/cpython/commit/b75b1a3504a0cea6fac6ecba44c10b2629577025

    vstinner commented 5 years ago

    I tried but failed to reproduce the crash on Linux!?

    $ sudo bash -c 'echo "%e.%p" > /proc/sys/kernel/core_pattern'
    $ ./python -m test --matchfile=bisect5 test_multiprocessing_spawn --fail-env-changed -F
    # wait 5 min
    ^C
    $ ./python -m test --matchfile=bisect5 -j0 test_multiprocessing_spawn --fail-env-changed -F  # I added -j0
    # wait 5 min
    ^C
    $ ./python -m test --matchfile=bisect5 -j0 test_multiprocessing_spawn --fail-env-changed -F  # I added -j0
    # wait 5 min
    ^C

    No coredump seen...

    vstinner commented 5 years ago

    FYI AMD64 FreeBSD CURRENT Shared 3.x failed at commit f13c5c8b9401a9dc19e95d8b420ee100ac022208: https://buildbot.python.org/all/#/builders/168/builds/913

    But this issue has already been fixed: Eric reverted his change.

    ericsnowcurrently commented 5 years ago

    @Victor, I set up a FreeBSD 12.0 VM (in Hyper-V) and made sure core files were getting generated for segfaults. Then I cloned the cpython repo, built it (using GCC), and ran regrtest as you recommended. It generated no core files after half an hour. I adjusted the VM down to 1 CPU from 4 and there were no segfaults over an hour and a half of running those 4 test loops. So I've set the VM to 10% of a CPU and still have gotten no core files after over half an hour. The load average has been hovering between 5 and 6. I guess I'm not starving the VM enough. :)

    Any ideas of how far I need to throttle the VM? Is there more than just CPU that I need to limit?

    vstinner commented 5 years ago

    Any ideas of how far I need to throttle the VM? Is there more than just CPU that I need to limit?

    I don't know how to make the race condition more likely. I'm not sure that starving the CPU helps. Maybe try the opposite: add more CPUs and reduce the number of tests run in parallel.

    Did you test commit f13c5c8b9401a9dc19e95d8b420ee100ac022208?

    db3l commented 5 years ago

    Eric, I'm also seeing the same Win7 and Win10 worker failures with commit b75b1a350 as last time (test_multiprocessing_spawn on both, and test_io on Win7).

    For test_multiprocessing_spawn, it fails differently than Victor since no core file is generated, but I assume it's related in terms of child process termination. See for example https://buildbot.python.org/all/#/builders/3/builds/2390 for Win10, where test_mymanager_context fails with:

    ======================================================================
    FAIL: test_mymanager_context (test.test_multiprocessing_spawn.WithManagerTestMyManager)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\buildarea\3.x.bolen-windows10\build\lib\test\_test_multiprocessing.py", line 2747, in test_mymanager_context
        self.assertIn(manager._process.exitcode, (0, -signal.SIGTERM))
    AssertionError: 3221225477 not found in (0, -15)

    (3221225477 is 0xC0000005, which I believe is an access violation)

    For some reason, the Windows 7 worker didn't get a test run while your commit was live, but I can reproduce the same error manually.

    For test_io, as before, it's a shutdown lock failure:

    ======================================================================
    FAIL: test_daemon_threads_shutdown_stdout_deadlock (test.test_io.CMiscIOTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\cygwin\home\db3l\python.test\lib\test\test_io.py", line 4157, in test_daemon_threads_shutdown_stdout_deadlock
        self.check_daemon_threads_shutdown_deadlock('stdout')
      File "D:\cygwin\home\db3l\python.test\lib\test\test_io.py", line 4148, in check_daemon_threads_shutdown_deadlock
        self.assertIn("Fatal Python error: could not acquire lock "
    AssertionError: "Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads" not found in ''

    In manual attempts I have yet to be able to recreate the test_multiprocessing_spawn failure under Win10 but can usually manage a 25-50% failure rate under Win7 (which is much slower). The test_io failure on Win7 however, appears to be more easily reproducible.

    It's possible I/O is more critical than CPU, or perhaps it's the impact on latency; I seem to exacerbate the test_multiprocessing_spawn failure rate more easily by loading down the host disk than its CPU. I also noticed that the Win10 failure happened when test_io and test_multiprocessing_spawn overlapped.

    While I'm guessing this should happen on any low powered Windows VM, if it would help, I could arrange remote access to the Win7 worker for you. Or just test a change on your behalf. In fairness, it's unlikely to be useful for any significant remote debugging but perhaps at least having somewhere you could test a change, even if just with print-based debugging, might help. And while it might be an independent issue, the test_io failure rate appears to occur more reliably than test_multiprocessing_spawn.

    db3l commented 5 years ago

    I just noticed that my last message referenced the wrong commit. My test failures were against commit f13c5c8b9401a9dc19e95d8b420ee100ac022208 (the same as Victor).

    zooba commented 5 years ago

    New changeset 264490797ad936868c54b3d4ceb0343e7ba4be76 by Steve Dower in branch 'master': bpo-33608: Normalize atomic macros so that they all expect an atomic struct (GH-12877) https://github.com/python/cpython/commit/264490797ad936868c54b3d4ceb0343e7ba4be76

    koobs commented 5 years ago

    @Eric Happy to give you SSH access to one of the FreeBSD buildbots that's experienced the crash if it helps. Just flick me a pubkey, I'm on IRC (koobs @ freenode / #python-dev)

    e4e4d622-d34f-4b6f-9156-fabc0ed7e07f commented 5 years ago

    I was able to reproduce the error with version f13c5c8b9401a9dc19e95d8b420ee100ac022208 on a FreeBSD 12.0 VM. The error seems to be caused not by those changes, but by a lack of synchronization in multiprocessing.managers.Server. The failure happens when running "test_shared_memory_SharedMemoryManager_basics" with high CPU load and frequent interrupts, e.g. moving some window during the test. Mostly I used the "python -m test --fail-env-changed test_multiprocessing_spawn -m 'WithProcessesTestS[hu]*' -F" command to reproduce the crash. By analyzing core dumps I deduced that the crash happens during this call from the parent test process:

    class BaseManager(object):
        def _finalize_manager(process, address, authkey, state, _Client):
            ...
                try:
                    conn = _Client(address, authkey=authkey)
                    try:
                        dispatch(conn, None, 'shutdown')
                    finally:
                        conn.close()
                except Exception:
                    pass

    Main thread in the multiprocessing child:

    class Server(object):
        def serve_forever(self):
            ...
            try:
                accepter = threading.Thread(target=self.accepter)
                accepter.daemon = True
                accepter.start()
                try:
                    while not self.stop_event.is_set():
                        self.stop_event.wait(1)
                except (KeyboardInterrupt, SystemExit):
                    pass
            finally:
                ...
                sys.exit(0)  # << the main thread has finished and destroyed the interpreter

    Worker thread in the multiprocessing child. Locals:

    File "/usr/home/user/cpython/Lib/multiprocessing/managers.py", line 214, in handle_request
        c.send(msg)
    self = <SharedMemoryServer(....)>
    funcname = 'shutdown'
    result = None
    request = (None, 'shutdown', (), {})
    ignore = None
    args = ()
    kwds = {}
    msg = ('#RETURN', None)

    Listing:

    class Server(object):
        def handle_request(self, c):
            ...
            try:
                result = func(c, *args, **kwds)  # << calls Server.shutdown method
            except Exception:
                msg = ('#TRACEBACK', format_exc())
            else:
                msg = ('#RETURN', result)
            try:
                c.send(msg)  # << crashes with SIGBUS in _send_bytes -> write -> take_gil -> SET_GIL_DROP_REQUEST(tstate->interp)
            except Exception as e:
                try:
                    c.send(('#TRACEBACK', format_exc()))
                except Exception:
                    pass
            ...

        def shutdown(self, c):
            ...
            try:
                util.debug('manager received shutdown message')
                c.send(('#RETURN', None))
            except:
                import traceback
                traceback.print_exc()
            finally:
                self.stop_event.set()

    The worker thread is daemonic and is not terminated during interpreter finalization, so it might still be running and be terminated silently when the process exits. The connection (c) has different implementations on different platforms, so we cannot be sure whether the connection is closed during shutdown or not, or whether the last "c.send(msg)" blocks until the end of the process, returns instantly, or fails inconsistently. The error was there for a long time, but for two reasons it didn't cause much trouble:

    I haven't managed to find a nice clean test to reproduce the bug automatically. I suggest the changes to multiprocessing/managers.py in the attachment (managers.patch).

    e4e4d622-d34f-4b6f-9156-fabc0ed7e07f commented 5 years ago

    Also, it might be viable to add an assertion to verify that take_gil() is not called with an uninitialized interpreter. I used the changes in the attachment (take_gil.assert.patch), but they produced errors during test_tracemalloc with f13c5c8b9401a9dc19e95d8b420ee100ac022208. That happens because, during startup with invalid arguments, the interpreter is finalized via pymain_main->pymain_free->_PyRuntime_Finalize before the error is printed. However, the problem seems to be fixed for me in the latest revisions of the master branch, so I uploaded the diff against that.
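
    For context, a minimal sketch of the kind of entry check being described (this is not the attached take_gil.assert.patch): an assertion helper that take_gil() could run on entry in debug builds. Note that a freed thread state is typically filled with a debug pattern rather than NULL, so catching the use-after-free reliably would need a stronger check, e.g. verifying that tstate->interp is still in the runtime's interpreter list.

    #include "Python.h"
    #include <assert.h>

    /* Sketch only: basic sanity checks before using a thread state to take
     * the GIL.  A daemon thread resuming after runtime finalization can hold
     * a pointer to an already-freed thread state (see the 0xdbdbdb... values
     * in the coredump earlier in this thread). */
    static void
    assert_tstate_looks_valid(PyThreadState *tstate)
    {
        assert(tstate != NULL);
        assert(tstate->interp != NULL);
    }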

    ericsnowcurrently commented 5 years ago

    That's really helpful, Pavel! Thanks for investigating so thoroughly. I'm going to double check all the places I've made the assumption that "tstate" isn't NULL and likewise for "tstate->interp".

    Is there an issue open for the bug in multiprocessing? If not, would you mind opening one?

    ericsnowcurrently commented 5 years ago

    I've made a few tweaks and Victor did some cleanup, so I'm going to try the PR again. At first I'm also going to disable the _PyEval_FinishPendingCalls() call in _Py_FinalizeEx() and then enable it in a separate PR.

    Also, I've opened bpo-37127 specifically to track the problem with _PyEval_FinishPendingCalls() during _Py_FinalizeEx(), to ensure that the disabled call doesn't slip through the cracks. :)