python / cpython

The Python programming language
https://www.python.org
Other
62.46k stars 29.98k forks source link

contextvars: hamt_alloc() must initialize h_root and h_count fields #77984

Closed vstinner closed 6 years ago

vstinner commented 6 years ago
BPO 33803
Nosy @vstinner, @ned-deily, @1st1
PRs
  • python/cpython#7504
  • python/cpython#7505
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', '3.7', '3.8', 'type-crash'] title = 'contextvars: hamt_alloc() must initialize h_root and h_count fields' updated_at = user = 'https://github.com/vstinner' ``` bugs.python.org fields: ```python activity = actor = 'yselivanov' assignee = 'none' closed = True closed_date = closer = 'yselivanov' components = ['Interpreter Core'] creation = creator = 'vstinner' dependencies = [] files = [] hgrepos = [] issue_num = 33803 keywords = ['patch'] message_count = 12.0 messages = ['318990', '318991', '318992', '318994', '318997', '319001', '319013', '319026', '319032', '319033', '354172', '354199'] nosy_count = 3.0 nosy_names = ['vstinner', 'ned.deily', 'yselivanov'] pr_nums = ['7504', '7505'] priority = None resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'crash' url = 'https://bugs.python.org/issue33803' versions = ['Python 3.7', 'Python 3.8'] ```

    vstinner commented 6 years ago

    test_asyncio started to crash in https://github.com/python/cpython/pull/7487

    I debugged the crash with Yury: _PyHAMT_New() triggers indirectly a GC collection at "o->h_root = hamt_node_bitmap_new(0);". Problem: the object that is being created is already tracked by the GC, whereas its h_root field is not set. Then the GC does crash because the field is a random pointer.

    hamt_alloc() must initialize h_root and h_count before tracking the object in the GC.

    Thinking about the GC when writing an object constructor is hard :-(

    Maybe we should add a debug option to trigger the GC more often to make such random bug more likely. contextvars has been implemented a few months ago, and we only spotted the bug a few days before Python 3.7.0 final release...

    vstinner commented 6 years ago

    One solution to trigger so crash more frequently is to reduce the threshold of the generation 0 of the garbage collector.

    Here is a patch to do that when using -X dev: change the default threshold from 700 to 5 for the generation 0.

    With this patch, "PYTHONDEVMODE=1 python -m test test_context" does also crash as expected.

    diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c
    index 958219b744..81218fb964 100644
    --- a/Python/pylifecycle.c
    +++ b/Python/pylifecycle.c
    @@ -622,6 +622,10 @@ _Py_InitializeCore(const _PyCoreConfig *core_config)
             return _Py_INIT_ERR("runtime core already initialized");
         }
    
    +    if (core_config->dev_mode) {
    +        _PyRuntime.gc.generations[0].threshold = 5;
    +    }
    +
         /* Py_Finalize leaves _Py_Finalizing set in order to help daemon
          * threads behave a little more gracefully at interpreter shutdown.
          * We clobber it here so the new interpreter can start with a clean
    vstinner commented 6 years ago

    Don't forget to merge https://github.com/python/cpython/pull/7487 once this bug is fixed. I would like to see https://bugs.python.org/issue33792 in Python 3.7 *if possible* (the feature now seems "required" for the new asyncio.loop() function).

    1st1 commented 6 years ago

    Thanks Victor for debugging this. I made a PR (which is now trivial) and double checked all other calls to GCTrack in context.c & hamt.c.

    1st1 commented 6 years ago

    New changeset 378c53cc3187dba57c7560ccc2557f516c8a7bc8 by Yury Selivanov in branch 'master': bpo-33803: Fix a crash in hamt.c (bpo-7504) https://github.com/python/cpython/commit/378c53cc3187dba57c7560ccc2557f516c8a7bc8

    1st1 commented 6 years ago

    New changeset a971a6fdb111bb62911ccf45aa9fe734e2e7a590 by Yury Selivanov (Miss Islington (bot)) in branch '3.7': bpo-33803: Fix a crash in hamt.c (GH-7504) (GH-7505) https://github.com/python/cpython/commit/a971a6fdb111bb62911ccf45aa9fe734e2e7a590

    1st1 commented 6 years ago
    • if (core_config->dev_mode) {
    • _PyRuntime.gc.generations[0].threshold = 5;
    • }

    I'd love to have a flag to turn this on, or maybe we should enable it for -X dev.

    vstinner commented 6 years ago

    I'd love to have a flag to turn this on, or maybe we should enable it for -X dev.

    Well, there is already a public API to do it manually: gc.set_threshold(5).

    I'm not sure about a threshold of 5 for -X dev: that's very very low and so kill performances. -X dev can be slower, not but 10x slower for example. I didn't measure performance.

    Such bug is rare, so I'm not sure about putting gc.set_threshold(5) in -X dev.

    vstinner commented 6 years ago

    FYI the bug was also seen 8 hours ago on a different asyncio PR: https://github.com/python/cpython/pull/423#issuecomment-395681351

    vstinner commented 6 years ago

    FYI the bug was also seen 8 hours ago on a different asyncio PR:

    Oops, my message is misleading: the crash is not a regression. I just wanted to notice that a different PR also triggered the crash before the crash has been fixed. I'm just surprised that the bug decided to wake up 4 months after contextvars has been merged. Why yesterday and not previously?

    vstinner commented 4 years ago

    Notes for myself.

    In bpo-38392, I modified PyObject_GC_Track() in debug mode to detect this bug.

    For example, this bug can be reproduced with this change: ---

    diff --git a/Python/hamt.c b/Python/hamt.c
    index 28b4e1ef6c..d7dd555d15 100644
    --- a/Python/hamt.c
    +++ b/Python/hamt.c
    @@ -2478,8 +2478,6 @@ hamt_alloc(void)
         if (o == NULL) {
             return NULL;
         }
    -    o->h_count = 0;
    -    o->h_root = NULL;
         o->h_weakreflist = NULL;
         PyObject_GC_Track(o);
         return o;

    Python now detects the bug in debug mode: ---

    $ ./python -m test -v test_context
    (...)
    test_context_copy_1 (test.test_context.ContextTest) ...
    Modules/gcmodule.c:1931: visit_validate: Assertion failed: PyObject_GC_Track() object is not valid
    Enable tracemalloc to get the memory block allocation traceback

    object address : 0x7f2b17408d70 object refcount : 1 object type : 0x76bc20 object type name: hamt object repr : \<hamt object at 0x7f2b17408d70>

    Fatal Python error: _PyObject_AssertFailed Python runtime state: initialized

    Current thread 0x00007f2b25590740 (most recent call first): File "/home/vstinner/python/master/Lib/test/test_context.py", line 322 in test_context_copy_1 File "/home/vstinner/python/master/Lib/unittest/case.py", line 616 in _callTestMethod File "/home/vstinner/python/master/Lib/unittest/case.py", line 659 in run (...) File "/home/vstinner/python/master/Lib/runpy.py", line 192 in _run_module_as_main Aborted (core dumped) ---

    1st1 commented 4 years ago

    In bpo-38392, I modified PyObject_GC_Track() in debug mode to detect this bug.

    Awesome!!!