python / cpython

The Python programming language
https://www.python.org
Other
63.35k stars 30.34k forks source link

Possible refleak in _asynciomodule.c future_add_done_callback() #84242

Closed aeros closed 4 years ago

aeros commented 4 years ago
BPO 40061
Nosy @pitrou, @asvetlov, @1st1, @ZackerySpytz, @miss-islington, @aeros
PRs
  • python/cpython#19748
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/aeros' closed_at = created_at = labels = ['3.8', '3.9', 'performance', 'extension-modules', 'expert-C-API', '3.7'] title = 'Possible refleak in _asynciomodule.c future_add_done_callback()' updated_at = user = 'https://github.com/aeros' ``` bugs.python.org fields: ```python activity = actor = 'aeros' assignee = 'aeros' closed = True closed_date = closer = 'aeros' components = ['Extension Modules', 'C API'] creation = creator = 'aeros' dependencies = [] files = [] hgrepos = [] issue_num = 40061 keywords = ['patch'] message_count = 6.0 messages = ['364985', '365029', '367507', '367511', '370365', '370366'] nosy_count = 6.0 nosy_names = ['pitrou', 'asvetlov', 'yselivanov', 'ZackerySpytz', 'miss-islington', 'aeros'] pr_nums = ['19748'] priority = 'high' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'resource usage' url = 'https://bugs.python.org/issue40061' versions = ['Python 3.7', 'Python 3.8', 'Python 3.9'] ```

    aeros commented 4 years ago

    When using the test-with-buildbots label in python/cpython#63348 (which involved no C changes), a failure occurred in test_asyncio for several of the refleak buildbots. Here's the output of a few:

    AMD64 Fedora Stable Refleaks PR:

    test_asyncio leaked [3, 3, 27] references, sum=33 test_asyncio leaked [3, 3, 28] memory blocks, sum=34 2 tests failed again: test__xxsubinterpreters test_asyncio == Tests result: FAILURE then FAILURE ==

    AMD64 RHEL8 Refleaks PR:

    test_asyncio leaked [3, 3, 3] references, sum=9 test_asyncio leaked [3, 3, 3] memory blocks, sum=9 2 tests failed again: test__xxsubinterpreters test_asyncio == Tests result: FAILURE then FAILURE ==

    RHEL7 Refleaks PR:

    test_asyncio leaked [3, 3, 3] references, sum=9 test_asyncio leaked [3, 3, 3] memory blocks, sum=9 2 tests failed again: test__xxsubinterpreters test_asyncio == Tests result: FAILURE then FAILURE ==

    I'm unable to replicate it locally, but I think I may have located a subtle, uncommon refleak in future_add_done_callback(), within _asynciomodule.c. Specifically:

                PyObject *tup = PyTuple_New(2);
                if (tup == NULL) {
                    return NULL;
                }
                Py_INCREF(arg);
                PyTuple_SET_ITEM(tup, 0, arg);
                Py_INCREF(ctx);
                PyTuple_SET_ITEM(tup, 1, (PyObject *)ctx);
    
                if (fut->fut_callbacks != NULL) {
                    int err = PyList_Append(fut->fut_callbacks, tup);
                    if (err) {
                        Py_DECREF(tup);
                        return NULL;
                    }
                    Py_DECREF(tup);
                }
                else {
                    fut->fut_callbacks = PyList_New(1);
                    if (fut->fut_callbacks == NULL) {
                        // Missing ``Py_DECREF(tup);`` ?
                        return NULL;
                    }

    (The above code is located at: https://github.com/python/cpython/blob/7668a8bc93c2bd573716d1bea0f52ea520502b28/Modules/_asynciomodule.c#L664-L685)

    In the above conditional for "if (fut->fut_callbacks == NULL)", it appears that tup is pointing to a non-NULL new reference at this point, and thus should be decref'd prior to returning NULL. Otherwise, it seems like it could be leaked.

    But, I would appreciate it if someone could double check this (the C-API isn't an area I'm experienced); particularly since this code has been in place for a decent while (since 3.7). I suspect it's gone undetected and only failed intermittently because this specific return NULL path is rather uncommon.

    I'd be glad to open a PR to address the issue, assuming I'm not missing something with the above refleak. Otherwise, feel free to correct me.

    aeros commented 4 years ago

    When using the test-with-buildbots label in python/cpython#63348 (which involved no C changes), a failure occurred in test_asyncio for several of the refleak buildbots.

    To clarify, the refleak failures occurred when i applied the label to the following commit: https://github.com/python/cpython/pull/19149/commits/21a12e9340644c81e758ddf20fc9034f265d1930

    pitrou commented 4 years ago

    You're right that a Py_DECREF is missing there. However, it seems unlikely that this is triggering the test failure, because PyList_New(1) will practically never fail in normal conditions (and even making it deliberately fail would be quite difficult).

    aeros commented 4 years ago

    Antoine Pitrou wrote:

    You're right that a Py_DECREF is missing there. However, it seems unlikely that this is triggering the test failure, because PyList_New(1) will practically never fail in normal conditions (and even making it deliberately fail would be quite difficult).

    Thanks for clarifying; spotting refleaks is something that I've only recently started to get the hang of, so I wanted to double check first. :-)

    I should have updated this issue, but I resolved the main test_asyncio failure that I saw from python/cpython#63348 in python/cpython#63481. Either way though, I think this could still be fixed (even if it would rarely be an issue in most situations).

    miss-islington commented 4 years ago

    New changeset 7b78e7f9fd77bb3280ee39fb74b86772a7d46a70 by Zackery Spytz in branch 'master': bpo-40061: Fix a possible refleak in _asynciomodule.c (GH-19748) https://github.com/python/cpython/commit/7b78e7f9fd77bb3280ee39fb74b86772a7d46a70

    aeros commented 4 years ago

    Thanks for the PR, Zackery.