python / cpython

The Python programming language
https://www.python.org
Other
63.47k stars 30.39k forks source link

gc_decref: Assertion "gc_get_refs(g) > 0" failed #99578

Closed LeamHall closed 1 year ago

LeamHall commented 1 year ago

Bug report

Error whlie testing the main branch. May be similar to:

The problem

When running ./python -m test:

0:18:59 load avg: 0.72 [187/433] test_imp
Modules/gcmodule.c:113: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Enable tracemalloc to get the memory block allocation traceback

Your environment

Message log

0:18:59 load avg: 0.72 [187/433] test_imp
Modules/gcmodule.c:113: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Enable tracemalloc to get the memory block allocation traceback

object address  : 0x7f3dcdf877d0
object refcount : 96
object type     : 0x5573e22fc880
object type name: module
object repr     : <module 'builtins' (built-in)>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized

Current thread 0x00007f3dce3c32c0 (most recent call first):
  Garbage-collecting
  File "/usr/local/src/cpython/Lib/test/support/__init__.py", line 738 in gc_collect
  File "/usr/local/src/cpython/Lib/test/libregrtest/save_env.py", line 314 in __exit__
  File "/usr/local/src/cpython/Lib/test/libregrtest/runtest.py", line 312 in _runtest_inner2
  File "/usr/local/src/cpython/Lib/test/libregrtest/runtest.py", line 360 in _runtest_inner
  File "/usr/local/src/cpython/Lib/test/libregrtest/runtest.py", line 235 in _runtest
  File "/usr/local/src/cpython/Lib/test/libregrtest/runtest.py", line 265 in runtest
  File "/usr/local/src/cpython/Lib/test/libregrtest/main.py", line 455 in run_tests_sequential
  File "/usr/local/src/cpython/Lib/test/libregrtest/main.py", line 572 in run_tests
  File "/usr/local/src/cpython/Lib/test/libregrtest/main.py", line 750 in _main
  File "/usr/local/src/cpython/Lib/test/libregrtest/main.py", line 709 in main
  File "/usr/local/src/cpython/Lib/test/libregrtest/main.py", line 773 in main
  File "/usr/local/src/cpython/Lib/test/__main__.py", line 2 in <module>
  File "/usr/local/src/cpython/Lib/runpy.py", line 88 in _run_code
  File "/usr/local/src/cpython/Lib/runpy.py", line 198 in _run_module_as_main

Extension modules: _testcapi, _xxsubinterpreters, _testinternalcapi, _testbuffer, _testmultiphase, _ctypes_test, xxsubtype (total: 7)

Linked PRs

vstinner commented 1 year ago

How did you configure Python? (the ./configure command)

vstinner commented 1 year ago

Please run the test with:

./python -X dev -m test test_imp -v
LeamHall commented 1 year ago
./python -X dev -m test test_imp -v
== CPython 3.12.0a2+ (heads/main:4f5e1cb00a, Nov 18 2022, 07:36:12) [GCC 10.2.1 20201203]
== Linux-5.18.19_1-x86_64-with-glibc2.32 little-endian
== cwd: /usr/local/src/cpython/build/test_python_25939æ
== CPU count: 2
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 0.07 Run tests sequentially
0:00:00 load avg: 0.07 [1/1] test_imp
test_bug7732 (test.test_imp.ImportTests.test_bug7732) ... ok
test_find_and_load_checked_pyc (test.test_imp.ImportTests.test_find_and_load_checked_pyc) ... ok
test_find_module_encoding (test.test_imp.ImportTests.test_find_module_encoding) ... ok
test_issue1267 (test.test_imp.ImportTests.test_issue1267) ... ok
test_issue15828_load_extensions (test.test_imp.ImportTests.test_issue15828_load_extensions) ... ok
test_issue16421_multiple_modules_in_one_dll (test.test_imp.ImportTests.test_issue16421_multiple_modules_in_one_dll) ... ok
test_issue24748_load_module_skips_sys_modules_check (test.test_imp.ImportTests.test_issue24748_load_module_skips_sys_modules_check) ... ok
test_issue31315 (test.test_imp.ImportTests.test_issue31315) ... ok
test_issue3594 (test.test_imp.ImportTests.test_issue3594) ... ok
test_issue5604 (test.test_imp.ImportTests.test_issue5604) ... ok
test_issue9319 (test.test_imp.ImportTests.test_issue9319) ... ok
test_issue98354 (test.test_imp.ImportTests.test_issue98354) ... ok
test_issue_35321 (test.test_imp.ImportTests.test_issue_35321) ... ok
test_load_dynamic_ImportError_path (test.test_imp.ImportTests.test_load_dynamic_ImportError_path) ... ok
test_load_from_source (test.test_imp.ImportTests.test_load_from_source) ... ok
test_load_module_extension_file_is_None (test.test_imp.ImportTests.test_load_module_extension_file_is_None) ... ok
test_load_source (test.test_imp.ImportTests.test_load_source) ... ok
test_multiple_calls_to_get_data (test.test_imp.ImportTests.test_multiple_calls_to_get_data) ... ok
test_pyc_invalidation_mode_from_cmdline (test.test_imp.ImportTests.test_pyc_invalidation_mode_from_cmdline) ... ok
test_source_hash (test.test_imp.ImportTests.test_source_hash) ... ok
testLock (test.test_imp.LockTests.testLock) ... ok
test_unencodeable (test.test_imp.NullImporterTests.test_unencodeable) ... ok
test_cache_from_source (test.test_imp.PEP3147Tests.test_cache_from_source) ... ok
test_source_from_cache (test.test_imp.PEP3147Tests.test_source_from_cache) ... ok
test_builtin (test.test_imp.ReloadTests.test_builtin) ... ok
test_extension (test.test_imp.ReloadTests.test_extension) ... ok
test_source (test.test_imp.ReloadTests.test_source) ... ok
test_with_deleted_parent (test.test_imp.ReloadTests.test_with_deleted_parent) ... ok

----------------------------------------------------------------------
Ran 28 tests in 0.220s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 313 ms
Tests result: SUCCESS
LeamHall commented 1 year ago

How did you configure Python? (the ./configure command)

./configure --with-pydebug

chgnrdv commented 1 year ago

@vstinner may be related: https://github.com/python/cpython/pull/98412#issuecomment-1301020123 (I wonder why I haven't created issue back then)

vstinner commented 1 year ago

Can you please try to bisect the issue?

./python -m test --list-tests > tests.txt
# edit tests.txt to remove a bunch of tests

Then run the sub-set of tests:

./python -m test --fromfile=tests.txt

Until you reproduce the bug, continue modifying tests.txt. You can keep copies of this file, like tests.txt, tests2.txt, etc. Each time, you remove a bunch of tests.

You might also try to run tests in parallel, it spawns one fresh process per test file:

./python -m test -j0
vstinner commented 1 year ago

Oh, I can reproduce the crash just with:

$ ./python -m test --match=test.test_imp.ImportTests.test_issue98354 --match=test.test_capi.test_misc.SubinterpreterTest.test_module_state_shared_in_global test_capi test_imp
0:00:00 load avg: 0.49 Run tests sequentially
0:00:00 load avg: 0.49 [1/2] test_capi
0:00:00 load avg: 0.49 [2/2] test_imp
Modules/gcmodule.c:113: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Enable tracemalloc to get the memory block allocation traceback

object address  : 0x7ff56d2f77d0
object refcount : 54
object type     : 0x93a0c0
object type name: module
object repr     : <module 'builtins' (built-in)>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized

Current thread 0x00007ff56d36f740 (most recent call first):
  Garbage-collecting
  File "/home/vstinner/python/main/Lib/test/support/__init__.py", line 738 in gc_collect
  File "/home/vstinner/python/main/Lib/test/libregrtest/save_env.py", line 314 in __exit__
  File "/home/vstinner/python/main/Lib/test/libregrtest/runtest.py", line 312 in _runtest_inner2
(...)
Extension modules: _testcapi, _testmultiphase, _testinternalcapi (total: 3)
Abandon (core dumped)
LeamHall commented 1 year ago

yeah, but I'm making notes on what you suggested, for next time. ;)

LeamHall commented 1 year ago

@vstinner may be related: #98412 (comment) (I wonder why I haven't created issue back then)

You didn't create the issue so a newbie would have a chance. How's that for a good reason? ;)

vstinner commented 1 year ago

Regression introduced in 2019 by commit 2582d46fbcf7bdf86b9cf4016850b8d155267ac6:

commit 2582d46fbcf7bdf86b9cf4016850b8d155267ac6
Author: Victor Stinner <vstinner@python.org>
Date:   Fri Nov 22 19:24:49 2019 +0100

    bpo-38858: new_interpreter() reuses pycore_init_builtins() (GH-17351)

    new_interpreter() now calls _PyBuiltin_Init() to create the builtins
    module and calls _PyImport_FixupBuiltin(), rather than using
    _PyImport_FindBuiltin(tstate, "builtins").

    pycore_init_builtins() is now responsible to initialize
    intepr->builtins_copy: inline _PyImport_Init() and remove this
    function.

I used test_bug.py, test_bug2.py and a patch to bisect the issue with git bisect. Command for the test:

git checkout . && /bin/cp -f ../test_bug*.py Lib/test/ && git apply ../old-patch && make clean && make && ./python -m test test_bug test_bug2 -v

test_bug.py:

import unittest
from test import support

class SubinterpreterTest(unittest.TestCase):
    def test_module_state_shared_in_global(self):
        ret = support.run_in_subinterp("pass")
        self.assertEqual(ret, 0)

test_bug2.py:

import unittest
import _imp
from test import support

class SubinterpreterTest(unittest.TestCase):
    def test_issue98354(self):
        # _imp.create_builtin should raise TypeError
        # if 'name' attribute of 'spec' argument is not a 'str' instance

        create_builtin = support.get_attribute(_imp, "create_builtin")

        class FakeSpec:
            def __init__(self, name):
                self.name = self
        spec = FakeSpec("time")
        with self.assertRaises(TypeError):
            create_builtin(spec)

        class FakeSpec2:
            name = [1, 2, 3, 4]
        spec = FakeSpec2()
        with self.assertRaises(TypeError):
            create_builtin(spec)

        import builtins
        class UnicodeSubclass(str):
            pass
        class GoodSpec:
            name = UnicodeSubclass("builtins")
        spec = GoodSpec()
        bltin = create_builtin(spec)
        self.assertEqual(bltin, builtins)

        class UnicodeSubclassFakeSpec(str):
            def __init__(self, name):
                self.name = self
        spec = UnicodeSubclassFakeSpec("builtins")
        bltin = create_builtin(spec)
        self.assertEqual(bltin, builtins)

patch:

diff --git a/Python/import.c b/Python/import.c
index 505688400e..8f6599cad4 100644
--- a/Python/import.c
+++ b/Python/import.c
@@ -1263,6 +1263,14 @@ _imp_create_builtin(PyObject *module, PyObject *spec)
         return NULL;
     }

+    if (!PyUnicode_Check(name)) {
+        PyErr_Format(PyExc_TypeError,
+                     "name must be string, not %.200s",
+                     Py_TYPE(name)->tp_name);
+        Py_DECREF(name);
+        return NULL;
+    }
+
     mod = _PyImport_FindExtensionObject(name, name);
     if (mod || _PyErr_Occurred(tstate)) {
         Py_DECREF(name);

I wrote different flavors of the patch depending on the Python version.

vstinner commented 1 year ago

I love git bisect. It prevented me to have to manually go through 9422 commits: November 20, 2019 to November 19, 2022, over 3 years of commits!

LeamHall commented 1 year ago

On 11/19/22 10:47, Victor Stinner wrote:

I love |git bisect|. It prevented me to have to manually go through 9422 commits: November 20, 2019 to November 19, 2022, over 3 years of commits!

Dude, I'm going to have to study the other email you sent, that's awesome!

Leam

-- Automation Engineer (reuel.net/resume) Scribe: The Domici War (domiciwar.net) General Ne'er-do-well (github.com/LeamHall)

vstinner commented 1 year ago

Simpler reproducer:

# CRASH: https://github.com/python/cpython/issues/99578

import _imp
import _testcapi
import builtins
import gc

ret = _testcapi.run_in_subinterp("pass")
if ret != 0:
    raise Exception(1)
gc.collect()

class Spec:
    name = "builtins"
spec = Spec()
mod = _imp.create_builtin(spec)
if mod is not builtins:
    raise Exception(2)

print("GC collect")
gc.collect()   # <===== CRASH OCCURS HERE
mod = None

print("exit")
vstinner commented 1 year ago
        /* Module does not support repeated initialization */
        if (def->m_base.m_copy == NULL)
            return NULL;
    PyObject *mod = import_add_module(tstate, name);
    if (mod) {
        PyObject *ref = PyWeakref_NewRef(mod, NULL);
        Py_DECREF(mod);
        if (ref == NULL) {
            return NULL;
        }
        mod = PyWeakref_GetObject(ref);
        Py_DECREF(ref);
    }
    return mod; /* borrowed reference */

Why not using Py_DECREF()?

vstinner commented 1 year ago

If _testcapi.run_in_subinterp("pass") is not called, def->m_base.m_copy is not NULL and create_builtin() takes a different code path.

vstinner commented 1 year ago

I wrote PR #99642 which adds a test to reproduce the crash, and a fix for the crash.


History of this issue:

This bug was well hidden since running test_imp alone was not enough: a previous test must create a sub-intepreter (ex: test_capi).

To get the bug, you need multiple conditions:

In fact, this crash is not a regression. _imp.create_builtin() had a reference bug on PyImport_AddModule() code path since the implementation of PEP 489 in Python 3.5 (2015): commit d5cacbb1d9c3edc02bf0ba01702e7c06da5bc318.

commit d5cacbb1d9c3edc02bf0ba01702e7c06da5bc318
Author: Nick Coghlan <ncoghlan@gmail.com>
Date:   Sat May 23 22:24:10 2015 +1000

    PEP 489: Multi-phase extension module initialization

    Known limitations of the current implementation:

    - documentation changes are incomplete
    - there's a reference leak I haven't tracked down yet

    The leak is most visible by running:

      ./python -m test -R3:3 test_importlib

    However, you can also see it by running:

      ./python -X showrefcount

    Importing the array or _testmultiphase modules, and
    then deleting them from both sys.modules and the local
    namespace shows significant increases in the total
    number of active references each cycle. By contrast,
    with _testcapi (which continues to use single-phase
    initialisation) the global refcounts stabilise after
    a couple of cycles.

But this reference bug cannot be seen before my change in 2019: commit 2582d46fbcf7bdf86b9cf4016850b8d155267ac6.

vstinner commented 1 year ago

I can reproduce the crash on Python 3.11 without the fix using:

./python -m test test_capi test_imp -v 
vstinner commented 1 year ago

The crash was fixed in 3.10, 3.11 and main branches. Fix in main: https://github.com/python/cpython/commit/cb2ef8b2acbb231c207207d3375b2f8b0077a6ee

Thanks for the bug report @LeamHall. Please check that ./python -m test no long crash for you. But then, I suggest to adding -j0 to run tests in parallel.

vstinner commented 1 year ago

At least, on main, I confirm that ./python -m test works on my Linux machine, but it takes 1 hour!

== Tests result: SUCCESS ==

411 tests OK.

22 tests skipped:
    test_check_c_globals test_curses test_devpoll test_kqueue
    test_launcher test_msilib test_ossaudiodev test_peg_generator
    test_smtpnet test_socketserver test_startfile test_tix
    test_tkinter test_ttk test_urllib2net test_urllibnet
    test_winconsoleio test_winreg test_winsound test_wmi
    test_xmlrpc_net test_zipfile64

Total duration: 1 hour 34 sec
Tests result: SUCCESS

Using -j0, it just takes 7 min 12 sec using "14 child processes" (my laptop CPU has 12 threads) ;-)

LeamHall commented 1 year ago

On 11/21/22 07:54, Victor Stinner wrote:

The crash was fixed in 3.10, 3.11 and main branches. Fix in main: cb2ef8b https://github.com/python/cpython/commit/cb2ef8b2acbb231c207207d3375b2f8b0077a6ee

Thanks for the bug report @LeamHall https://github.com/LeamHall. Please check that |./python -m test| no long crash for you. But then, I suggest to adding |-j0| to run tests in parallel.

Running tests (up to 317 of 433), mine take a while. The test_imp did pass though, so I think your work fixed things.

Leam

-- Automation Engineer (reuel.net/resume) Scribe: The Domici War (domiciwar.net) General Ne'er-do-well (github.com/LeamHall)

ericsnowcurrently commented 1 year ago

Thanks for the great detective work, @vstinner!