python / cpython

The Python programming language
https://www.python.org
Other
62.42k stars 29.97k forks source link

"Impossible" KeyError from importlib._bootstrap acquire line 110 #87712

Open e9db6c33-c718-471d-b5a3-e43677487d8e opened 3 years ago

e9db6c33-c718-471d-b5a3-e43677487d8e commented 3 years ago
BPO 43546
Nosy @brettcannon, @ncoghlan, @pitrou, @ericsnowcurrently, @serhiy-storchaka, @anentropic

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['3.9', 'type-crash'] title = '"Impossible" KeyError from importlib._bootstrap acquire line 110' updated_at = user = 'https://github.com/anentropic' ``` bugs.python.org fields: ```python activity = actor = 'iritkatriel' assignee = 'none' closed = True closed_date = closer = 'iritkatriel' components = [] creation = creator = 'anentropic' dependencies = [] files = [] hgrepos = [] issue_num = 43546 keywords = [] message_count = 7.0 messages = ['389034', '389035', '389036', '389037', '389038', '389067', '389085'] nosy_count = 6.0 nosy_names = ['brett.cannon', 'ncoghlan', 'pitrou', 'eric.snow', 'serhiy.storchaka', 'anentropic'] pr_nums = [] priority = 'normal' resolution = 'third party' stage = 'resolved' status = 'closed' superseder = None type = 'crash' url = 'https://bugs.python.org/issue43546' versions = ['Python 3.9'] ```

e9db6c33-c718-471d-b5a3-e43677487d8e commented 3 years ago

We have a Django 2.2.19 project on Python 3.9.2 on Debian (slim-buster) in Docker.

A bizarre problem started happening to us this week.

First I'll show the symptom, we started getting the following error:

... File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in \<module> from django.db.migrations.optimizer import MigrationOptimizer File "\<frozen importlib._bootstrap>", line 1004, in _find_andload File "\<frozen importlib._bootstrap>", line 158, in \_enter__ File "\<frozen importlib._bootstrap>", line 110, in acquire KeyError: 140426340123264

If I look at the source for _bootstrap.py, that error should be impossible: https://github.com/python/cpython/blob/v3.9.2/Lib/importlib/_bootstrap.py#L110

At the top of the acquire method it does:

    tid = _thread.get_ident()
    _blocking_on[tid] = self

and then on line 110 where we get the KeyError:

del _blocking_on[tid]

both tid and _blocking_on are local vars and none of the other lines in the method touch them

So how do we get a KeyError?

I can only think that something mutates the underlying value of tid, but it's supposed to be an int so that's very weird.

I started with the symptom because our context for this is complicated to explain. I did find a fix that prevents the error but I do not understand the link between cause and effect.

Our context:

We currently split tests across 3 nodes, and it was always node 2 which would fail. I found that commenting out a test case in any of the files being passed to node 2 would prevent the error from occurring. Note that in this case we are still passing *exactly the same filenames* as cli args to the test runner.

Splitting the tests across 4 nodes instead of 3 also seems to prevent the error. So it seems like, in some way I don't understand, we just have too many test cases. Perhaps nose is doing something wrong or inefficient when given lots of filenames.

But I'm reporting here because the error we get from importlib._bootstrap looks like it should be impossible.

e9db6c33-c718-471d-b5a3-e43677487d8e commented 3 years ago

FWIW...

django-nose==1.4.6 nose==1.3.7

The only use of threading I can see in Nose is here: https://github.com/nose-devs/nose/blob/release_1.3.7/nose/plugins/logcapture.py

But we are passing --nologcapture flag and nosetests echoes that flag back to us at the top of the test run, so I believe the feature is disabled

e9db6c33-c718-471d-b5a3-e43677487d8e commented 3 years ago

https://docs.python.org/3/library/_thread.html#_thread.get_ident

Thread identifiers may be recycled when a thread exits and another thread is created.

I still don't see how we get that KeyError but it feels like it might be something to do with this

e9db6c33-c718-471d-b5a3-e43677487d8e commented 3 years ago

It turns out splitting the tests across 2 nodes rather than 4 also prevents the error.

So it's not a case of "too many" tests.

So then I think maybe it needs to be a specific number to trigger the error?

And yes, if I duplicate one of our test cases (I mean test method in a TestCase class) so that there is one more than before... that also prevents the error!

serhiy-storchaka commented 3 years ago

It looks like method _ModuleLock.acquire() was re-entered in the same thread.

e9db6c33-c718-471d-b5a3-e43677487d8e commented 3 years ago

Thank you for your explanation.

I am baffled why this has never happened to us before, and why a specific number of test cases should trigger it (comment one out or add an extra one, it goes away).

I've pasted our full stack trace below. It always fails in this place.

My original thought when getting this error was to check for code that executes at import time. The module being imported when it breaks seems totally innocuous though: https://github.com/django/django/blob/2.2.19/django/db/migrations/optimizer.py

There are two instances of:

File "\<frozen importlib._bootstrap>", line 1007, in _find_and_load File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked return exec_and_call_hooks(module_name, wrapped, args, kwargs)

...in the traceback, so https://github.com/DataDog/dd-trace-py starts to look suspicious.

Traceback (most recent call last):
  File "/work/manage.py", line 26, in <module>
    execute_from_command_line(sys.argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv
    super().run_from_argv(argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 53, in handle
    failures = test_runner.run_tests(test_labels)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 308, in run_tests
    result = self.run_suite(nose_argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 244, in run_suite
    nose.core.TestProgram(argv=nose_argv, exit=False,
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 118, in __init__
    unittest.TestProgram.__init__(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 207, in runTests
    result = self.testRunner.run(self.test)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 50, in run
    wrapper = self.config.plugins.prepareTest(test)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/plugin.py", line 82, in prepareTest
    self.old_names = self.runner.setup_databases()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 495, in setup_databases
    return super(NoseTestSuiteRunner, self).setup_databases()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/runner.py", line 552, in setup_databases
    return _setup_databases(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/utils.py", line 170, in setup_databases
    connection.creation.create_test_db(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/backends/base/creation.py", line 67, in create_test_db
    call_command(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 110, in call_command
    command = load_command_class(app_name, command_name)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 36, in load_command_class
    module = import_module('%s.management.commands.%s' % (app_name, name))
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
    return exec_and_call_hooks(module_name, wrapped, args, kwargs)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks
    return wrapped(*args, **kwargs)
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 14, in <module>
    from django.db.migrations.autodetector import MigrationAutodetector
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
    return exec_and_call_hooks(module_name, wrapped, args, kwargs)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks
    return wrapped(*args, **kwargs)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in <module>
    from django.db.migrations.optimizer import MigrationOptimizer
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load
  File "<frozen importlib._bootstrap>", line 158, in __enter__
  File "<frozen importlib._bootstrap>", line 110, in acquire
KeyError: 140581625389696
e9db6c33-c718-471d-b5a3-e43677487d8e commented 3 years ago

Upgrading ddtrace lib to latest version did not help.

Disabling ddtrace patching of django module does make the error go away.

Thanks again for your help, I will move my bug report over to ddtrace.

exarkun commented 2 years ago

This is the same as https://github.com/python/cpython/issues/91351