Open e9db6c33-c718-471d-b5a3-e43677487d8e opened 3 years ago
We have a Django 2.2.19 project on Python 3.9.2 on Debian (slim-buster) in Docker.
A bizarre problem started happening to us this week.
First I'll show the symptom, we started getting the following error:
... File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in \<module> from django.db.migrations.optimizer import MigrationOptimizer File "\<frozen importlib._bootstrap>", line 1004, in _find_andload File "\<frozen importlib._bootstrap>", line 158, in \_enter__ File "\<frozen importlib._bootstrap>", line 110, in acquire KeyError: 140426340123264
If I look at the source for _bootstrap.py, that error should be impossible: https://github.com/python/cpython/blob/v3.9.2/Lib/importlib/_bootstrap.py#L110
At the top of the acquire method it does:
tid = _thread.get_ident()
_blocking_on[tid] = self
and then on line 110 where we get the KeyError:
del _blocking_on[tid]
both tid
and _blocking_on
are local vars and none of the other lines in the method touch them
So how do we get a KeyError?
I can only think that something mutates the underlying value of tid
, but it's supposed to be an int so that's very weird.
I started with the symptom because our context for this is complicated to explain. I did find a fix that prevents the error but I do not understand the link between cause and effect.
Our context:
We currently split tests across 3 nodes, and it was always node 2 which would fail. I found that commenting out a test case in any of the files being passed to node 2 would prevent the error from occurring. Note that in this case we are still passing *exactly the same filenames* as cli args to the test runner.
Splitting the tests across 4 nodes instead of 3 also seems to prevent the error. So it seems like, in some way I don't understand, we just have too many test cases. Perhaps nose is doing something wrong or inefficient when given lots of filenames.
But I'm reporting here because the error we get from importlib._bootstrap looks like it should be impossible.
FWIW...
django-nose==1.4.6 nose==1.3.7
The only use of threading I can see in Nose is here: https://github.com/nose-devs/nose/blob/release_1.3.7/nose/plugins/logcapture.py
But we are passing --nologcapture flag and nosetests echoes that flag back to us at the top of the test run, so I believe the feature is disabled
https://docs.python.org/3/library/_thread.html#_thread.get_ident
Thread identifiers may be recycled when a thread exits and another thread is created.
I still don't see how we get that KeyError but it feels like it might be something to do with this
It turns out splitting the tests across 2 nodes rather than 4 also prevents the error.
So it's not a case of "too many" tests.
So then I think maybe it needs to be a specific number to trigger the error?
And yes, if I duplicate one of our test cases (I mean test method in a TestCase class) so that there is one more than before... that also prevents the error!
It looks like method _ModuleLock.acquire() was re-entered in the same thread.
Thank you for your explanation.
I am baffled why this has never happened to us before, and why a specific number of test cases should trigger it (comment one out or add an extra one, it goes away).
I've pasted our full stack trace below. It always fails in this place.
My original thought when getting this error was to check for code that executes at import time. The module being imported when it breaks seems totally innocuous though: https://github.com/django/django/blob/2.2.19/django/db/migrations/optimizer.py
There are two instances of:
File "\<frozen importlib._bootstrap>", line 1007, in _find_and_load File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked return exec_and_call_hooks(module_name, wrapped, args, kwargs)
...in the traceback, so https://github.com/DataDog/dd-trace-py starts to look suspicious.
Traceback (most recent call last):
File "/work/manage.py", line 26, in <module>
execute_from_command_line(sys.argv)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv
super().run_from_argv(argv)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 53, in handle
failures = test_runner.run_tests(test_labels)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 308, in run_tests
result = self.run_suite(nose_argv)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 244, in run_suite
nose.core.TestProgram(argv=nose_argv, exit=False,
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 118, in __init__
unittest.TestProgram.__init__(
File "/root/.pyenv/versions/3.9.2/lib/python3.9/unittest/main.py", line 101, in __init__
self.runTests()
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 207, in runTests
result = self.testRunner.run(self.test)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 50, in run
wrapper = self.config.plugins.prepareTest(test)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 99, in __call__
return self.call(*arg, **kw)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 167, in simple
result = meth(*arg, **kw)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/plugin.py", line 82, in prepareTest
self.old_names = self.runner.setup_databases()
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 495, in setup_databases
return super(NoseTestSuiteRunner, self).setup_databases()
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/runner.py", line 552, in setup_databases
return _setup_databases(
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/utils.py", line 170, in setup_databases
connection.creation.create_test_db(
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/backends/base/creation.py", line 67, in create_test_db
call_command(
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 110, in call_command
command = load_command_class(app_name, command_name)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 36, in load_command_class
module = import_module('%s.management.commands.%s' % (app_name, name))
File "/root/.pyenv/versions/3.9.2/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
return exec_and_call_hooks(module_name, wrapped, args, kwargs)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks
return wrapped(*args, **kwargs)
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 790, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 14, in <module>
from django.db.migrations.autodetector import MigrationAutodetector
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
return exec_and_call_hooks(module_name, wrapped, args, kwargs)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks
return wrapped(*args, **kwargs)
File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in <module>
from django.db.migrations.optimizer import MigrationOptimizer
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load
File "<frozen importlib._bootstrap>", line 158, in __enter__
File "<frozen importlib._bootstrap>", line 110, in acquire
KeyError: 140581625389696
Upgrading ddtrace lib to latest version did not help.
Disabling ddtrace patching of django module does make the error go away.
Thanks again for your help, I will move my bug report over to ddtrace.
This is the same as https://github.com/python/cpython/issues/91351
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['3.9', 'type-crash']
title = '"Impossible" KeyError from importlib._bootstrap acquire line 110'
updated_at =
user = 'https://github.com/anentropic'
```
bugs.python.org fields:
```python
activity =
actor = 'iritkatriel'
assignee = 'none'
closed = True
closed_date =
closer = 'iritkatriel'
components = []
creation =
creator = 'anentropic'
dependencies = []
files = []
hgrepos = []
issue_num = 43546
keywords = []
message_count = 7.0
messages = ['389034', '389035', '389036', '389037', '389038', '389067', '389085']
nosy_count = 6.0
nosy_names = ['brett.cannon', 'ncoghlan', 'pitrou', 'eric.snow', 'serhiy.storchaka', 'anentropic']
pr_nums = []
priority = 'normal'
resolution = 'third party'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue43546'
versions = ['Python 3.9']
```