We've experienced a similar issue with CI timeouts in CircleCI when using Python 3.8. This occurs when using pytest as our test runner with coverage. The hang occurs after the tests pass and the coverage report is printed to the console; it's almost as if the Python process fails to exit.
(venv) root@af00fef01e49:/home/circleci/repo# python -V
Python 3.8.12
(venv) root@af00fef01e49:/home/circleci/repo# coverage debug sys
-- sys -------------------------------------------------------
coverage_version: 6.3
coverage_module: /home/circleci/repo/venv/lib/python3.8/site-packages/coverage/__init__.py
tracer: -none-
CTracer: available
plugins.file_tracers: -none-
plugins.configurers: -none-
plugins.context_switchers: -none-
configs_attempted: .coveragerc
configs_read: /home/circleci/repo/.coveragerc
config_file: /home/circleci/repo/.coveragerc
config_contents: b'[report]\nomit =\n */synapse/tests/test_*\n'
data_file: -none-
python: 3.8.12 (default, Aug 31 2021, 16:30:39) [GCC 9.3.0]
platform: Linux-5.11.0-1022-aws-x86_64-with-glibc2.29
implementation: CPython
executable: /home/circleci/repo/venv/bin/python3
def_encoding: utf-8
fs_encoding: utf-8
pid: 322
cwd: /home/circleci/repo
path: /home/circleci/repo/venv/bin
/home/circleci/.pyenv/versions/3.8.12/lib/python38.zip
/home/circleci/.pyenv/versions/3.8.12/lib/python3.8
/home/circleci/.pyenv/versions/3.8.12/lib/python3.8/lib-dynload
/home/circleci/repo/venv/lib/python3.8/site-packages
/home/circleci/repo
environment: HOME = /root
command_line: /home/circleci/repo/venv/bin/coverage debug sys
sqlite3_version: 2.6.0
sqlite3_sqlite_version: 3.31.1
sqlite3_temp_store: 0
sqlite3_compile_options: COMPILER=gcc-9.3.0; ENABLE_COLUMN_METADATA; ENABLE_DBSTAT_VTAB
ENABLE_FTS3; ENABLE_FTS3_PARENTHESIS; ENABLE_FTS3_TOKENIZER
ENABLE_FTS4; ENABLE_FTS5; ENABLE_JSON1
ENABLE_LOAD_EXTENSION; ENABLE_PREUPDATE_HOOK; ENABLE_RTREE
ENABLE_SESSION; ENABLE_STMTVTAB; ENABLE_UNLOCK_NOTIFY
ENABLE_UPDATE_DELETE_LIMIT; HAVE_ISNAN; LIKE_DOESNT_MATCH_BLOBS
MAX_SCHEMA_RETRY=25; MAX_VARIABLE_NUMBER=250000; OMIT_LOOKASIDE
SECURE_DELETE; SOUNDEX; TEMP_STORE=1
THREADSAFE=1; USE_URI
Pip freeze output
(venv) root@af00fef01e49:/home/circleci/repo# pip freeze
aiohttp==3.8.1
aiohttp-socks==0.6.1
aioimaplib==0.9.0
aiosignal==1.2.0
aiosmtplib==1.1.6
antlr4-python3-runtime==4.8
appdirs==1.4.4
arrow==1.2.1
async-timeout==4.0.2
attrs==21.4.0
base58==2.1.1
bech32==1.2.0
cattrs==1.10.0
cbor2==5.4.2
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.10
colorama==0.4.4
coverage==6.3
cpe==1.2.1
cryptography==36.0.1
fastjsonschema==2.14.5
fqdn==1.5.1
frozenlist==1.3.0
idna==3.3
importlib-resources==5.4.0
iniconfig==1.1.1
isoduration==20.11.0
jsonpointer==2.2
jsonschema==4.4.0
lark-parser==0.11.2
lmdb==1.3.0
more-itertools==8.12.0
msgpack==1.0.3
multidict==6.0.2
oauthlib==3.1.1
packaging==20.9
pluggy==1.0.0
prompt-toolkit==3.0.24
py==1.11.0
pycodestyle==2.8.0
pycparser==2.21
pycryptodome==3.12.0
Pygments==2.7.4
pyOpenSSL==21.0.0
pyparsing==3.0.7
pyrsistent==0.18.1
pytest==6.2.5
pytest-cov==3.0.0
python-bitcoinlib==0.11.0
python-dateutil==2.8.2
python-socks==1.2.4
PyYAML==6.0
regex==2022.1.18
requests==2.27.1
requests-cache==0.9.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
scalecodec==1.0.2
simplejson==3.17.6
six==1.16.0
stix2-patterns==1.3.2
stix2-validator==3.0.2
-e git+ssh://git@github.com/vertexproject/synapse.git@c8568da02165fe0c8131b8ffcc1e48e20dc2ad07#egg=synapse
toml==0.10.2
tomli==2.0.0
tornado==6.1
typing_extensions==4.0.1
uri-template==1.1.0
url-normalize==1.4.3
urllib3==1.26.8
vcrpy==4.1.1
wcwidth==0.2.5
webcolors==1.11.1
wrapt==1.13.3
xxhash==2.0.2
yarl==1.7.2
zipp==3.7.0
This happened to our master commit here https://github.com/vertexproject/synapse/commit/1b2d327a32e33b443530f2d8699ad74d1cf8f444
This was fixed in https://github.com/vertexproject/synapse/pull/2534/files#diff-c9c796259f3852b51b531b79cbf07820145088d4f108fb006ab44b15f0b15934R8
We did find the following line in a CI log for one of the failing tests which may or may not be related - it occurred prior to the pytest warnings summary and the console coverage output.
/home/circleci/repo/venv/lib/python3.8/site-packages/coverage/data.py:129:
CoverageWarning: Data file '/home/circleci/repo/.coverage.803d4f93f49b.429.890315' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
Interesting that you saw this on 3.8. We experienced some intermittent hanging in other Python versions but repeated hanging (every time) in 3.7, so I wasn't sure if the intermittent issues were related.
I also forgot to include my error message. It happened near the end of the test run, but not quite at the end as you experienced (note: sqlfluff is our project, but the last line might be useful):
[gw1] [ 90%] PASSED test/rules/yaml_test_cases_test.py::test__rule_test_global_config /home/runner/work/sqlfluff/sqlfluff/.tox/py37/lib/python3.7/site-packages/coverage/inorout.py:535: CoverageWarning: Module sqlfluff was previously imported, but not measured (module-not-measured)
self.warn(msg, slug="module-not-measured")
I'm not clear what problems you are reporting. @tunetheweb the original issue description doesn't mention what the actual problem is. @vEpiphyte says they've seen similar problems, and mentions hanging. Then @tunetheweb shows a module-not-measured warning. This sounds like two different issues.
Anything you can do to give me a reproducible scenario would be great.
Yes, I appreciate this is a very poor issue to raise. Sorry about that, but I thought I'd give you a heads up as I saw it right after the release, and it stopped happening right after I pinned to the old version.
Will try to give a more meaningful reproducible use case if I can narrow it down.
The issue is jobs hanging (I cancelled them after an hour, when they normally take 5 mins; one job I left running took 3 hours and counting). That was for Python 3.7. I also saw a lot of slowdown in jobs (5-minute jobs were more often than not taking 20 mins when they did complete). That was for Python 3.8+.
Initially I thought GitHub Actions were on the blink, but as soon as I pinned the old version of coverage it all worked. Unpinning it again breaks it again. Nothing else changes between the runs.
But as to narrowing it down why, at the moment I don't have more info to help explain that. But will keep digging...
I think the error message is a red herring. It was the last thing printed before it froze, but I do see it at the end of runs elsewhere (though interestingly I only see it at the end of good runs, whereas here it appeared when our CI job was only at 90%, and then it hung).
Sorry again for such a poor report. Can close it if you want until I can get you more info, as I appreciate it's difficult to do anything with what I've written here, but as I say this was more a heads up, and a vague hope someone would have an idea what it might be 🙂
@tunetheweb @vEpiphyte @sir-sigurd Can you try installing this branch of coverage.py to see if it fixes the problem?
pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1312
Doesn't look any better with that to be honest.
BTW happy to move this discussion to #1312 as that issue would appear to be a duplicate of this one, but with much better detail!
Yeah definitely seeing same issue with that unfortunately :-(
But it is very reproducible so, while I haven't been able to help with a minimal reproducible test case, I'm very happy to test any other test branches you have. Just takes about 10mins to run.
@tunetheweb @vEpiphyte @sir-sigurd Can you try installing this branch of coverage.py to see if it fixes the problem?
pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1312
Nope, it doesn't help.
I did reproduce the error locally with a set of our unit tests. Recording this big blob for posterity - I haven't had the time to whittle it down yet but will do shortly.
$ python -m pytest -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml --cov synapse --no-cov-on-fail synapse/tests/test_cmds_trigger.py synapse/tests/test_lib_cell.py synapse/tests/test_lib_crypto_ecc.py synapse/tests/test_lib_hashset.py synapse/tests/test_lib_layer.py synapse/tests/test_lib_node.py synapse/tests/test_lib_scope.py synapse/tests/test_lib_stormhttp.py synapse/tests/test_lib_stormlib_model.py synapse/tests/test_lib_stormlib_yaml.py synapse/tests/test_lib_trigger.py synapse/tests/test_model_auth.py synapse/tests/test_model_geospace.py synapse/tests/test_model_media.py synapse/tests/test_servers_axon.py synapse/tests/test_tools_backup.py synapse/tests/test_tools_guid.py synapse/tests/test_tools_storm.py
This produces the same hang locally using Python 3.8.10 on Linux (Ubuntu 21.04) and coverage 6.3.0.
I then installed Ned's patch with the following:
pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1312
And got the same hang. Sending SIGINT (Ctrl+C) to the process yielded the following traceback:
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
Process SpawnProcess-22:
Process SpawnProcess-18:
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
return self.con.execute(sql, parameters)
Traceback (most recent call last):
sqlite3.OperationalError: no such table: coverage_schema
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1101, in execute
return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: coverage_schema
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
util._exit_function()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 270, in _read_db
schema_version, = db.execute_one("select version from coverage_schema")
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 334, in _exit_function
_run_finalizers(0)
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
finalizer()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1129, in execute_one
rows = list(self.execute(sql, parameters))
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 224, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 99, in cleanup
_cleanup(_active_cov)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1118, in execute
raise DataError(f"Couldn't use data file {self.filename!r}: {msg}") from exc
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 84, in _cleanup
cov.stop()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
data = self.get_data()
coverage.exceptions.DataError: Couldn't use data file '/home/user/PycharmProjects/synapse/.coverage.vertex005c.35596.883395': no such table: coverage_schema
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
if self._collector and self._collector.flush_data():
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
self.covdata.add_lines(self.mapped_file_dict(self.data))
During handling of the above exception, another exception occurred:
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
return method(self, *args, **kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 466, in add_lines
file_id = self._file_id(filename, add=True)
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 388, in _file_id
cur = con.execute("insert or replace into file (path) values (?)", (filename,))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
return self.con.execute(sql, parameters)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 608, in _on_sigterm
self._atexit("sigterm")
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 604, in _atexit
self.save()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
data = self.get_data()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
if self._collector and self._collector.flush_data():
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
self.covdata.add_lines(self.mapped_file_dict(self.data))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 234, in _wrapped
with self._lock:
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
util._exit_function()
KeyboardInterrupt
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 334, in _exit_function
_run_finalizers(0)
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
finalizer()
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 224, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 99, in cleanup
_cleanup(_active_cov)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 84, in _cleanup
cov.stop()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
data = self.get_data()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
if self._collector and self._collector.flush_data():
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
self.covdata.add_lines(self.mapped_file_dict(self.data))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
return method(self, *args, **kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 459, in add_lines
self._choose_lines_or_arcs(lines=True)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 518, in _choose_lines_or_arcs
with self._connect() as con:
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 313, in _connect
self._open_db()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 264, in _open_db
self._read_db()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 273, in _read_db
self._init_db(db)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 299, in _init_db
db.executescript(SCHEMA)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1156, in executescript
self.con.executescript(script)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 608, in _on_sigterm
self._atexit("sigterm")
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 604, in _atexit
self.save()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
data = self.get_data()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
if self._collector and self._collector.flush_data():
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
self.covdata.add_lines(self.mapped_file_dict(self.data))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 234, in _wrapped
with self._lock:
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1101, in execute
return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
return method(self, *args, **kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 463, in add_lines
self._set_context_id()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 421, in _set_context_id
context_id = self._context_id(context)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 397, in _context_id
row = con.execute_one("select id from context where context = ?", (context,))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1129, in execute_one
rows = list(self.execute(sql, parameters))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1118, in execute
raise DataError(f"Couldn't use data file {self.filename!r}: {msg}") from exc
coverage.exceptions.DataError: Couldn't use data file '/home/user/PycharmProjects/synapse/.coverage.vertex005c.35541.378922': no such table: context
@vEpiphyte thanks for the steps. I can see the hang on my Mac. If I remove pytest-cov from the command, then the hang goes away. That is, I remove --cov synapse --no-cov-on-fail from the command line, and change from:
$ python -m pytest ...
to:
$ python -m coverage run -m pytest ...
@tunetheweb It looks like you are also using pytest-cov. If you remove it, and run under coverage, do you still see the problem?
@nedbat Thanks for that tip about pytest-cov vs raw coverage. That worked as expected - there was no lockup when I ran directly with coverage. I've also reduced the tests down to a single test and pushed that test file up to a branch here https://github.com/vertexproject/synapse/blob/5a80408d1e8f75c3ab13df48a8845da7c8ab2051/synapse/tests/test_cov_fail.py
The following command hangs like we've seen already.
python -m pytest -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml --cov synapse --no-cov-on-fail synapse/tests/test_cov_fail.py
The following command works as you demonstrated.
python -m coverage run -m pytest -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml synapse/tests/test_cov_fail.py
This particular test exercises lots of multiprocessing targets in order to cover a bunch of error-handling and bad-path logic.
@tunetheweb It looks like you are also using pytest-cov. If you remove it, and run under coverage, do you still see the problem?
I can confirm that the problem does not arise for me when running 6.3 outside of pytest.
Using my test file with pytest-cov and git bisect on coveragepy repo points to the problem starting to appear with this commit https://github.com/nedbat/coveragepy/commit/dd575eec0fa173bcb6c680d2f1c822a28280d011
Patching the sigterm handler as follows:
def _on_sigterm(self, signum_unused, frame_unused):
"""A handler for signal.SIGTERM."""
pid = os.getpid()
print(f'IN on_sigterm {pid=}')
self._atexit("sigterm")
# Statements after here won't be seen by metacov because we just wrote
# the data, and are about to kill the process.
signal.signal(signal.SIGTERM, self._old_sigterm) # pragma: not covered
print(f'FINI on_sigterm {pid=}')
os.kill(pid, signal.SIGTERM) # pragma: not covered
shows that execution seems to stop at the signal.signal() call in the forkserver process. We never print the FINI ... message :(
user 9428 4.7 0.2 141008 88796 pts/2 S+ 17:09 0:01 /home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=7, pipe_handle=61) --multiprocessing-fork
@vEpiphyte Thanks for the bisect and the debugging! Would you mind wrapping a try/except around that signal.signal line and seeing what's getting raised?
Also, if you could run coverage with a COVERAGE_DEBUG=process,pid environment variable, there will be more debugging output. You might need -s on the pytest line to get it all.
The signal.signal didn't raise an exception (or at least not one caught with a bare try/except block). Here is my output:
$ COVERAGE_DEBUG=process,pid python -m pytest -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml --cov synapse --no-cov-on-fail synapse/tests/test_cov_fail.py
10949.6f92: cwd is now '/home/user/PycharmProjects/synapse'
10949.6f92: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10949.6f92: New process: cmd: ['/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest/__main__.py', '-v', '-s', '-rs', '--durations', '6', '--maxfail', '6', '-p', 'no:logging', '--junitxml=test-reports/junit.xml', '--cov', 'synapse', '--no-cov-on-fail', 'synapse/tests/test_cov_fail.py']
10949.6f92: New process: pid: 10949, parent pid: 4785
========================================================================================= test session starts =========================================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python
cachedir: .pytest_cache
rootdir: /home/user/PycharmProjects/synapse, configfile: pytest.ini
plugins: forked-1.3.0, anyio-3.3.1, xdist-2.5.0, cov-3.0.0
collecting ... 10951.dea1: cwd is now '/home/user/PycharmProjects/synapse'
10951.dea1: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10951.dea1: New process: cmd: ['-c']
10951.dea1: New process: pid: 10951, parent pid: 10949
collected 1 item
synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup 10955.ed98: cwd is now '/home/user/PycharmProjects/synapse'
10955.ed98: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10955.ed98: New process: cmd: ['-c']
10955.ed98: New process: pid: 10955, parent pid: 10949
error during task iterBackupArchive
Traceback (most recent call last):
File "/home/user/PycharmProjects/synapse/synapse/daemon.py", line 138, in t2call
async for item in valu:
File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 675, in iterBackupArchive
await self.cell.iterBackupArchive(name, user=self.user)
File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1572, in iterBackupArchive
raise s_exc.BadArg(mesg=mesg, arg='path', valu=path)
synapse.exc.BadArg: BadArg: arg='path' mesg='Specified backup path has no cell.guid file.' valu='/tmp/tmpcxvtnpu0/backups/nope'
10962.2919: cwd is now '/home/user/PycharmProjects/synapse'
10962.2919: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10962.2919: New process: cmd: ['-c', '--multiprocessing-fork']
10962.2919: New process: pid: 10962, parent pid: 10949
10980.e8f4: cwd is now '/home/user/PycharmProjects/synapse'
10980.e8f4: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10980.e8f4: New process: cmd: ['-c', '--multiprocessing-fork']
10980.e8f4: New process: pid: 10980, parent pid: 10949
IN on_sigterm pid=10980
10980.e8f4: sigterm: pid: 10980, instance: <coverage.control.Coverage object at 0x7f060047e610>
FINI on_sigterm pid=10980
IN on_sigterm pid=10980
10980.e8f4: sigterm: pid: 10980, instance: <coverage.control.Coverage object at 0x7f060047e700>
FINI on_sigterm pid=10980
10982.e52e: cwd is now '/home/user/PycharmProjects/synapse'
10982.e52e: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10982.e52e: New process: cmd: ['-c', '--multiprocessing-fork']
10982.e52e: New process: pid: 10982, parent pid: 10949
IN on_sigterm pid=10982
10982.e52e: sigterm: pid: 10982, instance: <coverage.control.Coverage object at 0x7f1bbd0a7610>
FINI on_sigterm pid=10982
IN on_sigterm pid=10982
10982.e52e: sigterm: pid: 10982, instance: <coverage.control.Coverage object at 0x7f1bbd0a7700>
FINI on_sigterm pid=10982
error during task iterNewBackupArchive
Traceback (most recent call last):
File "/home/user/PycharmProjects/synapse/synapse/daemon.py", line 138, in t2call
async for item in valu:
File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 692, in iterNewBackupArchive
await self.cell.iterNewBackupArchive(user=self.user, name=name, remove=remove)
File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1625, in iterNewBackupArchive
raise s_exc.BadArg(mesg=mesg)
synapse.exc.BadArg: BadArg: mesg='Backup with name already exists'
10983.0649: cwd is now '/home/user/PycharmProjects/synapse'
10983.0649: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10983.0649: New process: cmd: ['-c', '--multiprocessing-fork']
10983.0649: New process: pid: 10983, parent pid: 10949
10987.51f1: cwd is now '/home/user/PycharmProjects/synapse'
10987.51f1: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10987.51f1: New process: cmd: ['-c', '--multiprocessing-fork']
10987.51f1: New process: pid: 10987, parent pid: 10949
11005.18da: cwd is now '/home/user/PycharmProjects/synapse'
11005.18da: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11005.18da: New process: cmd: ['-c', '--multiprocessing-fork']
11005.18da: New process: pid: 11005, parent pid: 10949
11009.15a2: cwd is now '/home/user/PycharmProjects/synapse'
11009.15a2: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11009.15a2: New process: cmd: ['-c', '--multiprocessing-fork']
11009.15a2: New process: pid: 11009, parent pid: 10949
11027.82f1: cwd is now '/home/user/PycharmProjects/synapse'
11027.82f1: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11027.82f1: New process: cmd: ['-c', '--multiprocessing-fork']
11027.82f1: New process: pid: 11027, parent pid: 10949
11031.4f81: cwd is now '/home/user/PycharmProjects/synapse'
11031.4f81: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11031.4f81: New process: cmd: ['-c', '--multiprocessing-fork']
11031.4f81: New process: pid: 11031, parent pid: 10949
11049.61f4: cwd is now '/home/user/PycharmProjects/synapse'
11049.61f4: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11049.61f4: New process: cmd: ['-c', '--multiprocessing-fork']
11049.61f4: New process: pid: 11049, parent pid: 10949
error during task iterNewBackupArchive
Traceback (most recent call last):
File "/home/user/PycharmProjects/synapse/synapse/daemon.py", line 138, in t2call
async for item in valu:
File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 692, in iterNewBackupArchive
await self.cell.iterNewBackupArchive(user=self.user, name=name, remove=remove)
File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1632, in iterNewBackupArchive
await self.runBackup(name)
File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1361, in runBackup
raise s_exc.BackupAlreadyRunning(mesg='Another backup is already running')
synapse.exc.BackupAlreadyRunning: BackupAlreadyRunning: mesg='Another backup is already running'
11053.881e: cwd is now '/home/user/PycharmProjects/synapse'
11053.881e: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11053.881e: New process: cmd: ['-c', '--multiprocessing-fork']
11053.881e: New process: pid: 11053, parent pid: 10949
11072.aa01: cwd is now '/home/user/PycharmProjects/synapse'
11072.aa01: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11072.aa01: New process: cmd: ['-c', '--multiprocessing-fork']
11072.aa01: New process: pid: 11072, parent pid: 10949
IN on_sigterm pid=11072
11072.aa01: sigterm: pid: 11072, instance: <coverage.control.Coverage object at 0x7f1d1da5dc40>
FINI on_sigterm pid=11072
IN on_sigterm pid=11072
11072.aa01: sigterm: pid: 11072, instance: <coverage.control.Coverage object at 0x7f1d21f24610>
FINI on_sigterm pid=11072
IN on_sigterm pid=11072
11072.aa01: sigterm: pid: 11072, instance: <coverage.control.Coverage object at 0x7f1d21f24700>
FINI on_sigterm pid=11072
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup2/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup2/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup3/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup3/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/20220126172444/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/20220126172444/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
11076.8294: cwd is now '/home/user/PycharmProjects/synapse'
11076.8294: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11076.8294: New process: cmd: ['-c', '--multiprocessing-fork']
11076.8294: New process: pid: 11076, parent pid: 10949
11094.4f88: cwd is now '/home/user/PycharmProjects/synapse'
11094.4f88: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11094.4f88: New process: cmd: ['-c', '--multiprocessing-fork']
11094.4f88: New process: pid: 11094, parent pid: 10949
11098.e216: cwd is now '/home/user/PycharmProjects/synapse'
11098.e216: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11098.e216: New process: cmd: ['-c', '--multiprocessing-fork']
11098.e216: New process: pid: 11098, parent pid: 10949
11122.50a7: cwd is now '/home/user/PycharmProjects/synapse'
11122.50a7: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11122.50a7: New process: cmd: ['-c', '--multiprocessing-fork']
11122.50a7: New process: pid: 11122, parent pid: 10949
IN on_sigterm pid=11122
11122.50a7: sigterm: pid: 11122, instance: <coverage.control.Coverage object at 0x7fe6c0ebad90>
PASSED
========================================================================================== warnings summary ===========================================================================================
../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli
self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli_date_format
self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli_format
self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli_level
self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
-- Docs: https://docs.pytest.org/en/stable/warnings.html
---------------------------------------------------------- generated xml file: /home/user/PycharmProjects/synapse/test-reports/junit.xml ----------------------------------------------------------
---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover
--------------------------------------------------------------------------
synapse/__init__.py 8 3 62%
..... snip ...
--------------------------------------------------------------------------
TOTAL 37535 25690 32%
========================================================================================= slowest 6 durations =========================================================================================
13.52s call synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup
0.00s setup synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup
0.00s teardown synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup
=================================================================================== 1 passed, 4 warnings in 17.61s ====================================================================================
10949.6f92: atexit: pid: 10949, instance: <coverage.control.Coverage object at 0x7f07160a3910>
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
Process SpawnProcess-17:
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
util._exit_function()
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 334, in _exit_function
_run_finalizers(0)
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
finalizer()
File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 224, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 99, in cleanup
_cleanup(_active_cov)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 84, in _cleanup
cov.stop()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 708, in save
data = self.get_data()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 777, in get_data
if self._collector and self._collector.flush_data():
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
self.covdata.add_lines(self.mapped_file_dict(self.data))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
return method(self, *args, **kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 459, in add_lines
self._choose_lines_or_arcs(lines=True)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 518, in _choose_lines_or_arcs
with self._connect() as con:
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 313, in _connect
self._open_db()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 264, in _open_db
self._read_db()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 268, in _read_db
with self._dbs[threading.get_ident()] as db:
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1073, in __enter__
self._connect()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1052, in _connect
self.con = sqlite3.connect(self.filename, check_same_thread=False)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 611, in _on_sigterm
self._atexit("sigterm")
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 604, in _atexit
self.save()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 708, in save
data = self.get_data()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 777, in get_data
if self._collector and self._collector.flush_data():
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
self.covdata.add_lines(self.mapped_file_dict(self.data))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 234, in _wrapped
with self._lock:
KeyboardInterrupt
11122.50a7: atexit: pid: 11122, instance: <coverage.control.Coverage object at 0x7fe6c0ebad90>
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1101, in execute
return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
return method(self, *args, **kwargs)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 463, in add_lines
self._set_context_id()
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 421, in _set_context_id
context_id = self._context_id(context)
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 397, in _context_id
row = con.execute_one("select id from context where context = ?", (context,))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1129, in execute_one
rows = list(self.execute(sql, parameters))
File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1118, in execute
raise DataError(f"Couldn't use data file {self.filename!r}: {msg}") from exc
coverage.exceptions.DataError: Couldn't use data file '/home/user/PycharmProjects/synapse/.coverage.vertex005c.11122.270781': no such table: context
(syn3810) user@vertex005c:~/PycharmProjects/synapse$ 10951.dea1: atexit: pid: 10951, instance: <coverage.control.Coverage object at 0x7fab05dff610>
10955.ed98: atexit: pid: 10955, instance: <coverage.control.Coverage object at 0x7fce74ac3610>
This is caused by running code coverage on a function that uses multiprocessing.Pool to fork worker processes.
The test suite hangs on this line of coverage, specifically:
".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 234 in _wrapped
Forked child processes are unable to acquire the mutex: when they are created, they inherit a copy of the (already held) mutex, which is thus never released (some info about why). Probably related to this change: "Feature: coverage measurement data will now be written when a SIGTERM signal is received by the process."
Fixed in my case by any of: telling multiprocessing to "spawn" instead of fork, reverting to version 6.2.0, or turning off code coverage.
Edit: Full stack trace of stalled thread
Current thread 0x00007f5675b14740 (most recent call first):
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 234 in _wrapped
File ".../env/lib/python3.8/site-packages/coverage/collector.py", line 473 in flush_data
File ".../env/lib/python3.8/site-packages/coverage/control.py", line 766 in get_data
File ".../env/lib/python3.8/site-packages/coverage/control.py", line 697 in save
File ".../env/lib/python3.8/site-packages/coverage/control.py", line 601 in _atexit
File ".../env/lib/python3.8/site-packages/coverage/control.py", line 605 in _on_sigterm
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 1156 in executescript
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 299 in _init_db
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 273 in _read_db
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 264 in _open_db
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 313 in _connect
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 518 in _choose_lines_or_arcs
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 459 in add_lines
File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 236 in _wrapped
File ".../env/lib/python3.8/site-packages/coverage/collector.py", line 473 in flush_data
File ".../env/lib/python3.8/site-packages/coverage/control.py", line 766 in get_data
File ".../env/lib/python3.8/site-packages/coverage/control.py", line 697 in save
File ".../env/lib/python3.8/site-packages/pytest_cov/embed.py", line 85 in _cleanup
File ".../env/lib/python3.8/site-packages/pytest_cov/embed.py", line 99 in cleanup
File "/usr/lib/python3.8/multiprocessing/util.py", line 224 in __call__
File "/usr/lib/python3.8/multiprocessing/util.py", line 300 in _run_finalizers
File "/usr/lib/python3.8/multiprocessing/util.py", line 334 in _exit_function
File "/usr/lib/python3.8/multiprocessing/process.py", line 318 in _bootstrap
File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75 in _launch
File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19 in __init__
File "/usr/lib/python3.8/multiprocessing/context.py", line 277 in _Popen
File "/usr/lib/python3.8/multiprocessing/process.py", line 121 in start
File "/usr/lib/python3.8/multiprocessing/pool.py", line 326 in _repopulate_pool_static
File "/usr/lib/python3.8/multiprocessing/pool.py", line 303 in _repopulate_pool
File "/usr/lib/python3.8/multiprocessing/pool.py", line 212 in __init__
File "/usr/lib/python3.8/multiprocessing/context.py", line 119 in Pool
File ".../delphi_safegraph_patterns/run.py", line 108 in run_module
File ".../tests/test_run.py", line 36 in test_output_files
File "/usr/lib/python3.8/unittest/mock.py", line 1325 in patched
File ".../env/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
File ".../env/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
File ".../env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
File ".../env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
File "../env/bin/pytest", line 8 in <module>
Here are the complete steps I use to reproduce this problem. Note that the branch provided installs the working 6.2 version of coverage, so you need to explicitly install the broken 6.3 version, or whatever possible-fix version you want. The output of the commands is in the collapsed details sections.
Also note that there is sometimes a warning about a non-coverage data file, which is another symptom of the underlying race condition.
Run these steps in a Python 3.8 environment:
IMO you shouldn't be using multiprocessing's default fork mode unless you're trying to really optimize something - in which case you'd have to accept that you'll eventually have deadlocks caused by locks being improperly shared across forked processes.
I don't want to be negative here, but maybe the best thing to do is to just disable support for multiprocessing in fork mode, or issue a big fat warning when it is detected.
I have a fix for the hang: it's because I used threading.Lock, then tried to re-lock a lock I already held in the signal handler. Changing it to threading.RLock fixes the hang. But I still see some issues related to sharing...
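A minimal sketch (not coverage.py's actual code) of that re-entrancy bug: a signal handler fires on the thread that already holds the lock and tries to take it again, which deadlocks with threading.Lock but succeeds with the re-entrant threading.RLock.
import signal
import threading
import time

lock = threading.RLock()  # change to threading.Lock() to reproduce the hang

def handler(signum, frame):
    # Python runs signal handlers on the main thread, which already holds
    # `lock` below; an RLock lets the same thread re-acquire it.
    with lock:
        print("handler ran inside the critical section")

signal.signal(signal.SIGALRM, handler)

with lock:
    signal.alarm(1)  # deliver SIGALRM while the lock is held
    deadline = time.time() + 2
    while time.time() < deadline:  # spin so the signal lands in here
        pass
print("done")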
I've been able to cut down my test example to something that locks up "most of the time"; but most of the time just means "there is a race condition that is happening most of the time" :\
While we do use multiprocessing.Pool in our code, we initialize a forkserver pool at import time of our library. We do this early, during import on the MainProcess, to avoid the forkserver possibly inheriting any resources related to things like the asyncio loop. A snippet from synapse/lib/coro.py is shown below:
import atexit
import concurrent.futures
import logging
import multiprocessing
import os

logger = logging.getLogger(__name__)

# shared process pool
forkpool = None
if multiprocessing.current_process().name == 'MainProcess':
# only create the forkpool in the MainProcess...
try:
mpctx = multiprocessing.get_context('forkserver')
max_workers = int(os.getenv('SYN_FORKED_WORKERS', 1))
forkpool = concurrent.futures.ProcessPoolExecutor(mp_context=mpctx, max_workers=max_workers)
atexit.register(forkpool.shutdown)
except OSError as e: # pragma: no cover
logger.warning(f'Failed to init forkserver pool, fallback enabled: {e}', exc_info=True)
Changing coverage/sqldata.py's Lock to an RLock does fix our issue. @tunetheweb I don't know if that's something you've got a test case you could try out?
If you want a version with RLock to install, you can use:
pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1310
Just tried that branch and still seeing the same slowness and hanging for Python 3.7 unfortunately.
@tunetheweb is there any way you can get us a reproducible example with your code?
@barrywhart could you offer any clues here? I think you wrote the multiprocessing part.
Dug up this code where most of the magic happens if that's any help:
import multiprocessing

# get_parse_fixtures() and generate_parse_fixture() are helpers defined elsewhere in this script
def main():
"""Find all example SQL files, parse and create YAML files."""
parse_success_examples, _ = get_parse_fixtures()
with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
for _ in pool.imap_unordered(generate_parse_fixture, parse_success_examples):
pass
if __name__ == "__main__":
main()
@tunetheweb: Can you clarify what command(s) are hanging? The code you mention above is from a standalone script that creates some test YAML files; it's not part of the main package nor the automated tests.
In the past, I have worked around forking issues with multiprocessing.Pool by using the third-party billiard.Pool instead. (FYI: Billiard is a major component of the well-known Celery queuing package.) I don't know the technical details, but AFAIK it lets you completely avoid inheriting handles and other objects from the parent process.
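A minimal sketch of that swap, assuming billiard is installed (pip install billiard); its Pool mirrors the multiprocessing.Pool API, so the change is essentially the import line (the worker function here is illustrative):
from billiard import Pool  # third-party drop-in for multiprocessing.Pool

def square(n):  # illustrative worker
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(8)))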
The Test Suite - particularly on 3.7: https://github.com/tunetheweb/sqlfluff/runs/4998066032?check_suite_focus=true (but seems slower on other versions too).
Ideally we'd have some sort of smaller reproducible test case as the whole SQLFluff code base is quite large as you know!
@tunetheweb: In that case, the issue probably arises from this module: https://github.com/sqlfluff/sqlfluff/blob/main/src/sqlfluff/core/linter/runner.py
The MultiProcessRunner and MultiThreadRunner classes use multiprocessing.Pool and multiprocessing.dummy.Pool.
Based on some reading I did just now (see StackOverflow link below), I bet we could avoid this issue by using the spawn or forkserver methods. (Presumably, the hang occurs when using the fork method.)
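A minimal sketch of what that change could look like (the worker function is illustrative, not sqlfluff's actual code):
import multiprocessing

def lint_one(path):  # stand-in for the real per-file work
    return path.upper()

if __name__ == "__main__":
    # An explicit context avoids the platform default ("fork" on Linux),
    # so workers start from a fresh interpreter and inherit no held locks.
    ctx = multiprocessing.get_context("spawn")  # or "forkserver"
    with ctx.Pool(processes=4) as pool:
        print(pool.map(lint_one, ["a.sql", "b.sql"]))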
@barrywhart can you give me a way to reproduce the issue myself?
If you fork the SQLFluff repo and then open a PR in your fork, to your fork's main branch, to unpin the coverage version in requirements_dev.txt, then the CI tests will run in GitHub Actions. Example in my own fork: https://github.com/tunetheweb/sqlfluff/pull/6/files
Most of the GitHub Actions will complete within 10 mins.
The Python 3.7 tests will run twice as slowly for some reason, and after 20 mins will reach 90% and then not go any further. It will then hang forever, and eventually GitHub will cancel it if you don't.
The fact it's slow (rather than simply not working), and the fact it reaches 90% every time and then hangs, is odd to me.
It's consistently reproducible like that. On the other Python versions I have seen issues with slowness and/or hanging much more occasionally, but with 3.7 it's every time.
The difficulty is getting a smaller reproducible test case, as the repo is quite big with a lot of code. However, if you just want to test some changes, the above will work. I was thinking of just creating branches with every commit between 6.2 and 6.3 until I got to the commit that introduced this, as I'm kind of at a loss how better to reproduce this and give some more clues.
If you want to reproduce it locally, I think you could just run tox -e py37. This is the command it's hanging on: pytest -vv -rs --cov=sqlfluff --cov-report=xml {posargs: {toxinidir}/test}
Let us know if that reproduces the issue for you. If it's helpful, then we can try and narrow it down. As I mentioned above, the module that handles subprocesses is https://github.com/sqlfluff/sqlfluff/blob/main/src/sqlfluff/core/linter/runner.py.
On my mac, I can run the sqlfluff test suite on 3.7 with either coverage.py==6.2 or ==6.3, and they both complete in the same amount of time (764s and 769s). Hmm.
@tunetheweb: Have you been able to reproduce the issue locally? I haven't yet.
I'm not that familiar with our CI builds, but IIUC, it's running approximately the following commands (extracted from tox.ini and .github/workflows/ci-tests.yml):
export SQLFLUFF_TESTENV=1
export COVERAGE_FILE=.coverage
coverage erase
python util.py clean-tests
pytest -vv -rs --cov=sqlfluff --cov-report=xml -n 2 test/
With sqlfluff, a notable difference I see between the local example and the CI config is the inclusion of pytest-xdist via the -n 2 flag.
With sqlfluff, a notable difference I see between the local example and the CI config is the inclusion of pytest-xdist via the -n 2 flag.
Good spot. That's key, as that's what invokes our multiple threads.
@tunetheweb: Have you been able to reproduce the issue locally? I haven't yet.
No I've not, but I can't seem to get a Python 3.7 env set up locally. I've also tried to reduce the test cases in CI: I picked one that did not complete and reran it, but it completes when run on its own. So it's definitely some load/race condition when it's got a big job to do, and it seems to "lose" some jobs so never completes.
Also, using -n 4 might make it more likely (as it will use 4 threads then). Or even higher.
The SQLFluff build runs many Python versions concurrently, as well as running the dbt tests concurrently with the core tests. I don't know if these all run in the same server or separate, but perhaps this contributes to the problem as well.
I'm trying some proposed fixes. They reduce the likelihood of hangs, but it's not 100%: https://github.com/nedbat/sqlfluff/actions
The SQLFluff build runs many Python versions concurrently, as well as running the dbt tests concurrently with the core tests. I don't know if these all run in the same server or separate, but perhaps this contributes to the problem as well.
They are on separate machines. Also I did a run with the others turned off and it still hung.
The same problem affects the Annif codebase. It uses multiprocessing.Pool in forking mode. I originally reported the problems we had with GitHub Actions CI jobs here: https://github.com/NatLibFi/Annif/issues/564
I can now reproduce this locally every time on my machine. Let me know if I can help with a reproducible test case and/or testing potential solutions.
@osma yes, a local reproduction case would be very helpful.
OK, this is very much not a minimal test case, but at least it demonstrates the problem:
# checkout latest Annif code
git clone https://github.com/NatLibFi/Annif.git
# create a venv inside and activate it
cd Annif
python3 -m venv venv
. venv/bin/activate
# make sure we have latest pip & wheel
pip install -U pip wheel
# install Annif in editable mode, with dev dependencies
pip install -e .[dev]
# download NLTK punkt model as well if you don't have it from earlier
python -m nltk.downloader punkt
# run the MLLM part of the test suite with coverage 6.2
pip install coverage==6.2
pytest --cov=./ -v tests/test_backend_mllm.py
# final output: 10 passed in 31.35s
# now do the same with coverage 6.3
pip install coverage==6.3
pytest --cov=./ -v tests/test_backend_mllm.py
# on a 4-core machine (multiprocessing.cpu_count()==8): hangs practically every time
# on a 2-core machine (multiprocessing.cpu_count()==4): hangs maybe 1 out of 4 times
More specifically, the test that gets stuck is test_mllm_train, which uses multiprocessing.Pool to preprocess training documents in parallel.
I have the same issue on the CI tests of a scientific library of ours that uses multiprocessing. Pinning coverage<6.3 makes the problem disappear. Unfortunately it would be hard to expose the minimal reproduction steps.
Here's a pretty minimal test case that I cobbled together:
process.py:
#!/usr/bin/env python3
import multiprocessing
def calculate(num):
    # CPU-bound busy work to keep the pool workers running for a while
    for i in range(100000):
        num += num
    return num
def process(jobs=16, numbers=64):
with multiprocessing.Pool(jobs) as pool:
orig_numbers = range(numbers)
new_numbers = pool.map(calculate, orig_numbers)
return new_numbers
if __name__ == '__main__':
numbers = process()
print(numbers)
test_process.py:
import pytest
import process
def test_process():
numbers = process.process()
assert len(numbers) == 64
Install the dependencies (in a venv): pip install pytest pytest-cov coverage==6.3
Run the test with:
pytest --cov=./ test_process.py
On a 2-core machine (Ubuntu 20.04 amd64, Python 3.8.10) this hangs maybe 50% of the time. If it doesn't, it will finish in less than 10 seconds. With coverage 6.2, it works every time.
Osma's test case does not appear to lock up when using coveragepy's current master (aad5ece47bf12bceff4296516f23171a06b34bb5)
I've released coverage 6.3.1 with the RLock fix. I'm not sure what to do with the rest of this issue...
@benmwebb @anishathalye @BoboTiG @JannisNe @pietrodn @osma @apriha @pllim @KelSolaar @rosswhitfield @JoanFM @erykoff @haampie @Bultako @maxnoe @rlopezcoto @glemaitre @nmdefries @QuLogic @akihironitta @sir-sigurd @cliffckerr Would you try coverage 6.3.1 to see if it fixes your problems?
Would you try coverage 6.3.1 to see if it fixes your problems?
Doesn't appear to work for me, unfortunately:
Unfortunately not for us either https://github.com/matplotlib/matplotlib/runs/5031432498?check_suite_focus=true
But it only fails on Python 3.7, which we are soon about to drop support for... However, that is also the build that pins minimum versions for other things, so it might be something other than Python.
I'm using Python 3.9. With coverage==6.3.1 it does not hang; however, it gives errors such as:
sqlite3.OperationalError: no such table: file
Here I posted full tracebacks.
I'm running the tests with this command:
pytest \
--durations=10 \
--capture=no \
--cov-branch \
--cov-fail-under=0 \
--cov-report=html:coverage-reports \
--cov=my_package \
--junitxml=test-reports/junit.xml
The errors aren't there with 6.2.
Describe the bug
We've been having issues with our CI in GitHub Actions for the last few hours, and think it might be because of Coverage 6.3 - it's the one thing that's changed, and freezing it at 6.2 seems to allow runs to complete successfully.
To Reproduce
How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:
- coverage debug sys is helpful. 6.2.3
- pip freeze is helpful. See https://gist.github.com/tunetheweb/4d288ea4467ba74a66b3a0e2e8d5e4ea
- tox -e py37 -- -n 2 test should reproduce it. Having problems setting up a 3.7 environment but will try to get a better test case. We do use a multithreaded process and noticed some changes to that.
should reproduce it. Having problems setting up a 3.7 environment but will try to get a better test case. We do use a multithreaded process and noticed some changes to that.Expected behavior A clear and concise description of what you expected to happen.
Additional context
Will try to get a better repro.