nedbat / coveragepy

The code coverage tool for Python
https://coverage.readthedocs.io
Apache License 2.0
3.03k stars 435 forks source link

Process hang with Coverage 6.3 #1310

Closed tunetheweb closed 2 years ago

tunetheweb commented 2 years ago

Describe the bug A clear and concise description of the bug.

We've been having issues with our CI in GitHub Actions for the last few hours, and think it might be because of Coverage 6.3 - it's the one thing that's changed, and freezing it at 6.2 seems to allow runs to complete successfully.

To Reproduce How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:

  1. What version of Python are you using? 3.7 seems to be a bit buggy with this. We also run 3.8 - 3.10 in CI and no issues seen with that.
  2. What version of coverage.py shows the problem? The output of coverage debug sys is helpful. 6.2.3
  3. What versions of what packages do you have installed? The output of pip freeze is helpful. See https://gist.github.com/tunetheweb/4d288ea4467ba74a66b3a0e2e8d5e4ea
  4. What code shows the problem? Give us a specific commit of a specific repo that we can check out. If you've already worked around the problem, please provide a commit before that fix. This is tricky. We run a lot of commands in CI, but checking out https://github.com/sqlfluff/sqlfluff/ and running tox -e py37 -- -n 2 test should reproduce it. Having problems setting up a 3.7 environment but will try to get a better test case. We do use a multithreaded process and noticed some changes to that.
  5. What commands did you run?

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here.

Will try to get a better repo.

vEpiphyte commented 2 years ago

We've experienced a similar issue with CI timeouts in CircleCI when using Python 3.8. This occurs when using pytest as our test runner with coverage. The hang occurs after tests pass and the coverage is reported to the console; it's almost like the python process fails to exit.

(venv) root@af00fef01e49:/home/circleci/repo# python -V
Python 3.8.12

(venv) root@af00fef01e49:/home/circleci/repo# coverage debug sys
-- sys -------------------------------------------------------
               coverage_version: 6.3
                coverage_module: /home/circleci/repo/venv/lib/python3.8/site-packages/coverage/__init__.py
                         tracer: -none-
                        CTracer: available
           plugins.file_tracers: -none-
            plugins.configurers: -none-
      plugins.context_switchers: -none-
              configs_attempted: .coveragerc
                   configs_read: /home/circleci/repo/.coveragerc
                    config_file: /home/circleci/repo/.coveragerc
                config_contents: b'[report]\nomit =\n    */synapse/tests/test_*\n'
                      data_file: -none-
                         python: 3.8.12 (default, Aug 31 2021, 16:30:39) [GCC 9.3.0]
                       platform: Linux-5.11.0-1022-aws-x86_64-with-glibc2.29
                 implementation: CPython
                     executable: /home/circleci/repo/venv/bin/python3
                   def_encoding: utf-8
                    fs_encoding: utf-8
                            pid: 322
                            cwd: /home/circleci/repo
                           path: /home/circleci/repo/venv/bin
                                 /home/circleci/.pyenv/versions/3.8.12/lib/python38.zip
                                 /home/circleci/.pyenv/versions/3.8.12/lib/python3.8
                                 /home/circleci/.pyenv/versions/3.8.12/lib/python3.8/lib-dynload
                                 /home/circleci/repo/venv/lib/python3.8/site-packages
                                 /home/circleci/repo
                    environment: HOME = /root
                   command_line: /home/circleci/repo/venv/bin/coverage debug sys
                sqlite3_version: 2.6.0
         sqlite3_sqlite_version: 3.31.1
             sqlite3_temp_store: 0
        sqlite3_compile_options: COMPILER=gcc-9.3.0; ENABLE_COLUMN_METADATA; ENABLE_DBSTAT_VTAB
                                 ENABLE_FTS3; ENABLE_FTS3_PARENTHESIS; ENABLE_FTS3_TOKENIZER
                                 ENABLE_FTS4; ENABLE_FTS5; ENABLE_JSON1
                                 ENABLE_LOAD_EXTENSION; ENABLE_PREUPDATE_HOOK; ENABLE_RTREE
                                 ENABLE_SESSION; ENABLE_STMTVTAB; ENABLE_UNLOCK_NOTIFY
                                 ENABLE_UPDATE_DELETE_LIMIT; HAVE_ISNAN; LIKE_DOESNT_MATCH_BLOBS
                                 MAX_SCHEMA_RETRY=25; MAX_VARIABLE_NUMBER=250000; OMIT_LOOKASIDE
                                 SECURE_DELETE; SOUNDEX; TEMP_STORE=1
                                 THREADSAFE=1; USE_URI

Pip freeze output

(venv) root@af00fef01e49:/home/circleci/repo# pip freeze
aiohttp==3.8.1
aiohttp-socks==0.6.1
aioimaplib==0.9.0
aiosignal==1.2.0
aiosmtplib==1.1.6
antlr4-python3-runtime==4.8
appdirs==1.4.4
arrow==1.2.1
async-timeout==4.0.2
attrs==21.4.0
base58==2.1.1
bech32==1.2.0
cattrs==1.10.0
cbor2==5.4.2
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.10
colorama==0.4.4
coverage==6.3
cpe==1.2.1
cryptography==36.0.1
fastjsonschema==2.14.5
fqdn==1.5.1
frozenlist==1.3.0
idna==3.3
importlib-resources==5.4.0
iniconfig==1.1.1
isoduration==20.11.0
jsonpointer==2.2
jsonschema==4.4.0
lark-parser==0.11.2
lmdb==1.3.0
more-itertools==8.12.0
msgpack==1.0.3
multidict==6.0.2
oauthlib==3.1.1
packaging==20.9
pluggy==1.0.0
prompt-toolkit==3.0.24
py==1.11.0
pycodestyle==2.8.0
pycparser==2.21
pycryptodome==3.12.0
Pygments==2.7.4
pyOpenSSL==21.0.0
pyparsing==3.0.7
pyrsistent==0.18.1
pytest==6.2.5
pytest-cov==3.0.0
python-bitcoinlib==0.11.0
python-dateutil==2.8.2
python-socks==1.2.4
PyYAML==6.0
regex==2022.1.18
requests==2.27.1
requests-cache==0.9.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
scalecodec==1.0.2
simplejson==3.17.6
six==1.16.0
stix2-patterns==1.3.2
stix2-validator==3.0.2
-e git+ssh://git@github.com/vertexproject/synapse.git@c8568da02165fe0c8131b8ffcc1e48e20dc2ad07#egg=synapse
toml==0.10.2
tomli==2.0.0
tornado==6.1
typing_extensions==4.0.1
uri-template==1.1.0
url-normalize==1.4.3
urllib3==1.26.8
vcrpy==4.1.1
wcwidth==0.2.5
webcolors==1.11.1
wrapt==1.13.3
xxhash==2.0.2
yarl==1.7.2
zipp==3.7.0

This happened to our master commit here https://github.com/vertexproject/synapse/commit/1b2d327a32e33b443530f2d8699ad74d1cf8f444

This was fixed in https://github.com/vertexproject/synapse/pull/2534/files#diff-c9c796259f3852b51b531b79cbf07820145088d4f108fb006ab44b15f0b15934R8

We did find the following line in a CI log for one of the failing tests which may or may not be related - it occurred prior to the pytest warnings summary and the console coverage output.

/home/circleci/repo/venv/lib/python3.8/site-packages/coverage/data.py:129: 
CoverageWarning: Data file '/home/circleci/repo/.coverage.803d4f93f49b.429.890315' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
tunetheweb commented 2 years ago

Interesting that you saw this on 3.8. We experienced some intermittent hanging in other Python versions but repeated hanging (every time) in 3.7, so wasn’t sure if the intermittent issues were related.

I also forgot to include my error message. It happened near the end of the test, but not quite at the end as you experienced (not sqlfluff is our project, but the last line might be useful):

[gw1] [ 90%] PASSED test/rules/yaml_test_cases_test.py::test__rule_test_global_config /home/runner/work/sqlfluff/sqlfluff/.tox/py37/lib/python3.7/site-packages/coverage/inorout.py:535: CoverageWarning: Module sqlfluff was previously imported, but not measured (module-not-measured)
5583
  self.warn(msg, slug="module-not-measured")
nedbat commented 2 years ago

I'm not clear what problems you are reporting. @tunetheweb the original issue description doesn't mention what the actual problem is. @vEpiphyte says they've seen similar problems, and mentions hanging. Then @tunetheweb shows a module-not-measured warning. This sounds like two different issues.

Anything you can do to give me a reproducible scenario would be great.

tunetheweb commented 2 years ago

Yes I appreciate this is a very poor issue to raise. Sorry about that, but thought I'd give you a heads up as saw it right after the release, and it stopped happening right after I pinned to the old version.

Will try to give a more meaningful reproducible use case if I can narrow it down.

The issue is jobs hanging (I cancelled them after an hour, when they normally take 5 mins. One job I left running and it took 3 hours and counting). That was for Python 3.7. I also saw a lot of slow down in jobs (5 min jobs were more often than not taking 20mins when they did complete). That was for Python 3.8+.

Initially I thought GitHub Actions were on the blink but at soon as I pinned the old version of coverage it all worked. Unpinning it again breaks it again. Nothing else changes between the runs.

But as to narrowing it down why, at the moment I don't have more info to help explain that. But will keep digging...

Think the error message is a red-herring. It was the last thing printed before it froze. But do see that at the end of the run elsewhere (though interestingly I only see it at the end of good runs, whereas that appeared after our CI job was only at 90% and then hung).

Sorry, again for such a poor report. Can close it if you want until I can get you more info, as appreciate it's difficult to do anything with what I've written here but as I say this was more a heads up, and a vague hope someone would have an idea what it might be 😞

nedbat commented 2 years ago

@tunetheweb @vEpiphyte @sir-sigurd Can you try installing this branch of coverage.py to see if it fixes the problem?

pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1312
tunetheweb commented 2 years ago

Doesn't look any better with that to be honest.

BTW happy to move this discussion to #1312 as that issue would appear to be a duplicate of this one, but with much better detail!

tunetheweb commented 2 years ago

Yeah definitely seeing same issue with that unfortunately :-(

But it is very reproducible so, while I haven't been able to help with a minimal reproducible test case, I'm very happy to test any other test branches you have. Just takes about 10mins to run.

sir-sigurd commented 2 years ago

@tunetheweb @vEpiphyte @sir-sigurd Can you try installing this branch of coverage.py to see if it fixes the problem?

pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1312

Nope, it doesn't help.

vEpiphyte commented 2 years ago

I did reproduce the error locally with a set of our unit tests. Recording this big blob for posterity - I haven't had the time to whittle it down yet but will do shortly.

$ python -m pytest  -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml --cov synapse --no-cov-on-fail synapse/tests/test_cmds_trigger.py synapse/tests/test_lib_cell.py synapse/tests/test_lib_crypto_ecc.py synapse/tests/test_lib_hashset.py synapse/tests/test_lib_layer.py synapse/tests/test_lib_node.py synapse/tests/test_lib_scope.py synapse/tests/test_lib_stormhttp.py synapse/tests/test_lib_stormlib_model.py synapse/tests/test_lib_stormlib_yaml.py synapse/tests/test_lib_trigger.py synapse/tests/test_model_auth.py synapse/tests/test_model_geospace.py synapse/tests/test_model_media.py synapse/tests/test_servers_axon.py synapse/tests/test_tools_backup.py synapse/tests/test_tools_guid.py synapse/tests/test_tools_storm.py

This produces the same hang locally using Python 3.8.10 on linux (ubuntu 21.04) and coverage 6.3.0.

I then installed ned's patch with the following

pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1312

And got the same hang. Sending sigint (ctrl+c) to the process yielded the following traceback

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
Process SpawnProcess-22:
Process SpawnProcess-18:
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
    return self.con.execute(sql, parameters)
Traceback (most recent call last):
sqlite3.OperationalError: no such table: coverage_schema

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1101, in execute
    return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: coverage_schema

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
    util._exit_function()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 270, in _read_db
    schema_version, = db.execute_one("select version from coverage_schema")
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 334, in _exit_function
    _run_finalizers(0)
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1129, in execute_one
    rows = list(self.execute(sql, parameters))
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 99, in cleanup
    _cleanup(_active_cov)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1118, in execute
    raise DataError(f"Couldn't use data file {self.filename!r}: {msg}") from exc
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 84, in _cleanup
    cov.stop()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
    data = self.get_data()
coverage.exceptions.DataError: Couldn't use data file '/home/user/PycharmProjects/synapse/.coverage.vertex005c.35596.883395': no such table: coverage_schema
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
    if self._collector and self._collector.flush_data():
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
    self.covdata.add_lines(self.mapped_file_dict(self.data))

During handling of the above exception, another exception occurred:

  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
    return method(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 466, in add_lines
    file_id = self._file_id(filename, add=True)
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 388, in _file_id
    cur = con.execute("insert or replace into file (path) values (?)", (filename,))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
    return self.con.execute(sql, parameters)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 608, in _on_sigterm
    self._atexit("sigterm")
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 604, in _atexit
    self.save()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
    data = self.get_data()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
    if self._collector and self._collector.flush_data():
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
    self.covdata.add_lines(self.mapped_file_dict(self.data))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 234, in _wrapped
    with self._lock:
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
    util._exit_function()
KeyboardInterrupt
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 334, in _exit_function
    _run_finalizers(0)
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 99, in cleanup
    _cleanup(_active_cov)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 84, in _cleanup
    cov.stop()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
    data = self.get_data()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
    if self._collector and self._collector.flush_data():
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
    self.covdata.add_lines(self.mapped_file_dict(self.data))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
    return method(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 459, in add_lines
    self._choose_lines_or_arcs(lines=True)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 518, in _choose_lines_or_arcs
    with self._connect() as con:
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 313, in _connect
    self._open_db()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 264, in _open_db
    self._read_db()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 273, in _read_db
    self._init_db(db)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 299, in _init_db
    db.executescript(SCHEMA)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1156, in executescript
    self.con.executescript(script)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 608, in _on_sigterm
    self._atexit("sigterm")
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 604, in _atexit
    self.save()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 700, in save
    data = self.get_data()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 769, in get_data
    if self._collector and self._collector.flush_data():
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
    self.covdata.add_lines(self.mapped_file_dict(self.data))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 234, in _wrapped
    with self._lock:
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
    return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1101, in execute
    return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
    return method(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 463, in add_lines
    self._set_context_id()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 421, in _set_context_id
    context_id = self._context_id(context)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 397, in _context_id
    row = con.execute_one("select id from context where context = ?", (context,))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1129, in execute_one
    rows = list(self.execute(sql, parameters))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1118, in execute
    raise DataError(f"Couldn't use data file {self.filename!r}: {msg}") from exc
coverage.exceptions.DataError: Couldn't use data file '/home/user/PycharmProjects/synapse/.coverage.vertex005c.35541.378922': no such table: context
nedbat commented 2 years ago

@vEpiphyte thanks for the steps. I can see the hang on my Mac. If I remove pytest-cov from the command, then the hang goes away. That is, I remove --cov synapse --no-cov-on-fail from the command line, and change from:

$ python -m pytest ...

to:

$ python -m coverage run -m pytest ...

@tunetheweb It looks like you are also using pytest-cov. If you remove it, and run under coverage, do you still see the problem?

vEpiphyte commented 2 years ago

@nedbat Thanks for that tip about pytest-cov vs raw coverage. That worked as expected - there was no lock when I ran directly with coverage. I've also reduced the tests down to a single test and pushed that test file up to a branch here https://github.com/vertexproject/synapse/blob/5a80408d1e8f75c3ab13df48a8845da7c8ab2051/synapse/tests/test_cov_fail.py

The following command hangs like we've seen already.

python -m pytest -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml --cov synapse --no-cov-on-fail synapse/tests/test_cov_fail.py 

The following command works as you demonstrated.

python -m coverage run -m pytest -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml synapse/tests/test_cov_fail.py 

This particular test does have the effect of testing lots of multiprocessing targets to test a bunch of error handling and bad-path logic.

tunetheweb commented 2 years ago

@tunetheweb It looks like you are also using pytest-cov. If you remove it, and run under coverage, do you still see the problem?

I can confirm for me the problem does not arise for me when running 6.3 outside of pytest.

vEpiphyte commented 2 years ago

Using my test file with pytest-cov and git bisect on coveragepy repo points to the problem starting to appear with this commit https://github.com/nedbat/coveragepy/commit/dd575eec0fa173bcb6c680d2f1c822a28280d011

Patching the sigterm handler

    def _on_sigterm(self, signum_unused, frame_unused):
        """A handler for signal.SIGTERM."""
        pid = os.getpid()
        print(f'IN on_sigterm {pid=}')

        self._atexit("sigterm")
        # Statements after here won't be seen by metacov because we just wrote
        # the data, and are about to kill the process.
        signal.signal(signal.SIGTERM, self._old_sigterm)    # pragma: not covered
        print(f'FINI on_sigterm {pid=}')
        os.kill(pid, signal.SIGTERM)                # pragma: not covered

seems to stop at the signal.signal() call in the forkserver process. We never print the FINI ... message :(

user    9428  4.7  0.2 141008 88796 pts/2    S+   17:09   0:01 /home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=7, pipe_handle=61) --multiprocessing-fork
nedbat commented 2 years ago

@vEpiphyte Thanks for the bisect and the debugging! Would you mind wrapping a try/except around that signal.signal line and seeing what's getting raised? Also, if you could run coverage with a COVERAGE_DEBUG=process,pid environment variable, there will be more debugging output. You might need -s on the pytest line to get it all.

vEpiphyte commented 2 years ago

The signal.signal didn't raise an exception (or at least not one caught with a bare try/except block). Here is my output:

$ COVERAGE_DEBUG=process,pid python -m pytest -v -s -rs --durations 6 --maxfail 6 -p no:logging --junitxml=test-reports/junit.xml --cov synapse --no-cov-on-fail synapse/tests/test_cov_fail.py 
10949.6f92: cwd is now '/home/user/PycharmProjects/synapse'
10949.6f92: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10949.6f92: New process: cmd: ['/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest/__main__.py', '-v', '-s', '-rs', '--durations', '6', '--maxfail', '6', '-p', 'no:logging', '--junitxml=test-reports/junit.xml', '--cov', 'synapse', '--no-cov-on-fail', 'synapse/tests/test_cov_fail.py']
10949.6f92: New process: pid: 10949, parent pid: 4785
========================================================================================= test session starts =========================================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python
cachedir: .pytest_cache
rootdir: /home/user/PycharmProjects/synapse, configfile: pytest.ini
plugins: forked-1.3.0, anyio-3.3.1, xdist-2.5.0, cov-3.0.0
collecting ... 10951.dea1: cwd is now '/home/user/PycharmProjects/synapse'
10951.dea1: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10951.dea1: New process: cmd: ['-c']
10951.dea1: New process: pid: 10951, parent pid: 10949
collected 1 item                                                                                                                                                                                      

synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup 10955.ed98: cwd is now '/home/user/PycharmProjects/synapse'
10955.ed98: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10955.ed98: New process: cmd: ['-c']
10955.ed98: New process: pid: 10955, parent pid: 10949
error during task iterBackupArchive
Traceback (most recent call last):
  File "/home/user/PycharmProjects/synapse/synapse/daemon.py", line 138, in t2call
    async for item in valu:
  File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 675, in iterBackupArchive
    await self.cell.iterBackupArchive(name, user=self.user)
  File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1572, in iterBackupArchive
    raise s_exc.BadArg(mesg=mesg, arg='path', valu=path)
synapse.exc.BadArg: BadArg: arg='path' mesg='Specified backup path has no cell.guid file.' valu='/tmp/tmpcxvtnpu0/backups/nope'
10962.2919: cwd is now '/home/user/PycharmProjects/synapse'
10962.2919: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10962.2919: New process: cmd: ['-c', '--multiprocessing-fork']
10962.2919: New process: pid: 10962, parent pid: 10949
10980.e8f4: cwd is now '/home/user/PycharmProjects/synapse'
10980.e8f4: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10980.e8f4: New process: cmd: ['-c', '--multiprocessing-fork']
10980.e8f4: New process: pid: 10980, parent pid: 10949
IN on_sigterm pid=10980
10980.e8f4: sigterm: pid: 10980, instance: <coverage.control.Coverage object at 0x7f060047e610>
FINI on_sigterm pid=10980
IN on_sigterm pid=10980
10980.e8f4: sigterm: pid: 10980, instance: <coverage.control.Coverage object at 0x7f060047e700>
FINI on_sigterm pid=10980
10982.e52e: cwd is now '/home/user/PycharmProjects/synapse'
10982.e52e: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10982.e52e: New process: cmd: ['-c', '--multiprocessing-fork']
10982.e52e: New process: pid: 10982, parent pid: 10949
IN on_sigterm pid=10982
10982.e52e: sigterm: pid: 10982, instance: <coverage.control.Coverage object at 0x7f1bbd0a7610>
FINI on_sigterm pid=10982
IN on_sigterm pid=10982
10982.e52e: sigterm: pid: 10982, instance: <coverage.control.Coverage object at 0x7f1bbd0a7700>
FINI on_sigterm pid=10982
error during task iterNewBackupArchive
Traceback (most recent call last):
  File "/home/user/PycharmProjects/synapse/synapse/daemon.py", line 138, in t2call
    async for item in valu:
  File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 692, in iterNewBackupArchive
    await self.cell.iterNewBackupArchive(user=self.user, name=name, remove=remove)
  File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1625, in iterNewBackupArchive
    raise s_exc.BadArg(mesg=mesg)
synapse.exc.BadArg: BadArg: mesg='Backup with name already exists'
10983.0649: cwd is now '/home/user/PycharmProjects/synapse'
10983.0649: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10983.0649: New process: cmd: ['-c', '--multiprocessing-fork']
10983.0649: New process: pid: 10983, parent pid: 10949
10987.51f1: cwd is now '/home/user/PycharmProjects/synapse'
10987.51f1: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
10987.51f1: New process: cmd: ['-c', '--multiprocessing-fork']
10987.51f1: New process: pid: 10987, parent pid: 10949
11005.18da: cwd is now '/home/user/PycharmProjects/synapse'
11005.18da: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11005.18da: New process: cmd: ['-c', '--multiprocessing-fork']
11005.18da: New process: pid: 11005, parent pid: 10949
11009.15a2: cwd is now '/home/user/PycharmProjects/synapse'
11009.15a2: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11009.15a2: New process: cmd: ['-c', '--multiprocessing-fork']
11009.15a2: New process: pid: 11009, parent pid: 10949
11027.82f1: cwd is now '/home/user/PycharmProjects/synapse'
11027.82f1: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11027.82f1: New process: cmd: ['-c', '--multiprocessing-fork']
11027.82f1: New process: pid: 11027, parent pid: 10949
11031.4f81: cwd is now '/home/user/PycharmProjects/synapse'
11031.4f81: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11031.4f81: New process: cmd: ['-c', '--multiprocessing-fork']
11031.4f81: New process: pid: 11031, parent pid: 10949
11049.61f4: cwd is now '/home/user/PycharmProjects/synapse'
11049.61f4: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11049.61f4: New process: cmd: ['-c', '--multiprocessing-fork']
11049.61f4: New process: pid: 11049, parent pid: 10949
error during task iterNewBackupArchive
Traceback (most recent call last):
  File "/home/user/PycharmProjects/synapse/synapse/daemon.py", line 138, in t2call
    async for item in valu:
  File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 692, in iterNewBackupArchive
    await self.cell.iterNewBackupArchive(user=self.user, name=name, remove=remove)
  File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1632, in iterNewBackupArchive
    await self.runBackup(name)
  File "/home/user/PycharmProjects/synapse/synapse/lib/cell.py", line 1361, in runBackup
    raise s_exc.BackupAlreadyRunning(mesg='Another backup is already running')
synapse.exc.BackupAlreadyRunning: BackupAlreadyRunning: mesg='Another backup is already running'
11053.881e: cwd is now '/home/user/PycharmProjects/synapse'
11053.881e: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11053.881e: New process: cmd: ['-c', '--multiprocessing-fork']
11053.881e: New process: pid: 11053, parent pid: 10949
11072.aa01: cwd is now '/home/user/PycharmProjects/synapse'
11072.aa01: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11072.aa01: New process: cmd: ['-c', '--multiprocessing-fork']
11072.aa01: New process: pid: 11072, parent pid: 10949
IN on_sigterm pid=11072
11072.aa01: sigterm: pid: 11072, instance: <coverage.control.Coverage object at 0x7f1d1da5dc40>
FINI on_sigterm pid=11072
IN on_sigterm pid=11072
11072.aa01: sigterm: pid: 11072, instance: <coverage.control.Coverage object at 0x7f1d21f24610>
FINI on_sigterm pid=11072
IN on_sigterm pid=11072
11072.aa01: sigterm: pid: 11072, instance: <coverage.control.Coverage object at 0x7f1d21f24700>
FINI on_sigterm pid=11072
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup2/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup2/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup3/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/bkup3/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/20220126172444/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
Migrating Nexus log v1->v2 for /tmp/tmpcxvtnpu0/20220126172444/axon/slabs/nexus.lmdb
Nothing in the nexuslog sequence to migrate.
...Nexus log migration complete
11076.8294: cwd is now '/home/user/PycharmProjects/synapse'
11076.8294: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11076.8294: New process: cmd: ['-c', '--multiprocessing-fork']
11076.8294: New process: pid: 11076, parent pid: 10949
11094.4f88: cwd is now '/home/user/PycharmProjects/synapse'
11094.4f88: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11094.4f88: New process: cmd: ['-c', '--multiprocessing-fork']
11094.4f88: New process: pid: 11094, parent pid: 10949
11098.e216: cwd is now '/home/user/PycharmProjects/synapse'
11098.e216: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11098.e216: New process: cmd: ['-c', '--multiprocessing-fork']
11098.e216: New process: pid: 11098, parent pid: 10949
11122.50a7: cwd is now '/home/user/PycharmProjects/synapse'
11122.50a7: New process: executable: '/home/user/.pyenv/versions/3.8.10/envs/syn3810/bin/python'
11122.50a7: New process: cmd: ['-c', '--multiprocessing-fork']
11122.50a7: New process: pid: 11122, parent pid: 10949
IN on_sigterm pid=11122
11122.50a7: sigterm: pid: 11122, instance: <coverage.control.Coverage object at 0x7fe6c0ebad90>
PASSED

========================================================================================== warnings summary ===========================================================================================
../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
  /home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
  /home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli_date_format

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
  /home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli_format

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

../../.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
  /home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: log_cli_level

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/warnings.html
---------------------------------------------------------- generated xml file: /home/user/PycharmProjects/synapse/test-reports/junit.xml ----------------------------------------------------------

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                                   Stmts   Miss  Cover
--------------------------------------------------------------------------
synapse/__init__.py                                        8      3    62%

..... snip ...

--------------------------------------------------------------------------
TOTAL                                                  37535  25690    32%

========================================================================================= slowest 6 durations =========================================================================================
13.52s call     synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup
0.00s setup    synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup
0.00s teardown synapse/tests/test_cov_fail.py::CovFailTest::test_cell_stream_backup
=================================================================================== 1 passed, 4 warnings in 17.61s ====================================================================================
10949.6f92: atexit: pid: 10949, instance: <coverage.control.Coverage object at 0x7f07160a3910>

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
Process SpawnProcess-17:
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
    util._exit_function()
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 334, in _exit_function
    _run_finalizers(0)
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/user/.pyenv/versions/3.8.10/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 99, in cleanup
    _cleanup(_active_cov)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/pytest_cov/embed.py", line 84, in _cleanup
    cov.stop()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 708, in save
    data = self.get_data()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 777, in get_data
    if self._collector and self._collector.flush_data():
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
    self.covdata.add_lines(self.mapped_file_dict(self.data))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
    return method(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 459, in add_lines
    self._choose_lines_or_arcs(lines=True)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 518, in _choose_lines_or_arcs
    with self._connect() as con:
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 313, in _connect
    self._open_db()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 264, in _open_db
    self._read_db()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 268, in _read_db
    with self._dbs[threading.get_ident()] as db:
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1073, in __enter__
    self._connect()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1052, in _connect
    self.con = sqlite3.connect(self.filename, check_same_thread=False)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 611, in _on_sigterm
    self._atexit("sigterm")
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 604, in _atexit
    self.save()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 708, in save
    data = self.get_data()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/control.py", line 777, in get_data
    if self._collector and self._collector.flush_data():
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/collector.py", line 473, in flush_data
    self.covdata.add_lines(self.mapped_file_dict(self.data))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 234, in _wrapped
    with self._lock:
KeyboardInterrupt
11122.50a7: atexit: pid: 11122, instance: <coverage.control.Coverage object at 0x7fe6c0ebad90>
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1096, in execute
    return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1101, in execute
    return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 236, in _wrapped
    return method(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 463, in add_lines
    self._set_context_id()
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 421, in _set_context_id
    context_id = self._context_id(context)
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 397, in _context_id
    row = con.execute_one("select id from context where context = ?", (context,))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1129, in execute_one
    rows = list(self.execute(sql, parameters))
  File "/home/user/.pyenv/versions/3.8.10/envs/syn3810/lib/python3.8/site-packages/coverage/sqldata.py", line 1118, in execute
    raise DataError(f"Couldn't use data file {self.filename!r}: {msg}") from exc
coverage.exceptions.DataError: Couldn't use data file '/home/user/PycharmProjects/synapse/.coverage.vertex005c.11122.270781': no such table: context
(syn3810) user@vertex005c:~/PycharmProjects/synapse$ 10951.dea1: atexit: pid: 10951, instance: <coverage.control.Coverage object at 0x7fab05dff610>
10955.ed98: atexit: pid: 10955, instance: <coverage.control.Coverage object at 0x7fce74ac3610>
nmdefries commented 2 years ago

This is caused by running code coverage on a function that uses multiprocessing.Pool to fork threads.

The test suite hangs on this line of coverage, specifically:

".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 234 in _wrapped

Child threads are unable to acquire an available mutex; when they are created, they are given a copy of an (unavailable) mutex that is thus never updated (some info about why). Probably related to this change "Feature: coverage measurement data will now be written when a SIGTERM signal is received by the process."

Fixed in my case by any of: using multiprocessing to "spawn" instead of forking, reverting to version 6.2.0, or turning off code coverage.

Edit: Full stack trace of stalled thread

Current thread 0x00007f5675b14740 (most recent call first):
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 234 in _wrapped
  File ".../env/lib/python3.8/site-packages/coverage/collector.py", line 473 in flush_data
  File ".../env/lib/python3.8/site-packages/coverage/control.py", line 766 in get_data
  File ".../env/lib/python3.8/site-packages/coverage/control.py", line 697 in save
  File ".../env/lib/python3.8/site-packages/coverage/control.py", line 601 in _atexit
  File ".../env/lib/python3.8/site-packages/coverage/control.py", line 605 in _on_sigterm
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 1156 in executescript
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 299 in _init_db
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 273 in _read_db
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 264 in _open_db
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 313 in _connect
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 518 in _choose_lines_or_arcs
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 459 in add_lines
  File ".../env/lib/python3.8/site-packages/coverage/sqldata.py", line 236 in _wrapped
  File ".../env/lib/python3.8/site-packages/coverage/collector.py", line 473 in flush_data
  File ".../env/lib/python3.8/site-packages/coverage/control.py", line 766 in get_data
  File ".../env/lib/python3.8/site-packages/coverage/control.py", line 697 in save
  File ".../env/lib/python3.8/site-packages/pytest_cov/embed.py", line 85 in _cleanup
  File ".../env/lib/python3.8/site-packages/pytest_cov/embed.py", line 99 in cleanup
  File "/usr/lib/python3.8/multiprocessing/util.py", line 224 in __call__
  File "/usr/lib/python3.8/multiprocessing/util.py", line 300 in _run_finalizers
  File "/usr/lib/python3.8/multiprocessing/util.py", line 334 in _exit_function
  File "/usr/lib/python3.8/multiprocessing/process.py", line 318 in _bootstrap
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75 in _launch
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/lib/python3.8/multiprocessing/context.py", line 277 in _Popen
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121 in start
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 326 in _repopulate_pool_static
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 303 in _repopulate_pool
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 212 in __init__
  File "/usr/lib/python3.8/multiprocessing/context.py", line 119 in Pool
  File ".../delphi_safegraph_patterns/run.py", line 108 in run_module
  File ".../tests/test_run.py", line 36 in test_output_files
  File "/usr/lib/python3.8/unittest/mock.py", line 1325 in patched
  File ".../env/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
  File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
  File ".../env/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
  File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
  File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
  File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
  File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
  File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
  File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File ".../env/lib/python3.8/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
  File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
  File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
  File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
  File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
  File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
  File ".../env/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File ".../env/lib/python3.8/site-packages/pluggy/_callers.py", line 39 in _multicall
  File ".../env/lib/python3.8/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File ".../env/lib/python3.8/site-packages/pluggy/_hooks.py", line 265 in __call__
  File ".../env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
  File ".../env/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "../env/bin/pytest", line 8 in <module>
nedbat commented 2 years ago

Here are the complete steps I use to reproduce this problem. Note that the branch provided installs the working 6.2 version of coverage, so you need to explicitly install the broken 6.3 version, or whatever possible-fix version you want. The output of the commands are in the collapsed details sections.

Also note that there is sometimes a warning about a non-coverage data file, which is another symptom of the underlying race condition.

Run these steps in a Python 3.8 environment:

git clone https://github.com/vertexproject/synapse ``` Cloning into 'synapse'... remote: Enumerating objects: 51840, done. remote: Counting objects: 100% (573/573), done. remote: Compressing objects: 100% (360/360), done. remote: Total 51840 (delta 370), reused 328 (delta 207), pack-reused 51267 Receiving objects: 100% (51840/51840), 20.71 MiB | 5.73 MiB/s, done. Resolving deltas: 100% (41441/41441), done. ```
cd synapse ``` ```
git switch dev_test_covfix ``` branch 'dev_test_covfix' set up to track 'origin/dev_test_covfix' by rebasing. Switched to a new branch 'dev_test_covfix' ```
pip install -r requirements_dev.txt ``` Collecting pyOpenSSL<22.0.0,>=21.0.0 Using cached pyOpenSSL-21.0.0-py2.py3-none-any.whl (55 kB) Collecting msgpack<1.1.0,>=1.0.2 Using cached msgpack-1.0.3-cp38-cp38-macosx_10_9_x86_64.whl (73 kB) Collecting xxhash<2.1.0,>=1.4.4 Using cached xxhash-2.0.2-cp38-cp38-macosx_10_9_x86_64.whl (31 kB) Collecting lmdb<1.4.0,>=1.2.1 Using cached lmdb-1.3.0-cp38-cp38-macosx_10_14_x86_64.whl (101 kB) Collecting tornado<7.0.0,>=6.1.0 Using cached tornado-6.1-cp38-cp38-macosx_10_9_x86_64.whl (416 kB) Collecting regex>=2021.7.1 Using cached regex-2022.1.18-cp38-cp38-macosx_10_9_x86_64.whl (288 kB) Collecting PyYAML<6.1.0,>=5.4 Using cached PyYAML-6.0-cp38-cp38-macosx_10_9_x86_64.whl (192 kB) Collecting aiohttp<4.0,>=3.8.1 Using cached aiohttp-3.8.1-cp38-cp38-macosx_10_9_x86_64.whl (574 kB) Collecting aiohttp-socks<0.7.0,>=0.6.1 Using cached aiohttp_socks-0.6.1-py3-none-any.whl (9.3 kB) Collecting aioimaplib<0.10.0,>=0.9.0 Using cached aioimaplib-0.9.0-py3-none-any.whl (27 kB) Collecting aiosmtplib<2.0,>=1.1.6 Using cached aiosmtplib-1.1.6-py3-none-any.whl (29 kB) Collecting prompt-toolkit<3.1.0,>=3.0.4 Downloading prompt_toolkit-3.0.26-py3-none-any.whl (375 kB) |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 375 kB 2.6 MB/s Collecting lark-parser==0.11.2 Using cached lark_parser-0.11.2-py2.py3-none-any.whl Collecting Pygments<2.8.0,>=2.7.4 Using cached Pygments-2.7.4-py3-none-any.whl (950 kB) Collecting fastjsonschema<2.15,>=2.14.3 Using cached fastjsonschema-2.14.5-py3-none-any.whl (21 kB) Collecting packaging<22.0,>=20.0 Using cached packaging-21.3-py3-none-any.whl (40 kB) Collecting stix2-validator<4.0.0,>=3.0.0 Using cached stix2_validator-3.0.2-py2.py3-none-any.whl (248 kB) Collecting vcrpy<4.2.0,>=4.1.1 Using cached vcrpy-4.1.1-py2.py3-none-any.whl (40 kB) Collecting base58<2.2.0,>=2.1.0 Using cached base58-2.1.1-py3-none-any.whl (5.6 kB) Collecting python-bitcoinlib==0.11.0 Using cached python_bitcoinlib-0.11.0-py3-none-any.whl (103 kB) Collecting pycryptodome<3.14.0,>=3.11.0 Downloading pycryptodome-3.13.0-cp35-abi3-macosx_10_9_x86_64.whl (1.5 MB) |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.5 MB 8.1 MB/s Collecting typing-extensions<5.0.0,>=3.7.4 Using cached typing_extensions-4.0.1-py3-none-any.whl (22 kB) Collecting scalecodec<1.0.29,>=1.0.2 Downloading scalecodec-1.0.28-py3-none-any.whl (74 kB) |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 74 kB 6.2 MB/s Collecting cbor2<5.4.3,>=5.4.1 Downloading cbor2-5.4.2.post1-cp38-cp38-macosx_10_9_x86_64.whl (59 kB) |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 59 kB 8.3 MB/s Collecting bech32==1.2.0 Using cached bech32-1.2.0-py3-none-any.whl (4.6 kB) Collecting oauthlib<4.0.0,>=3.1.1 Using cached oauthlib-3.1.1-py2.py3-none-any.whl (146 kB) Collecting idna==3.3 Using cached idna-3.3-py3-none-any.whl (61 kB) Collecting python-dateutil<3.0,>=2.8 Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB) Collecting cryptography<39.0.0,>=36.0.0 Using cached cryptography-36.0.1-cp36-abi3-macosx_10_10_x86_64.whl (2.6 MB) Collecting pytest<7.0.0,>=6.2.5 Using cached pytest-6.2.5-py3-none-any.whl (280 kB) Collecting autopep8<2.0.0,>=1.5.4 Using cached autopep8-1.6.0-py2.py3-none-any.whl (45 kB) Collecting pytest-cov<4.0.0,>=3.0.0 Using cached pytest_cov-3.0.0-py3-none-any.whl (20 kB) Collecting pycodestyle<3.0.0,>=2.8.0 Using cached pycodestyle-2.8.0-py2.py3-none-any.whl (42 kB) Collecting bump2version<1.1.0,>=1.0.1 Using cached bump2version-1.0.1-py2.py3-none-any.whl (22 kB) Collecting pytest-xdist<3.0.0,>=2.4.0 Using cached pytest_xdist-2.5.0-py3-none-any.whl (41 kB) Collecting coverage<6.3,>=6.2 Using cached coverage-6.2-cp38-cp38-macosx_10_9_x86_64.whl (179 kB) Collecting six>=1.5.2 Using cached six-1.16.0-py2.py3-none-any.whl (11 kB) Collecting yarl<2.0,>=1.0 Using cached yarl-1.7.2-cp38-cp38-macosx_10_9_x86_64.whl (121 kB) Collecting frozenlist>=1.1.1 Using cached frozenlist-1.3.0-cp38-cp38-macosx_10_9_x86_64.whl (36 kB) Collecting aiosignal>=1.1.2 Using cached aiosignal-1.2.0-py3-none-any.whl (8.2 kB) Collecting attrs>=17.3.0 Using cached attrs-21.4.0-py2.py3-none-any.whl (60 kB) Collecting multidict<7.0,>=4.5 Using cached multidict-6.0.2-cp38-cp38-macosx_10_9_x86_64.whl (28 kB) Collecting charset-normalizer<3.0,>=2.0 Using cached charset_normalizer-2.0.10-py3-none-any.whl (39 kB) Collecting async-timeout<5.0,>=4.0.0a3 Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB) Collecting python-socks[asyncio]<2.0.0,>=1.2.4 Using cached python_socks-1.2.4-py3-none-any.whl (35 kB) Collecting wcwidth Using cached wcwidth-0.2.5-py2.py3-none-any.whl (30 kB) Collecting pyparsing!=3.0.5,>=2.0.2 Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB) Collecting colorama Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB) Collecting appdirs Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB) Collecting jsonschema[format_nongpl]>=3.2.0 Using cached jsonschema-4.4.0-py3-none-any.whl (72 kB) Collecting stix2-patterns>=0.4.1 Using cached stix2_patterns-1.3.2-py2.py3-none-any.whl (65 kB) Collecting requests-cache Using cached requests_cache-0.9.1-py3-none-any.whl (46 kB) Collecting requests Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB) Collecting simplejson Using cached simplejson-3.17.6-cp38-cp38-macosx_10_9_x86_64.whl (74 kB) Collecting cpe Using cached cpe-1.2.1-py3-none-any.whl Collecting wrapt Using cached wrapt-1.13.3-cp38-cp38-macosx_10_9_x86_64.whl (33 kB) Collecting more-itertools Using cached more_itertools-8.12.0-py3-none-any.whl (54 kB) Collecting cffi>=1.12 Using cached cffi-1.15.0-cp38-cp38-macosx_10_9_x86_64.whl (178 kB) Collecting toml Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB) Collecting py>=1.8.2 Using cached py-1.11.0-py2.py3-none-any.whl (98 kB) Collecting pluggy<2.0,>=0.12 Using cached pluggy-1.0.0-py2.py3-none-any.whl (13 kB) Collecting iniconfig Using cached iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB) Collecting coverage[toml]>=5.2.1 Using cached coverage-6.3-cp38-cp38-macosx_10_9_x86_64.whl (182 kB) Collecting execnet>=1.1 Using cached execnet-1.9.0-py2.py3-none-any.whl (39 kB) Collecting pytest-forked Using cached pytest_forked-1.4.0-py3-none-any.whl (4.9 kB) Collecting pycparser Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB) Collecting tomli Using cached tomli-2.0.0-py3-none-any.whl (12 kB) Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 Using cached pyrsistent-0.18.1-cp38-cp38-macosx_10_9_universal2.whl (81 kB) Collecting importlib-resources>=1.4.0 Using cached importlib_resources-5.4.0-py3-none-any.whl (28 kB) Collecting rfc3986-validator>0.1.0 Using cached rfc3986_validator-0.1.1-py2.py3-none-any.whl (4.2 kB) Collecting rfc3339-validator Using cached rfc3339_validator-0.1.4-py2.py3-none-any.whl (3.5 kB) Collecting uri-template Using cached uri_template-1.1.0-py3-none-any.whl (10 kB) Collecting jsonpointer>1.13 Using cached jsonpointer-2.2-py2.py3-none-any.whl (7.5 kB) Collecting fqdn Using cached fqdn-1.5.1-py3-none-any.whl (9.1 kB) Collecting isoduration Using cached isoduration-20.11.0-py3-none-any.whl (11 kB) Collecting webcolors>=1.11 Using cached webcolors-1.11.1-py3-none-any.whl (9.9 kB) Collecting certifi>=2017.4.17 Using cached certifi-2021.10.8-py2.py3-none-any.whl (149 kB) Collecting urllib3<1.27,>=1.21.1 Using cached urllib3-1.26.8-py2.py3-none-any.whl (138 kB) Collecting antlr4-python3-runtime~=4.8.0 Using cached antlr4_python3_runtime-4.8-py3-none-any.whl Collecting url-normalize<2.0,>=1.4 Using cached url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB) Collecting cattrs<2.0,>=1.8 Using cached cattrs-1.10.0-py3-none-any.whl (29 kB) Collecting zipp>=3.1.0 Using cached zipp-3.7.0-py3-none-any.whl (5.3 kB) Collecting arrow>=0.15.0 Downloading arrow-1.2.2-py3-none-any.whl (64 kB) |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 64 kB 7.8 MB/s Installing collected packages: six, zipp, python-dateutil, pyparsing, urllib3, toml, pyrsistent, pycparser, py, pluggy, packaging, multidict, iniconfig, importlib-resources, idna, frozenlist, charset-normalizer, certifi, attrs, arrow, yarl, webcolors, url-normalize, uri-template, tomli, rfc3986-validator, rfc3339-validator, requests, python-socks, pytest, jsonschema, jsonpointer, isoduration, fqdn, coverage, cffi, cattrs, async-timeout, appdirs, antlr4-python3-runtime, aiosignal, wrapt, wcwidth, stix2-patterns, simplejson, requests-cache, PyYAML, pytest-forked, pycodestyle, more-itertools, execnet, cryptography, cpe, colorama, base58, aiohttp, xxhash, vcrpy, typing-extensions, tornado, stix2-validator, scalecodec, regex, python-bitcoinlib, pytest-xdist, pytest-cov, pyOpenSSL, Pygments, pycryptodome, prompt-toolkit, oauthlib, msgpack, lmdb, lark-parser, fastjsonschema, cbor2, bump2version, bech32, autopep8, aiosmtplib, aioimaplib, aiohttp-socks Successfully installed PyYAML-6.0 Pygments-2.7.4 aiohttp-3.8.1 aiohttp-socks-0.6.1 aioimaplib-0.9.0 aiosignal-1.2.0 aiosmtplib-1.1.6 antlr4-python3-runtime-4.8 appdirs-1.4.4 arrow-1.2.2 async-timeout-4.0.2 attrs-21.4.0 autopep8-1.6.0 base58-2.1.1 bech32-1.2.0 bump2version-1.0.1 cattrs-1.10.0 cbor2-5.4.2.post1 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.10 colorama-0.4.4 coverage-6.2 cpe-1.2.1 cryptography-36.0.1 execnet-1.9.0 fastjsonschema-2.14.5 fqdn-1.5.1 frozenlist-1.3.0 idna-3.3 importlib-resources-5.4.0 iniconfig-1.1.1 isoduration-20.11.0 jsonpointer-2.2 jsonschema-4.4.0 lark-parser-0.11.2 lmdb-1.3.0 more-itertools-8.12.0 msgpack-1.0.3 multidict-6.0.2 oauthlib-3.1.1 packaging-21.3 pluggy-1.0.0 prompt-toolkit-3.0.26 py-1.11.0 pyOpenSSL-21.0.0 pycodestyle-2.8.0 pycparser-2.21 pycryptodome-3.13.0 pyparsing-3.0.7 pyrsistent-0.18.1 pytest-6.2.5 pytest-cov-3.0.0 pytest-forked-1.4.0 pytest-xdist-2.5.0 python-bitcoinlib-0.11.0 python-dateutil-2.8.2 python-socks-1.2.4 regex-2022.1.18 requests-2.27.1 requests-cache-0.9.1 rfc3339-validator-0.1.4 rfc3986-validator-0.1.1 scalecodec-1.0.28 simplejson-3.17.6 six-1.16.0 stix2-patterns-1.3.2 stix2-validator-3.0.2 toml-0.10.2 tomli-2.0.0 tornado-6.1 typing-extensions-4.0.1 uri-template-1.1.0 url-normalize-1.4.3 urllib3-1.26.8 vcrpy-4.1.1 wcwidth-0.2.5 webcolors-1.11.1 wrapt-1.13.3 xxhash-2.0.2 yarl-1.7.2 zipp-3.7.0 ```
pip freeze | grep cov ``` coverage==6.2 pytest-cov==3.0.0 ```
pip install coverage==6.3 ``` Collecting coverage==6.3 Using cached coverage-6.3-cp38-cp38-macosx_10_9_x86_64.whl (182 kB) Installing collected packages: coverage Attempting uninstall: coverage Found existing installation: coverage 6.2 Uninstalling coverage-6.2: Successfully uninstalled coverage-6.2 Successfully installed coverage-6.3 ```
python -m pytest --cov synapse --cov-report= synapse/tests/test_cov_fail.py ``` ============================================================================ test session starts ============================================================================= platform darwin -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 rootdir: /private/tmp/bug1310/synapse plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1 item synapse/tests/test_cov_fail.py . [100%] /usr/local/virtualenvs/tmp-b738f1a6ca078f2/lib/python3.8/site-packages/coverage/data.py:129: CoverageWarning: Data file '/private/tmp/bug1310/synapse/.coverage.mitadms-mbp.lan.75723.037672' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object data._warn(str(exc)) ============================================================================= 1 passed in 29.36s ============================================================================= ^CError in atexit._run_exitfuncs: Traceback (most recent call last): File "/usr/local/pyenv/pyenv/versions/3.8.12/lib/python3.8/concurrent/futures/thread.py", line 40, in _python_exit t.join() File "/usr/local/pyenv/pyenv/versions/3.8.12/lib/python3.8/threading.py", line 1011, in join self._wait_for_tstate_lock() File "/usr/local/pyenv/pyenv/versions/3.8.12/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock elif lock.acquire(block, timeout): KeyboardInterrupt ```
ionelmc commented 2 years ago

IMO you shouldn't be using multiprocessing's default fork mode, unless you're trying to really optimize something - in which situation you'd have to accept that you'll eventually have deadlocks caused by locks being improperly shared across forked processes.

I don't want to be negative here but maybe the best thing to do here is just disable support for or issue big fat warning if multiprocessing in fork mode is detected.

nedbat commented 2 years ago

I have a fix for the hang: it's because I used threading.Lock, then tried to re-lock a lock I already held in the signal handler. Changing it to threading.RLock fixes the hang. But I still see some issues related to sharing..

vEpiphyte commented 2 years ago

I've been able to cut down my test example to something that locks up "most of the time"; but most of the time just means "there is a race condition that is happening most of the time" :\

While we do use multiprocessing.pool in our code, we do initialize a forkserver pool during import time of our library. We do this early during import time on the MainProcess to avoid the forkserver from possibly inheriting any resources related to things like the asyncio loop. A snippet from synapse/lib/coro.py is shown below:

# shared process pool
forkpool = None
if multiprocessing.current_process().name == 'MainProcess':
    # only create the forkpool in the MainProcess...
    try:
        mpctx = multiprocessing.get_context('forkserver')
        max_workers = int(os.getenv('SYN_FORKED_WORKERS', 1))
        forkpool = concurrent.futures.ProcessPoolExecutor(mp_context=mpctx, max_workers=max_workers)
        atexit.register(forkpool.shutdown)
    except OSError as e:  # pragma: no cover
        logger.warning(f'Failed to init forkserver pool, fallback enabled: {e}', exc_info=True)

Changing the coverage/sqldata.py's Lock to a RLock does fix our issue. @tunetheweb I don't know if that's something you've got a test case that you could try out?

nedbat commented 2 years ago

If you want a version with RLock to install, you can use:

pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1310
tunetheweb commented 2 years ago

Just tried that branch and still seeing the same slowness and hanging for Python 3.7 unfortunately.

nedbat commented 2 years ago

@tunetheweb is there any way you can get us a reproducible example with your code?

tunetheweb commented 2 years ago

@barrywhart could you offer any clues here? Think you wrote the multprocessing part.

Dug up this code where most of the magic happens if that's any help:

def main():
    """Find all example SQL files, parse and create YAML files."""
    parse_success_examples, _ = get_parse_fixtures()
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        for _ in pool.imap_unordered(generate_parse_fixture, parse_success_examples):
            pass

if __name__ == "__main__":
    main()
barrywhart commented 2 years ago

@tunetheweb: Can you clarify what command(s) are hanging? The code you mention above is from a standalone script that creates some test YAML files; it's not part of the main package nor the automated tests.

barrywhart commented 2 years ago

In the past, I have worked around forking issues with multiprocessing.Pool by using the third-party billiard.Pool instead. (FYI: Billiard is a major component of the well-known Celery queuing package.) I don't know the technical details, but AFAIK, it lets you completely avoid inheriting handles and other objects from the parent process.

https://pypi.org/project/billiard/

tunetheweb commented 2 years ago

The Test Suite - particularly on 3.7: https://github.com/tunetheweb/sqlfluff/runs/4998066032?check_suite_focus=true (but seems slower on other versions too).

Ideally we'd have some sort of smaller reproducible test case as the whole SQLFluff code base is quite large as you know!

barrywhart commented 2 years ago

@tunetheweb: In that case, the issue probably arises from this module: https://github.com/sqlfluff/sqlfluff/blob/main/src/sqlfluff/core/linter/runner.py

The MultiProcessRunner and MultiThreadRunner classes use multiprocessing.Pool and multiprocessing.dummy.Pool.

Based on some reading I did just now (see StackOverflow link below), I bet we could avoid this issue by using spawn or forkserver methods. (Presumably, the hang occurs when using the fork method.)

https://stackoverflow.com/questions/43818519/what-is-the-meaning-of-context-argument-in-multiprocessing-pool-pool/43818829

nedbat commented 2 years ago

@barrywhart can you give me a way to reproduce the issue myself?

tunetheweb commented 2 years ago

If you fork the SQLFluff repo and then open a PR in your fork, to your fork’s main branch to unpin the coverage version in requirements_dev.txt and then the CI tests will run in GitHub Actions. Example in my own fork: https://github.com/tunetheweb/sqlfluff/pull/6/files

Most of the GitHub Actions will complete within 10 mins.

The Python 3.7 tests will run twice as slowly for some reasons, and after 20 mins will reach 90% and then not go any further. It will then hang for ever and eventually GitHub will cancel it if you don’t.

the fact it’s slow (rather than doesn’t work), and the fact it reaches 90% every time and then hangs is odd to me.

It’s consistently reproducible like that. The other Python versions I have seen issues with slowness and/or hanging much more occasionally but with 3.7 it’s every time.

The difficulty is getting a smaller reproducible test case as the repo is quite big with a lot of code. However, if you just want to test some changes, the above will work. Was thinking of just creating branches with every comment between 6.2 and 6.3 until i got to the commit that introduced this as kind of at a loss how better to reproduce this, and give some more clues.

barrywhart commented 2 years ago

If you want to reproduce it locally, I think you could just:

Let us know if that reproduces the issue for you. If it's helpful, then we can try and narrow it down. As I mentioned above, the module that handles subprocesses is https://github.com/sqlfluff/sqlfluff/blob/main/src/sqlfluff/core/linter/runner.py.

nedbat commented 2 years ago

On my mac, I can run the sqlfluff test suite on 3.7 with either coverage.py==6.2 or ==6.3, and they both complete in the same amount of time (764s and 769s). Hmm.

barrywhart commented 2 years ago

@tunetheweb: Have you been able to reproduce the issue locally? I haven't yet.

I'm not that familiar with our CI builds, but IIUC, it's running approximately the following commands (extracted from tox.ini and .github/workflows/ci-tests.yml:

export SQLFLUFF_TESTENV=1
export COVERAGE_FILE=.coverage
coverage erase
python util.py clean-tests
pytest -vv -rs --cov=sqlfluff --cov-report=xml -n 2 test/
vEpiphyte commented 2 years ago

With sqlfluff a notable difference I see between the local example and the CI config is the inclusion of pytest-xdist via the -n 2 flag.

tunetheweb commented 2 years ago

With sqlfluff a notable difference I see between the local example and the CI config is the inclusion of pytest-xdist via the -n 2 flag.

Good spot. That's key, as that's what invokes our multiple threads.

@tunetheweb: Have you been able to reproduce the issue locally? I haven't yet.

No I've not, but can't seem to get a Python 3.7 env set up locally. I've also tried to to reduce the test cases in CI. I picked one that did not complete and reran it. But it completes when run on its own. So it's definitely like it's some load/race condition when it's got a big job to do and it seems to "lose" some jobs so never completes.

tunetheweb commented 2 years ago

Also using -n 4 might make it more likely (as will use 4 thread then). Or even higher.

barrywhart commented 2 years ago

The SQLFluff build runs many Python versions concurrently, as well as running the dbt tests concurrently with the core tests. I don't know if these all run in the same server or separate, but perhaps this contributes to the problem as well.

nedbat commented 2 years ago

I'm trying some proposed fixes. They reduce the likelihood of hangs, but it's not 100%: https://github.com/nedbat/sqlfluff/actions

tunetheweb commented 2 years ago

The SQLFluff build runs many Python versions concurrently, as well as running the dbt tests concurrently with the core tests. I don't know if these all run in the same server or separate, but perhaps this contributes to the problem as well.

They are on separate machines. Also I did a run with the others turned off and it still hung.

osma commented 2 years ago

The same problem affects the Annif codebase. It uses multiprocessing.Pool in forking mode. I originally reported the problems we had with GitHub Actions CI jobs here: https://github.com/NatLibFi/Annif/issues/564

I can now reproduce this locally every time on my machine. Let me know if I can help with a reproducible test case and/or testing potential solutions.

nedbat commented 2 years ago

@osma yes, a local reproduction case would be very helpful.

osma commented 2 years ago

OK, this is very much not a minimal test case, but at least it demonstrates the problem:

# checkout latest Annif code
git clone https://github.com/NatLibFi/Annif.git

# create a venv inside and activate it
mkdir Annif
python3 -m venv venv
. venv/bin/activate

# make sure we have latest pip & wheel
pip install -U pip wheel

# install Annif in editable mode, with dev dependencies
pip install -e .[dev]

# download NLTK punkt model as well if you don't have it from earlier
python -m nltk.downloader punkt

# run the MLLM part of the test suite with coverage 6.2
pip install coverage==6.2
pytest --cov=./ -v tests/test_backend_mllm.py
# final output: 10 passed in 31.35s

# now do the same with coverage 6.3
pip install coverage==6.3
pytest --cov=./ -v tests/test_backend_mllm.py
# on a 4-core machine (multiprocessing.cpu_count()==8): hangs practically every time
# on a 2-core machine (multiprocessing.cpu_count()==4): hangs maybe 1 out of 4 times

More specifically, the test that gets stuck is test_mllm_train which uses multiprocessing.Pool to preprocess training documents in parallel.

pietrodn commented 2 years ago

I have the same issue on the CI tests of a scientific library of ours using multiprocessing. Pinning coverage<6.3 makes the problem disappear. Unfortunately it would be hard to expose the minimal reproduction steps.

osma commented 2 years ago

Here's a pretty minimal test case that I cobbled together:

process.py:

#!/usr/bin/env python3

import multiprocessing

def calculate(num):
    for i in range(100000):
        num += num
    return num

def process(jobs=16, numbers=64):
    with multiprocessing.Pool(jobs) as pool:
        orig_numbers = range(numbers)
        new_numbers = pool.map(calculate, orig_numbers)
    return new_numbers

if __name__ == '__main__':
    numbers = process()
    print(numbers)

test_process.py:

import pytest
import process

def test_process():
    numbers = process.process()
    assert len(numbers) == 64

Install the dependencies (in a venv): pip install pytest pytest-cov coverage==6.3

Run the test with:

pytest --cov=./ test_process.py

On a 2-core machine (Ubuntu 20.04 amd64, Python 3.8.10) this hangs maybe 50% of the time. If it doesn't, it will finish in less than 10 seconds. With coverage 6.2, it works every time.

vEpiphyte commented 2 years ago

Osma's test case does not appear to lock up when using coveragepy's current master (aad5ece47bf12bceff4296516f23171a06b34bb5)

nedbat commented 2 years ago

I've released coverage 6.3.1 with the RLock fix. I'm not sure what to do with the rest of this issue...

nedbat commented 2 years ago

@benmwebb @anishathalye @BoboTiG @JannisNe @pietrodn @osma @apriha @pllim @KelSolaar @rosswhitfield @JoanFM @erykoff @haampie @Bultako @maxnoe @rlopezcoto @glemaitre @nmdefries @QuLogic @akihironitta @sir-sigurd @cliffckerr Would you try coverage 6.3.1 to see if it fixes your problems?

benmwebb commented 2 years ago

Would you try coverage 6.3.1 to see if it fixes your problems?

Doesn't appear to work for me, unfortunately:

QuLogic commented 2 years ago

Unfortunately not for us either https://github.com/matplotlib/matplotlib/runs/5031432498?check_suite_focus=true

But it only fails on Python 3.7, which we are soon about to drop support for... However, that is also the build that pins minimum versions for other things, so it might be something other than Python.

pietrodn commented 2 years ago

I'm using Python 3.9. With coverage==6.3.1 it does not hang, however it gives errors such as:

sqlite3.OperationalError: no such table: file

Here I posted full tracebacks.

I'm running the tests with this command:

 pytest \
--durations=10 \
--capture=no \
--cov-branch \
--cov-fail-under=0 \
--cov-report=html:coverage-reports \
--cov=my_package \
--junitxml=test-reports/junit.xml

The errors aren't there with 6.2.