Closed altendky closed 1 year ago
The exact same error occurred here. It first appeared 15 hours ago. We used coverage-7.0.1 before, and this error appeared after updating to coverage-7.0.2.
Here are the full logs:
Hmm, this is definitely a part of code changed between 7.0.1 and 7.0.2, but reading the 7.0.1 code, I would have expected it to raise a coverage.DataError: "Data file {!r} doesn't seem to be a coverage data file: {exc}". That error message has been mentioned a few times, but doesn't seem to be a current problem. I'm trying to understand how 7.0.1 manifested this race condition...
I see: when combining data files, if a file can't be read, we silently skip the file and leave it behind. But that was only a check for CoverageException. 7.0.2 changed this failure case from a DataError (derived from CoverageException) to an AssertionError.
I've made a small tweak on a branch, can you try it? It should restore the old behavior, but with more information shown when a file can't be combined:
pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1522#egg=coverage==0.0
Sure thing. I referenced the branch in https://github.com/Chia-Network/chia-blockchain/pull/14261/ with the workflow run at https://github.com/Chia-Network/chia-blockchain/actions/runs/3831281612. Though, with it fixed I'm not sure that we'll see anything since this was happening via pytest-cov. All the completed runs so far have passed, so it does seem to be back to previous behavior. To get some data, should we make it fail again explicitly and dump the info you want to a file and I can capture that in a CI artifact?
FWIW, every time I deal with pytest-cov I do think about getting rid of it. Hasn't bubbled to the top of my list yet though. :|
You have a complex action, so I'm having trouble seeing the difference. Where it had failed with an assert, we should now see a message about not being able to combine a data file. Is that happening?
It was happening inside the pytest run so I'm guessing that if your diagnostics are going to stdout that pytest is likely consuming them.
I can certainly just take it on myself to tweak your branch and get that info visible someway too, if you would like.
I'm curious about when the db is created and schema info written when using pytest-cov and pytest-xdist. Is that somehow happening at the end of the xdist worker lifetime and the main process is attempting to collect those DB files too soon?
+1
I am now also getting this error on 7.0.2
/builds/company/team/project-abc/venv/lib/python3.10/site-packages/coverage/data.py:163: CoverageWarning: Couldn't use data file '/builds/company/team/project-abc/.coverage.runner-npuy5rve-project-32085024-concurrent-3.3930.994265': database disk image is malformed
data._warn(str(exc))
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/_pytest/main.py", line 270, in wrap_session
INTERNALERROR> session.exitstatus = doit(config, session) or 0
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/_pytest/main.py", line 324, in _main
INTERNALERROR> config.hook.pytest_runtestloop(session=session)
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265, in __call__
INTERNALERROR> return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 55, in _multicall
INTERNALERROR> gen.send(outcome)
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/pytest_cov/plugin.py", line 297, in pytest_runtestloop
INTERNALERROR> self.cov_controller.finish()
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/pytest_cov/engine.py", line 44, in ensure_topdir_wrapper
INTERNALERROR> return meth(self, *args, **kwargs)
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/pytest_cov/engine.py", line 247, in finish
INTERNALERROR> self.cov.combine()
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/coverage/control.py", line 791, in combine
INTERNALERROR> combine_parallel_data(
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/coverage/data.py", line 158, in combine_parallel_data
INTERNALERROR> new_data.read()
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/coverage/sqldata.py", line 815, in read
INTERNALERROR> with self._connect():
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/coverage/sqldata.py", line 342, in _connect
INTERNALERROR> self._open_db()
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/coverage/sqldata.py", line 285, in _open_db
INTERNALERROR> self._read_db()
INTERNALERROR> File "/builds/company/team/project-abc/venv/lib/python3.10/site-packages/coverage/sqldata.py", line 302, in _read_db
INTERNALERROR> assert row is not None
INTERNALERROR> AssertionError
We started hitting this as well with 7.0.2. Our test suite uses pytest-cov
and pytest-xdist
and some of the tests internally are testing code that uses multiprocessing.Pool
. The code is private.
Environment:
coverage 7.0.2
pytest 7.2.0
pytest-cov 4.0.0
pytest-mock 3.10.0
pytest-randomly 3.12.0
pytest-xdist 3.1.0
Python 3.10.9, running in the python:3.10
DockerHub image on a GitLab runner docker
executor.
I don't know if this is related or not, but under 7.0.1 and earlier we would see the following warnings in the CI logs, but they never caused an issue so we were ignoring them:
.nox/test-3-10/lib/python3.10/site-packages/coverage/data.py:151: CoverageWarning: Data file '.coverage.runner-b6rhuh6f-project-1440-concurrent-2.31582.244847' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
data._warn(str(exc))
.nox/test-3-10/lib/python3.10/site-packages/coverage/data.py:151: CoverageWarning: Couldn't use data file '.coverage.runner-b6rhuh6f-project-1440-concurrent-2.39471.069373': database disk image is malformed
data._warn(str(exc))
@johnthagen can you try the branch I mentioned above?
pip install git+https://github.com/nedbat/coveragepy@nedbat/fix-1522#egg=coverage==0.0
Is anyone seeing this problem without pytest-cov?
I wouldn't be surprised if it were even specific to pytest-cov with pytest-xdist. I'll try to ~find~ ~make~ conjure up some time to dig in that direction.
This is fixed in commit 1f34d8b0.
This is now released as part of coverage 7.0.3.
Thanks for the quick turnaround!
But, am I understanding correctly that the underlying event here relates to probably losing some coverage data from some of the xdist processes? Or something similar. I appreciate that the symptom is gone, but it seems like there's an issue for me to explore and address elsewhere.
I agree that there's a possible problem here. This fix gets us back to where 7.0.1 (and before) were.
I don't know if there is data loss, since no one has reported a problem like that. It might be that new empty data files are getting created when they shouldn't be.
@nedbat Wanted to confirm that 7.0.3 solved our issue and only the warnings listed here are shown, but coverage information is correct and the tests complete successfully:
Describe the bug
In some of our jobs with 7.0.2, and not with 7.0.1, we commonly, but not always, get this traceback at the end. I expect it is indicative of another issue related to pytest-cov and pytest-xdist... but, none the less, it is a change in a patch release of coverage that may interest you.
https://github.com/Chia-Network/chia-blockchain/actions/runs/3825220672/jobs/6508036405#step:16:558
To Reproduce How can we reproduce the problem? Please be specific. Don't link to a failing CI job. Answer the questions below:
coverage debug sys
is helpful.pip freeze
is helpful.Successfully installed Jinja2-3.1.2 MarkupSafe-2.1.1 PyYAML-6.0 Pygments-2.14.0 SecretStorage-3.3.3 aiofiles-22.1.0 aiohttp-3.8.3 aiohttp-cors-0.7.0 aiosignal-1.3.1 aiosqlite-0.17.0 altgraph-0.17.3 anyio-3.6.2 argon2-cffi-bindings-21.2.0 argon2_cffi-21.3.0 astroid-2.12.13 asttokens-2.2.1 async-timeout-4.0.2 attrs-22.2.0 backcall-0.2.0 bencode.py-4.0.0 bitstring-3.1.9 black-22.10.0 bleach-5.0.1 blspy-1.0.16 build-0.9.0 certifi-2022.12.7 cffi-1.15.1 cfgv-3.3.1 chardet-5.1.0 charset-normalizer-2.1.1 chia-blockchain-1.7.0b5.dev18 chia-rs-0.1.16 chiabip158-1.1 chiapos-1.0.11 chiavdf-1.0.8 click-8.1.3 clvm-0.9.7 clvm-tools-0.4.6 clvm-tools-rs-0.1.25 colorama-0.4.5 colorlog-6.7.0 commonmark-0.9.1 concurrent-log-handler-0.9.20 coverage-7.0.2 cryptography-38.0.3 decorator-5.1.1 diff-cover-7.3.0 dill-0.3.6 distlib-0.3.6 dnslib-0.9.23 dnspython-2.2.1 docutils-0.19 exceptiongroup-1.1.0 execnet-1.9.0 executing-1.2.0 filelock-3.8.0 flake8-6.0.0 frozenlist-1.3.3 identify-2.5.11 idna-3.4 importlib-metadata-6.0.0 iniconfig-1.1.1 ipython-8.7.0 isort-5.11.4 jaraco.classes-3.2.3 jedi-0.18.2 jeepney-0.8.0 keyring-23.9.3 keyrings.cryptfile-1.3.9 lazy-object-proxy-1.8.0 matplotlib-inline-0.1.6 mccabe-0.7.0 memory-profiler-0.61.0 more-itertools-9.0.0 multidict-6.0.4 mypy-0.991 mypy-extensions-0.4.3 nodeenv-1.7.0 packaging-21.3 parso-0.8.3 pathspec-0.10.3 pep517-0.13.0 pexpect-4.8.0 pickleshare-0.7.5 pkginfo-1.9.2 platformdirs-2.6.2 pluggy-1.0.0 portalocker-2.6.0 pre-commit-2.21.0 prompt-toolkit-3.0.36 psutil-5.9.1 ptyprocess-0.7.0 pure-eval-0.2.2 py3createtorrent-1.1.0 pycodestyle-2.10.0 pycparser-2.21 pycryptodome-3.16.0 pyflakes-3.0.1 pyinstaller-5.6.2 pyinstaller-hooks-contrib-2022.14 pylint-2.15.9 pyparsing-3.0.9 pytest-7.2.0 pytest-asyncio-0.20.3 pytest-cov-4.0.0 pytest-monitor-1.6.5 pytest-xdist-3.1.0 readme-renderer-37.3 requests-2.28.1 requests-toolbelt-0.10.1 rfc3986-2.0.0 rich-13.0.0 setproctitle-1.2.3 six-1.16.0 sniffio-1.3.0 sortedcontainers-2.4.0 stack-data-0.6.2 tomli-2.0.1 tomlkit-0.11.6 traitlets-5.8.0 twine-4.0.2 types-aiofiles-22.1.0.4 types-cryptography-3.3.23.2 types-pkg-resources-0.1.3 types-pyyaml-6.0.12.2 types-setuptools-65.6.0.2 typing-extensions-4.4.0 urllib3-1.26.13 virtualenv-20.17.1 watchdog-2.1.9 wcwidth-0.2.5 webencodings-0.5.1 wrapt-1.14.1 yarl-1.8.2 zipp-3.11.0 zstd-1.5.2.6
pytest --cov=chia --cov=tests --cov-config=.coveragerc --cov-report= --durations=10 -n 4 -m "not benchmark" tests/core/util/test_cached_bls.py tests/core/util/test_config.py tests/core/util/test_file_keyring_synchronization.py tests/core/util/test_files.py tests/core/util/test_jsonify.py tests/core/util/test_keychain.py tests/core/util/test_keyring_wrapper.py tests/core/util/test_lockfile.py tests/core/util/test_lru_cache.py tests/core/util/test_significant_bits.py tests/core/util/test_streamable.py
Expected behavior
No tracebacks as is the case with 7.0.1. Unless... it is deemed that this is an indication of a "real" problem as I expressed concern about above.
Additional context
https://github.com/Chia-Network/chia-blockchain/pull/14255 is where I am reverting to 7.0.1 for now and includes links to a couple other PRs and CI runs where I setup our CI to run the problematic jobs 60 times over to provide a clear "statistical" comparison of the zero failures with 7.0.1 vs., well, many with 7.0.2.
I looked at the refactor of
CoverageData._read_db()
at https://github.com/nedbat/coveragepy/compare/7.0.1..7.0.2#diff-e5185ca855df54af4f94c983f99b52cfad9ddc17386353d09acdfa994dd72a33R287 (you have to expand theLoad diff
message forcoverage/sqldata.py
since GitHub doesn't do it for you) and noticed that while this exception would have previously been caught by theexcept Exception as exc:
clause the unpacking has been moved such that it is no longer mis-detected as a missing table.Side note, this gave me a good chuckle.
https://github.com/nedbat/coveragepy/blob/7.0.2/CHANGES.rst#version-702--2023-01-02