Running more tests causes coverage to decrease

Describe the bug I have a rather complicated setup (see #1318) where running a full suite of tests reports many lines in many files as not covered when I know for certain they are in fact executed during the tests. In trying to narrow the scope of the issue, I discovered that running a small set of tests causes the reported coverage in a single, sample, source file to be 80% while running the full set of tests causes the coverage of that file to decrease to 53%.

To Reproduce Unfortunately, I cannot provide a repro, but I am certainly willing to hack the code at my end to try to progress the issue. To detail the test setup, if I run only the Django code with about 1 minute worth of tests like this:

touch .coveragerc.enabled                                 # Enable coverage using built-in calls to the coverage API.
./manage.py runserver --noreload 0.0.0.0:8000             # Run Django
# In another window...
pytest -slv --durations=0 test/test_suite26_upload.py     # Run tests
pkill -INT -f runserver                                   # Stop Django; dumps results using built-in calls to the coverage API.
coverage combine                                          # Get results
coverage html                                             # etc.

then the coverage reported of generic_upload.py is 80%. But if I run the Django server AND all the Celery-based code, and a full set of tests (circa 80 minutes), the coverage of generic_upload.py drops to 53%.

What version of Python are you using? Python 3.10.6
What version of coverage.py shows the problem? coverage.py v6.4.4

The output of coverage debug sys is helpful.

$ coverage debug sys
-- sys -------------------------------------------------------
           coverage_version: 6.4.4
            coverage_module: /usr/local/lib/python3.10/dist-packages/coverage/__init__.py
                     tracer: -none-
                    CTracer: available
       plugins.file_tracers: -none-
        plugins.configurers: -none-
  plugins.context_switchers: -none-
          configs_attempted: .coveragerc
               configs_read: /main/srhaque/bootstrap/source/.coveragerc
                config_file: /main/srhaque/bootstrap/source/.coveragerc
            config_contents: b'# .coveragerc to control coverage.py\n[run]\nbranch = True\nparallel = True\ninclude =\n    api\n    bin\n    paiyroll\n    project\n    test\n    *.py\nomit =\n    /tmp/*\n    /usr/*\n    */pycharm/*\n    */pydev/*\n    paiyroll/migrations/*\nconcurrency =\n    thread\n    multiprocessing\n'
                  data_file: -none-
                     python: 3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0]
                   platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.35
             implementation: CPython
                 executable: /usr/bin/python3
               def_encoding: utf-8
                fs_encoding: utf-8
                        pid: 3344013
                        cwd: /main/srhaque/bootstrap/source
                       path: /usr/local/bin
                             /usr/lib/python310.zip
                             /usr/lib/python3.10
                             /usr/lib/python3.10/lib-dynload
                             /usr/local/lib/python3.10/dist-packages
                             /usr/lib/python3/dist-packages
                environment: HOME = /main/srhaque
               command_line: /usr/local/bin/coverage debug sys
            sqlite3_version: 2.6.0
     sqlite3_sqlite_version: 3.37.2
         sqlite3_temp_store: 0
    sqlite3_compile_options: ATOMIC_INTRINSICS=1, COMPILER=gcc-11.2.0, DEFAULT_AUTOVACUUM,
                             DEFAULT_CACHE_SIZE=-2000, DEFAULT_FILE_FORMAT=4,
                             DEFAULT_JOURNAL_SIZE_LIMIT=-1, DEFAULT_MMAP_SIZE=0, DEFAULT_PAGE_SIZE=4096,
                             DEFAULT_PCACHE_INITSZ=20, DEFAULT_RECURSIVE_TRIGGERS,
                             DEFAULT_SECTOR_SIZE=4096, DEFAULT_SYNCHRONOUS=2,
                             DEFAULT_WAL_AUTOCHECKPOINT=1000, DEFAULT_WAL_SYNCHRONOUS=2,
                             DEFAULT_WORKER_THREADS=0, ENABLE_COLUMN_METADATA, ENABLE_DBSTAT_VTAB,
                             ENABLE_FTS3, ENABLE_FTS3_PARENTHESIS, ENABLE_FTS3_TOKENIZER, ENABLE_FTS4,
                             ENABLE_FTS5, ENABLE_JSON1, ENABLE_LOAD_EXTENSION, ENABLE_MATH_FUNCTIONS,
                             ENABLE_PREUPDATE_HOOK, ENABLE_RTREE, ENABLE_SESSION, ENABLE_STMTVTAB,
                             ENABLE_UNLOCK_NOTIFY, ENABLE_UPDATE_DELETE_LIMIT, HAVE_ISNAN,
                             LIKE_DOESNT_MATCH_BLOBS, MALLOC_SOFT_LIMIT=1024, MAX_ATTACHED=10,
                             MAX_COLUMN=2000, MAX_COMPOUND_SELECT=500, MAX_DEFAULT_PAGE_SIZE=32768,
                             MAX_EXPR_DEPTH=1000, MAX_FUNCTION_ARG=127, MAX_LENGTH=1000000000,
                             MAX_LIKE_PATTERN_LENGTH=50000, MAX_MMAP_SIZE=0x7fff0000,
                             MAX_PAGE_COUNT=1073741823, MAX_PAGE_SIZE=65536, MAX_SCHEMA_RETRY=25,
                             MAX_SQL_LENGTH=1000000000, MAX_TRIGGER_DEPTH=1000,
                             MAX_VARIABLE_NUMBER=250000, MAX_VDBE_OP=250000000, MAX_WORKER_THREADS=8,
                             MUTEX_PTHREADS, OMIT_LOOKASIDE, SECURE_DELETE, SOUNDEX, SYSTEM_MALLOC,
                             TEMP_STORE=1, THREADSAFE=1, USE_URI

Expected behavior The coverage should NOT decrease when the full test set is run.

Additional context The full test set requires the full breadths of the code to run. This includes Django and Celery based code. The Celery-based code is using subprocesses (not the default fork-n-exec). There is all sorts of other stuff going on including: direct invocations of subprocesses, calling out to Java from some of those subprocesses, async calls via carehare to rabbitmq, use of packages such as lxml which contain C-based code, and probably other gnarly stuff.

I'm not sure where to begin debugging this via issue discussion, which is always difficult.

Can you add --debug=config,sys to your coverage run, and show the output?
Can you explain this line? touch .coveragerc.enabled # Enable coverage using built-in calls to the coverage API. What are "built-in calls to the coverage API," and why?
What is .coveragerc.enabled compared to the config file that was read (/main/srhaque/bootstrap/source/.coveragerc)?

I'm not sure where to begin debugging this via issue discussion, which is always difficult.

Understood.

* Can you add `--debug=config,sys` to your coverage run, and show the output?

$ export COVERAGE_DEBUG=config,sys
$ ./manage.py runserver --noreload 0.0.0.0:8000  
-- config ----------------------------------------------------
attempted_config_files: /main/srhaque/bootstrap/source/.coveragerc
branch: True
command_line: None
concurrency: thread
multiprocessing
config_file: /main/srhaque/bootstrap/source/.coveragerc
config_files_read: /main/srhaque/bootstrap/source/.coveragerc
context: None
cover_pylib: False
data_file: .coverage
debug: config
sys
disable_warnings: -none-
dynamic_context: None
exclude_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(cover|COVER)
extra_css: None
fail_under: 0.0
html_dir: htmlcov
html_skip_covered: None
html_skip_empty: None
html_title: Coverage report
ignore_errors: False
json_output: coverage.json
json_pretty_print: False
json_show_contexts: False
lcov_output: coverage.lcov
note: None
parallel: True
partial_always_list: while (True|1|False|0):
if (True|1|False|0):
partial_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(branch|BRANCH)
paths: OrderedDict()
plugin_options: {}
plugins: -none-
precision: 0
relative_files: False
report_contexts: None
report_include: None
report_omit: None
run_include: api
bin
paiyroll
project
test
*.py
run_omit: /tmp/*
/usr/*
*/pycharm/*
*/pydev/*
paiyroll/migrations/*
show_contexts: False
show_missing: False
sigterm: False
skip_covered: False
skip_empty: False
sort: None
source: None
source_pkgs: -none-
timid: False
xml_output: coverage.xml
xml_package_depth: 99
-- sys -------------------------------------------------------
coverage_version: 6.4.4
coverage_module: /usr/local/lib/python3.10/dist-packages/coverage/__init__.py
tracer: CTracer
CTracer: available
plugins.file_tracers: -none-
plugins.configurers: -none-
plugins.context_switchers: -none-
configs_attempted: /main/srhaque/bootstrap/source/.coveragerc
configs_read: /main/srhaque/bootstrap/source/.coveragerc
config_file: /main/srhaque/bootstrap/source/.coveragerc
config_contents: b'# .coveragerc to control coverage.py\n[run]\nbranch = True\nparallel = True\ninclude =\n    api\n    bin\n    paiyroll\n    project\n    test\n    *.py\nomit =\n    /tmp/*\n    /usr/*\n    */pycharm/*\n    */pydev/*\n    paiyroll/migrations/*\nconcurrency =\n    thread\n    multiprocessing\n'
data_file: /main/srhaque/bootstrap/source/.coverage.freenas.2422515.324701
python: 3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0]
platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.35
implementation: CPython
executable: /usr/bin/python3
def_encoding: utf-8
fs_encoding: utf-8
pid: 2422515
cwd: /main/srhaque/bootstrap/source
path: /main/srhaque/bootstrap/source
/main/srhaque/bootstrap/source
/usr/lib/python310.zip
/usr/lib/python3.10
/usr/lib/python3.10/lib-dynload
/usr/local/lib/python3.10/dist-packages
/usr/lib/python3/dist-packages
environment: COVERAGE_DEBUG = config,sys
COVERAGE_RCFILE = /main/srhaque/bootstrap/source/.coveragerc
HOME = /main/srhaque
command_line: ./manage.py runserver --noreload 0.0.0.0:8000
coverage_paths: /usr/local/lib/python3.10/dist-packages/coverage
stdlib_paths: /usr/lib/python3.10
third_party_paths: /usr/local/lib/python3.10/dist-packages
/usr/lib/python
/main/srhaque/.local/lib/python3.10/site-packages
/usr/lib/python3.10/site-packages
/main/srhaque/.local/bin
/usr/local/bin
/usr/bin
source_match: -none-
source_pkgs_match: -none-
include_match: /main/srhaque/bootstrap/source/api
/main/srhaque/bootstrap/source/bin
/main/srhaque/bootstrap/source/paiyroll
/main/srhaque/bootstrap/source/project
/main/srhaque/bootstrap/source/test
*.py
omit_match: /tmp/*
/usr/*
*/pycharm/*
*/pydev/*
/main/srhaque/bootstrap/source/paiyroll/migrations/*
cover_match: /usr/local/lib/python3.10/dist-packages/coverage
pylib_match: /usr/lib/python3.10
third_match: /main/srhaque/.local/bin
/main/srhaque/.local/lib/python3.10/site-packages
/usr/bin
/usr/lib/python
/usr/lib/python3.10/site-packages
/usr/local/bin
/usr/local/lib/python3.10/dist-packages
sqlite3_version: 2.6.0
sqlite3_sqlite_version: 3.37.2
sqlite3_temp_store: 0
sqlite3_compile_options: ATOMIC_INTRINSICS=1, COMPILER=gcc-11.2.0, DEFAULT_AUTOVACUUM,
DEFAULT_CACHE_SIZE=-2000, DEFAULT_FILE_FORMAT=4,
DEFAULT_JOURNAL_SIZE_LIMIT=-1, DEFAULT_MMAP_SIZE=0, DEFAULT_PAGE_SIZE=4096,
DEFAULT_PCACHE_INITSZ=20, DEFAULT_RECURSIVE_TRIGGERS,
DEFAULT_SECTOR_SIZE=4096, DEFAULT_SYNCHRONOUS=2,
DEFAULT_WAL_AUTOCHECKPOINT=1000, DEFAULT_WAL_SYNCHRONOUS=2,
DEFAULT_WORKER_THREADS=0, ENABLE_COLUMN_METADATA, ENABLE_DBSTAT_VTAB,
ENABLE_FTS3, ENABLE_FTS3_PARENTHESIS, ENABLE_FTS3_TOKENIZER, ENABLE_FTS4,
ENABLE_FTS5, ENABLE_JSON1, ENABLE_LOAD_EXTENSION, ENABLE_MATH_FUNCTIONS,
ENABLE_PREUPDATE_HOOK, ENABLE_RTREE, ENABLE_SESSION, ENABLE_STMTVTAB,
ENABLE_UNLOCK_NOTIFY, ENABLE_UPDATE_DELETE_LIMIT, HAVE_ISNAN,
LIKE_DOESNT_MATCH_BLOBS, MALLOC_SOFT_LIMIT=1024, MAX_ATTACHED=10,
MAX_COLUMN=2000, MAX_COMPOUND_SELECT=500, MAX_DEFAULT_PAGE_SIZE=32768,
MAX_EXPR_DEPTH=1000, MAX_FUNCTION_ARG=127, MAX_LENGTH=1000000000,
MAX_LIKE_PATTERN_LENGTH=50000, MAX_MMAP_SIZE=0x7fff0000,
MAX_PAGE_COUNT=1073741823, MAX_PAGE_SIZE=65536, MAX_SCHEMA_RETRY=25,
MAX_SQL_LENGTH=1000000000, MAX_TRIGGER_DEPTH=1000,
MAX_VARIABLE_NUMBER=250000, MAX_VDBE_OP=250000000, MAX_WORKER_THREADS=8,
MUTEX_PTHREADS, OMIT_LOOKASIDE, SECURE_DELETE, SOUNDEX, SYSTEM_MALLOC,
TEMP_STORE=1, THREADSAFE=1, USE_URI
-- end -------------------------------------------------------

* Can you explain this line? `touch .coveragerc.enabled # Enable coverage using built-in calls to the coverage API.` What are "built-in calls to the coverage API," and why?
* What is `.coveragerc.enabled` compared to the config file that was read (`/main/srhaque/bootstrap/source/.coveragerc`)?

I should have explained. Since Django is a server process and is launched indirectly in several different ways (e.g. via Celery or the local equivalent of systemd), I wanted to be sure that coverage is invoked uniformly in all cases. So, I include calls to the coverage API to start/stop the instrumentation and use the presence of the .coveragerc.enabled as a "boolean" to enable this logic. (There is no actual content in the file).

For the short test, I run Django directly as indicated, but for the "full" test pass, several indirect methods are used.

It occurred to me that since the "missing" coverage data is definitely captured (in at least some cases), the problem might be in the gather of the multiple coverage data files. So, I thought I might try to dump the content of the data files to investigate further, using SQL access.

I am able to dump the file table using something like this:

c.execute("select * from file")
for row in c.fetchall():
    print(row)

However, I'm having trouble "join"ing the file table to the line_bits table. In trying to diagnose this, I backed off to try to dump the line_bits like this:

c.execute("select * from line_bits")
for row in c.fetchall():
    print(row)

But try as I might, this query never returns any rows, even for a coverage file with plenty of data. What am I doing wrong? Is there an example of the SQL needed to dump the join of the file and line_bits tables?

Silly me: I finally worked out that the "line_bits" and "arc" tables are alternatives. So on to the next question... My dumping code now looks like this:

c.execute("select f.path, a.fromno, a.tono from arc as a, file as f where a.file_id = f.id order by f.path, a.fromno")
while True:
    row = c.fetchone()
    print(row)

and produces output like this:

('/.../command_line.py', -435, 455)
('/.../command_line.py', -431, 431)
('/.../command_line.py', -423, 424)
('/.../command_line.py', -347, 351)
('/.../command_line.py', -67, 68)
('/.../command_line.py', -60, 61)
('/.../command_line.py', -46, 47)
('/.../command_line.py', 47, 48)
('/.../command_line.py', 48, 50)
('/.../command_line.py', 50, -46)
('/.../command_line.py', 61, 62)
('/.../command_line.py', 62, 64)

What is the significance of the negative numbers?

Answering my own question: https://stackoverflow.com/a/37312317/6332554 explains the negative line numbers.

At least part of the problem, at its root, is that Celery workers do not honour atexit() handlers, see https://github.com/celery/celery/discussions/8923.

I finally ran this to ground. It turns out that Django does not reliably honour atexit() handlers. So in this case, in small-scale test runs, Coverage was being collected, but in full-scale test runs, they were not.

FWIW, to track this down:

To my Coverage start logic, I added recording of the process PID and sys.argv.
I added intermittent Coverage dump stats and restart collection, organised by PID-named files.
After the run, I was able to see that some PIDs did not (always) have dumps files. Since I knew what the sys.argv was by PID, I was able to see the failing processes.

Many thanks for the kind support.

nedbat / coveragepy

Running more tests causes coverage to decrease #1454