nedbat / coveragepy

The code coverage tool for Python
https://coverage.readthedocs.io
Apache License 2.0

Want to get coverage for 3rd party dependencies' code used by my project. #1759

Open amaranthjinn opened 5 months ago

amaranthjinn commented 5 months ago

@nedbat

Question: What do I need to do to allow coverage to run against the 3rd party dependencies installed in the .virtualenv's site-packages directory? I see that 3rd party dependency coverage was removed in https://github.com/nedbat/coveragepy/commit/0285af966a3942d8bd63489bd285328e96221126. However, I want to run coverage against my project code and see which lines are triggered in the 3rd party dependencies' code.

The usage I'm imagining is: I have some app code in my project/src directory, A.py, which imports 3rd party dependencies B and C. B and C are deployed to the virtual env's site-packages directory (outside of the project directory). A_test.py is written to exercise A.py (A_test.py is in the same directory as A.py). Running coverage against A_test.py would then show the % of code (and the lines of code) triggered in B and C. I see -L is used to toggle coverage for stdlib; is there some similar way to toggle coverage for the 3rd party dependencies?

I have the project using Poetry to manage its dependencies. The project structure:

home/xiaojin/Workspace/p
├── project.toml
├── requirements.txt
├── src
|   |---- p
├── poetry.lock
...

and the virtual environment's site-packages directory at:

home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages
|---- numpy
      |---- matlib.py
...

My project uses a number of 3rd party dependencies, which are deployed to the .virtualenvs's site-packages.

I ran coverage against some pytest tests in the project folder, but it seems coverage only covers the src and opt directories (by design, it seems: https://nedbatchelder.com/blog/202104/coveragepy_and_thirdparty_code.html). By the way, thank you for the prompt response.

From the Python virtual env, I ran a coverage command in the home/xiaojin/Workspace/p folder: coverage run -m pytest /usr/.../Workspace/p/xxj_test.py, where xxj_test simply imports numpy, to test whether coverage can run against 3rd party dependencies in the .virtualenv directory (where numpy is installed). However, the coverage returned includes only the src and opt directories, and files under the project p directory.

I tried with -L, stdlib dependencies are added.

I tried with --include 'venv/*', got error:

===================================================== test session starts =====================================================
platform linux -- Python 3.11.7, pytest-7.1.1, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
PyQt5 5.14.2 -- Qt runtime 5.14.2 -- Qt compiled 5.14.2
rootdir: /usr/local/home/xiaojin/Workspace/p, configfile: project.toml
plugins: benchmark-3.4.1, xdist-2.5.0, forked-1.4.0, pyfakefs-4.3.3, typeguard-2.13.3, dash-2.14.2, anyio-3.5.0, cov-3.0.0, qt-4.0.2, flaky-3.7.0, timeout-2.1.0, asyncio-0.19.0
timeout: 450.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.STRICT
collected 0 items

==================================================== no tests ran in 0.20s ====================================================
/usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/coverage/control.py:793: CoverageWarning: No data was collected. (no-data-collected)
  self._warn("No data was collected.", slug="no-data-collected")

Other tries:

  1. ran coverage command from within the .virtualenv directory, no difference.
  2. ran coverage debug sys:

-- sys -------------------------------------------------------
coverage_version: 6.3.2
coverage_module: /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/coverage/__init__.py
tracer: -none-
CTracer: unavailable
plugins.file_tracers: -none-
plugins.configurers: -none-
plugins.context_switchers: -none-
configs_attempted: .coveragerc
                   setup.cfg
                   tox.ini
                   pyproject.toml
configs_read: -none-
config_file: None
config_contents: -none-
data_file: -none-
python: 3.11.7 (main, Jan 14 2024, 00:38:57) [GCC 10.2.1 20210110]
platform: Linux-6.6.13-1rodete3-amd64-x86_64-with-glibc2.37
implementation: CPython
executable: /usr/local/../xiaojin/.virtualenvs/p/bin/python3
def_encoding: utf-8
fs_encoding: utf-8
pid: 986639
cwd: /usr/local/../xiaojin/Workspace/p/users/xiaojin
path: /usr/local/../xiaojin/.virtualenvs/p/bin
      /usr/local/../xiaojin/Workspace/p/src
      /usr/local/buildtools/current/sitecustomize
      /opt/python3.11.7/lib/python311.zip
      /opt/python3.11.7/lib/python3.11
      /opt/python3.11.7/lib/python3.11/lib-dynload
      /usr/local/../xiaojin/.virtualenvs/p/lib/python3.11/site-packages
environment: HOME = /usr/local/home/xiaojin
             PYTHONPATH = /usr/local/buildtools/current/sitecustomize
             VIRTUALENVWRAPPER_PYTHON = /usr/bin/python3
command_line: /usr/local/../xiaojin/.virtualenvs/p/bin/coverage debug sys
sqlite3_version: 2.6.0
sqlite3_sqlite_version: 3.45.0
sqlite3_temp_store: 0
sqlite3_compile_options: ATOMIC_INTRINSICS=1; COMPILER=gcc-13.2.0; DEFAULT_AUTOVACUUM
                         DEFAULT_CACHE_SIZE=-2000; DEFAULT_FILE_FORMAT=4; DEFAULT_JOURNAL_SIZE_LIMIT=-1
                         DEFAULT_MMAP_SIZE=0; DEFAULT_PAGE_SIZE=4096; DEFAULT_PCACHE_INITSZ=20
                         DEFAULT_RECURSIVE_TRIGGERS; DEFAULT_SECTOR_SIZE=4096; DEFAULT_SYNCHRONOUS=2
                         DEFAULT_WAL_AUTOCHECKPOINT=1000; DEFAULT_WAL_SYNCHRONOUS=2; DEFAULT_WORKER_THREADS=0
                         DIRECT_OVERFLOW_READ; ENABLE_COLUMN_METADATA; ENABLE_DBSTAT_VTAB
                         ENABLE_FTS3; ENABLE_FTS3_PARENTHESIS; ENABLE_FTS3_TOKENIZER
                         ENABLE_FTS4; ENABLE_FTS5; ENABLE_LOAD_EXTENSION
                         ENABLE_MATH_FUNCTIONS; ENABLE_PREUPDATE_HOOK; ENABLE_RTREE
                         ENABLE_SESSION; ENABLE_STMTVTAB; ENABLE_UNLOCK_NOTIFY
                         ENABLE_UPDATE_DELETE_LIMIT; HAVE_ISNAN; LIKE_DOESNT_MATCH_BLOBS
                         MALLOC_SOFT_LIMIT=1024; MAX_ATTACHED=10; MAX_COLUMN=2000
                         MAX_COMPOUND_SELECT=500; MAX_DEFAULT_PAGE_SIZE=32768; MAX_EXPR_DEPTH=1000
                         MAX_FUNCTION_ARG=127; MAX_LENGTH=1000000000; MAX_LIKE_PATTERN_LENGTH=50000
                         MAX_MMAP_SIZE=0x7fff0000; MAX_PAGE_COUNT=0xfffffffe; MAX_PAGE_SIZE=65536
                         MAX_SCHEMA_RETRY=25; MAX_SQL_LENGTH=1000000000; MAX_TRIGGER_DEPTH=1000
                         MAX_VARIABLE_NUMBER=250000; MAX_VDBE_OP=250000000; MAX_WORKER_THREADS=8
                         MUTEX_PTHREADS; SECURE_DELETE; SOUNDEX
                         SYSTEM_MALLOC; TEMP_STORE=1; THREADSAFE=1
                         USE_URI

  3. python -m coverage run -m pytest /usr/local/home/xiaojin/Workspace/p/xxj_test.py --debug=trace

writing pytest debug information to trace
================================================================================================================================ test session starts ================================================================================================================================
platform linux -- Python 3.11.7, pytest-7.1.1, pluggy-1.0.0 -- /usr/local/home/xiaojin/.virtualenvs/p/bin/python
using: pytest-7.1.1
setuptools registered plugins:
  pytest-benchmark-3.4.1 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_benchmark/plugin.py
  pytest-xdist-2.5.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/xdist/plugin.py
  pytest-xdist-2.5.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/xdist/looponfail.py
  pytest-forked-1.4.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_forked/__init__.py
  pyfakefs-4.3.3 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pyfakefs/pytest_plugin.py
  typeguard-2.13.3 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/typeguard/pytest_plugin.py
  dash-2.14.2 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/dash/testing/plugin.py
  anyio-3.5.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/anyio/pytest_plugin.py
  pytest-cov-3.0.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_cov/plugin.py
  pytest-qt-4.0.2 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytestqt/plugin.py
  flaky-3.7.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/flaky/flaky_pytest_plugin.py
  pytest-timeout-2.1.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_timeout.py
  pytest-asyncio-0.19.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_asyncio/plugin.py
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
PyQt5 5.14.2 -- Qt runtime 5.14.2 -- Qt compiled 5.14.2
rootdir: /usr/local/home/xiaojin/Workspace/p, configfile: pyproject.toml
plugins: benchmark-3.4.1, xdist-2.5.0, forked-1.4.0, pyfakefs-4.3.3, typeguard-2.13.3, dash-2.14.2, anyio-3.5.0, cov-3.0.0, qt-4.0.2, flaky-3.7.0, timeout-2.1.0, asyncio-0.19.0
timeout: 450.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.STRICT
collected 0 items

  4. coverage run -m --source=numpy pytest /usr/local/google/home/xiaojin/Workspace/pyle/xxj_test.py: coverage ran against all files in the virtualenv's numpy folder, but doesn't show coverage for xxj_test.py?
amaranthjinn commented 5 months ago

It seems like the following command is close to what I want to run: coverage run --source=/usr/local/home/xiaojin/Workspace/pyle,/usr/local/home/xiaojin/.virtualenvs/pyle/lib64/python3.11/site-packages deps_analysis.py --debug=trace

where deps_analysis.py is the following, just to test out what coverage can return:

import numpy  # test if coverage covers site package directory of 3rd party installations
print(numpy.__file__)

The result does return coverage of files in .virtualenvs/.../site-packages, but I'm not clear how to interpret it:

  1. It includes other .py files in home/xiaojin/Workspace/pyle, while I imagine it should only include deps_analysis.py from the repo directory (its only dependency is numpy, which is in .virtualenvs/). Why is coverage running against those other .py files?
  2. There are a lot more dependencies from .virtualenvs than expected, many with coverage 0%, so I'm not sure if they are included because they really are dependencies of numpy, or whether all 3rd party dependencies in .virtualenvs/ are scraped by coverage?

Would appreciate any clarifications on how to use the coverage tool correctly and effectively.

amaranthjinn commented 4 months ago

@nedbat, I was able to update coverage to the latest version. However, I'm having the same issue with the report. With --source specified, the coverage report shows a lot of files with 0 lines (which gives 100% coverage) or with no lines covered (0%), and it reports coverage for what seems like every file in the directories. However, with no site-packages directory specified in the source, the report does not show any coverage for packages in the 3rd party site-packages folder. I was hoping to see coverage for deps_analysis.py plus its direct/indirect (3rd party) dependencies. What's the recommended way to go about it?

nedbat commented 4 months ago

Coverage report shows a lot of files with 0 lines which give 100% of coverage, or no lines covered (0%)

You can exclude empty files (skip_empty), and 100% files (skip_covered) if they are distracting you.

I don't understand if you are getting measurement of third-party packages you aren't interested in though?

amaranthjinn commented 4 months ago

Yeah, I seem to be getting measurements of packages I'm not interested in, from both project src and 3rd party package directories. Here's the scenario I'm trying: I wrote a dummy test in the project src directory that only does import numpy (deployed in the 3rd party package directory), and print. I ran coverage against the dummy test. I was expecting coverage of the dummy test, numpy, and maybe some other basic libraries for printing. Instead, I got coverage for all the files in src directory, and all the files in the 3rd party package directory, regardless of whether they are dependencies of the dummy test or not.

My command is something like: coverage run --source=/usr/local/google/home/Workspace/p,/usr/local/google/home/.virtualenvs/p/lib64/python3.11/site-packages /usr/local/google/home/Workspace/p/src/dummy_test.py --debug=trace

and then: coverage html --ignore-errors --skip-empty

The behavior seems to be, with --source specified, coverage is running against all files in the listed directories? However, without --source specified, coverage run against the dummy test will only show coverage for dummy test in the src directory, nothing from the 3rd party dependency directory (I want to see the coverage for numpy).

I was hoping to use the coverage tool to determine

  1. what are the package dependencies (internal and external/3rd party) of a specific feature/program/component
  2. how much of the dependent package is utilized by this specific feature/program/component.

How should I use the coverage tool to achieve the above goal? I'm using coverage 7.4.4.
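For the first goal (which packages a given program pulls in), one rough approach that does not involve coverage at all is to compare `sys.modules` before and after running the code. The following is a minimal sketch under stated assumptions: the code under study can be wrapped in a callable, `top_level_packages_loaded_by` is a hypothetical helper name, and any module that was already imported before the snapshot will not be reported.

```python
import importlib
import sys
from typing import Callable, List


def top_level_packages_loaded_by(action: Callable[[], object]) -> List[str]:
    """Return top-level module names that first appear in sys.modules while action runs."""
    before = set(sys.modules)
    action()
    newly_loaded = set(sys.modules) - before
    # "numpy.core.numeric" and "numpy.linalg" both collapse to "numpy".
    return sorted({name.split(".")[0] for name in newly_loaded})


if __name__ == "__main__":
    # Example: list what importing the stdlib json package pulls in
    # (the result depends on what the interpreter has already loaded).
    print(top_level_packages_loaded_by(lambda: importlib.import_module("json")))
```

This only answers "what got imported", not "how much of it ran"; the second goal still needs line-level data such as a coverage report.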

amaranthjinn commented 4 months ago

An update: I think coverage is measuring only the dependencies invoked by the dummy test. It's just that all the files in the directory seem to be listed, with 0% coverage, so it's a bit confusing to sort through at first glance. It also makes it hard to tell a dependency that should have been called but wasn't (0% coverage) from one that simply isn't in the chain of calls (also 0% coverage). Or did I misunderstand some usage of the tool?

amaranthjinn commented 4 months ago

I'm able to use --skip-empty (https://coverage.readthedocs.io/en/latest/cmd.html), and that removed the 0% coverage files. However, of the files left, I'm not sure all of them are triggered as dependencies of my test; there seem to be more than that. @nedbat, I was hoping dynamic contexts would help me trace the dependencies, but they still don't seem to work for me, see https://stackoverflow.com/questions/78330426/coverages-dynamic-context-usage-returns-no-contexts-were-measured.

stasos24 commented 3 months ago

Hi @amaranthjinn, just comment out these lines:

.venv/lib/python3.11/site-packages/coverage/inorout.py-16478- #if self.third_match.match(filename) and not self.source_in_third_match.match(filename):
.venv/lib/python3.11/site-packages/coverage/inorout.py:16579: # return "inside --source, but is third-party"

nedbat commented 3 months ago

There are a few things I still don't understand:

You should run coverage so that it measures all third-party packages, and then extract the information you want from the JSON report.
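The extraction step suggested here could look roughly like the sketch below. It assumes the report was written with `coverage json`, that each entry under `files` carries a `summary` with `covered_lines` and `num_statements`, and that third-party file paths in the report start with the site-packages path you pass in. `summarize_by_package` and `load_report` are hypothetical helper names, not part of coverage.py.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict


def load_report(path: str) -> dict:
    """Load a JSON coverage report from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_by_package(report: dict, site_packages: str) -> Dict[str, dict]:
    """Aggregate covered/total statement counts per top-level package under site_packages."""
    root = PurePosixPath(site_packages)
    totals = defaultdict(lambda: {"covered": 0, "statements": 0})
    for filename, data in report.get("files", {}).items():
        try:
            relative = PurePosixPath(filename).relative_to(root)
        except ValueError:
            continue  # not under site-packages, e.g. project source
        if not relative.parts:
            continue
        package = relative.parts[0]
        if package.endswith(".py"):
            package = package[:-3]  # single-file module at the top level
        summary = data["summary"]
        totals[package]["covered"] += summary["covered_lines"]
        totals[package]["statements"] += summary["num_statements"]
    for entry in totals.values():
        stmts = entry["statements"]
        entry["percent"] = round(100.0 * entry["covered"] / stmts, 1) if stmts else 100.0
    return dict(totals)
```

Sorting the result by `percent` or filtering out packages with `covered == 0` would then give a per-dependency view instead of a list of thousands of files.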

amaranthjinn commented 3 months ago

Sorry for the delayed response, got roped into a different project.

Just to recap, I want to determine

  1. What 3rd party dependencies are used by my project (and % code of the dependency used)?
  2. What legacy 3rd party dependencies are not used by my project (but still sitting in my poetry, requirements config, and taking space)?

And I call coverage in my src directory:

coverage run --source=/usr/local/google/home/Workspace/p,/usr/local/google/home/.virtualenvs/p/lib64/python3.11/site-packages -m pytest --debug=trace
coverage html --skip-empty --ignore-errors
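For the second question in the recap (installed but apparently unused dependencies), a possible starting point is to compare the installed distributions with the top-level names that were actually imported during a run. This is only a sketch: `distributions_never_imported` is a hypothetical helper, the mapping is assumed to have the shape returned by `importlib.metadata.packages_distributions()` (Python 3.10+), and "never imported in this run" does not prove a dependency is safe to remove.

```python
from typing import Iterable, List, Mapping


def distributions_never_imported(
    import_name_to_dists: Mapping[str, Iterable[str]],
    imported_top_level: Iterable[str],
) -> List[str]:
    """List distributions for which none of their top-level import names were imported."""
    imported = set(imported_top_level)
    all_dists = set()
    used_dists = set()
    for import_name, dists in import_name_to_dists.items():
        dists = list(dists)
        all_dists.update(dists)
        if import_name in imported:
            used_dists.update(dists)
    return sorted(all_dists - used_dists)


# Usage sketch (assumes Python 3.10+; run at the end of the test session):
#   import sys
#   from importlib import metadata
#   mapping = metadata.packages_distributions()
#   loaded = {name.split(".")[0] for name in sys.modules}
#   print(distributions_never_imported(mapping, loaded))
```

The set of imported names could also be derived from the file paths in a coverage report rather than from `sys.modules`.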

@stasos24, thank you. I tried your suggestion, but I don't see any obvious difference from using coverage with the --source argument. I see files in site-packages as well as the project src directory being traced, but no dynamic context information.

@nedbat

The coverage tool is still useful for me to answer my two questions, but it seems like I need to figure out ways to use it.

  1. The report right now has a lot of items with 0% coverage, with dependencies from project src and 3rd party packages intermixed, a few thousand files in total. It's hard for me to extract a pattern (thus I tried to use dynamic contexts to group the dependencies by test method, but haven't succeeded). What are your recommendations on how to organize the results (via command arguments, configurations, or customized code based on the API)?
  2. For the dependencies that have 0% or very low coverage, I want to find out what in the project's source code is/should be calling it (or not at all). I guess the coverage tool cannot provide that? Do you have any recommendations on what tool can be leveraged to provide this dependency tracing (up the calling chain)?
gitgithan commented 1 month ago

I have very similar questions to this issue's title. I have finished reading all the docs and am struggling hard with setting source correctly in .coveragerc.

I'll explain with a minimal example. I believe the answer to this would partially answer amaranthjinn's issue too.

I'm using coverage to learn how third party libraries like pandas work. I want to know what lines are executed under the hood when I use their high level API. I know I can use a debugger to do the same, but coverage helps by giving a static report view highlighting which branch got executed depending on the input. I don't care about coverage % because I'm not developing pandas, and there are no test files because the purpose is not to test, but to learn a pip installed library.

Here's cov.py

import pandas as pd
from pandas.core import frame
# from helper import add

data = pd.DataFrame({"A": [1, 1, 3], "B": [4, 5, 6]})
grouped_data = data.groupby("A").sum()
print(grouped_data)
# print(add(1, 2))

I run with python -m coverage run cov.py (see Footnote below)

Here's .coveragerc

[run]
source =
    /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py
    /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/groupby
debug=trace

[report]
skip_empty = True
omit = 
    **/__init__.py
exclude_lines =
    import

Ideally, I want coverage to automatically only show me files that have their function/method bodies executed. I don't want to see every single file under pandas just because they were imported.

Since I did import pandas as pd, it caused the __init__.py of every submodule in pandas to run, including all function/class definitions, bloating the report when initially I only added pandas to source. Adding those 2 lines in source of .coveragerc was a manual attempt to limit report length, which partially worked. It's not ideal because it requires that I already know the paths, which I shouldn't need to if I'm learning the library.
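On the point about needing to know paths in advance: the standard library can resolve a dotted module name to its file without hard-coding the site-packages location. A minimal sketch, where `module_file` is a hypothetical helper name; note that `importlib.util.find_spec` imports the parent packages of a dotted name as a side effect.

```python
import importlib.util
from typing import Optional


def module_file(module_name: str) -> Optional[str]:
    """Return the file a dotted module name would be loaded from, or None if unavailable."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return None  # a parent package of the dotted name is not installed
    if spec is None or not spec.has_location:
        return None  # not found, or a built-in module with no file on disk
    return spec.origin


if __name__ == "__main__":
    # With pandas installed, module_file("pandas.core.frame") would give the frame.py path.
    print(module_file("json.decoder"))
```

The resulting path (or its parent directory) could then be pasted into a config file or used to filter a report.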

Problem

I know the DataFrame class is defined in /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py which i added under source. I expect the report to show def __init__ of class DataFrame(NDFrame, OpsMixin): being run and frame.py appearing in report.

Why do I get

/home/hanqi/.local/lib/python3.10/site-packages/coverage/inorout.py:503: CoverageWarning: Module /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py was never imported. (module-not-imported)
  self.warn(f"Module {pkg} was never imported.", slug="module-not-imported")

Adding from pandas.core import frame does not fix this warning either. (An answer to this would be nice to have.) The warning makes sense in that I never imported frame.py, but I know that the DataFrame class from this module will be used when code does pd.DataFrame(), so it's valuable to see how its def __init__ ran in the report.

Another strange thing is /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/groupby actually worked to limit the report to only modules in groupby folder. This makes me think source only works for folders and not files? I tested by adding my own simple helper.py (commented out in cov.py) and output still says CoverageWarning: Module helper.py was never imported

I suspect the above issue is due to docs saying

Only importable files (ones at the root of the tree, or in directories with a __init__.py file) will be considered

/home/hanqi/.local/lib/python3.10/site-packages/pandas/core/groupby is a directory with __init__.py, so it worked. /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py is not a file at the root of the tree (I assume this means the pwd where coverage was run), so it failed?

skip_empty, omit and exclude_lines were my attempts to reduce the number of files shown in report. skip_empty and omit did work to exclude some files but still many files are left in report. I was hoping if every line in a file is excluded by exclude_lines, the whole file would not show up in report too, but that was not the case. I don't want to see files in report, which only have imports being run but their classes/functions were never used.

The ideal

Docs describe skip_empty as Don’t report files that have no executable code (such as __init__.py files). What i want exactly is to hide all files from report where "effectively" no code was executed (yes definitions of class/functions were executed when imported by others but their body was not used), and only show files where at least some part of their class/functions were used.
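The thread does not identify a built-in option for this, but the filter described above can be approximated as a post-processing step: parse each file with `ast`, collect the lines that sit inside function bodies, and keep the file only if one of those lines appears among the executed lines (for example the `executed_lines` list of that file in a JSON report). A minimal sketch under those assumptions; both function names are hypothetical, lambdas are ignored, and one-line definitions such as `def f(): return 1` are skipped because their body shares a line with the definition.

```python
import ast
from typing import Iterable, Set


def function_body_lines(source: str) -> Set[int]:
    """Line numbers of statements inside function bodies (def lines themselves excluded)."""
    lines: Set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for stmt in node.body:
                if stmt.lineno == node.lineno:
                    continue  # one-line def: cannot be told apart from the definition
                end = stmt.end_lineno or stmt.lineno
                lines.update(range(stmt.lineno, end + 1))
    return lines


def any_function_body_executed(source: str, executed_lines: Iterable[int]) -> bool:
    """True if at least one executed line lies inside some function or method body."""
    return not function_body_lines(source).isdisjoint(executed_lines)
```

A file for which `any_function_body_executed` is false was only imported (its `import`, `class`, and `def` lines ran), so it could be dropped from a custom report.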

If this is possible, i'll have

Footnote

The coverage command is not found even though I expected pip install coverage to make the CLI available. I can't add it to PATH since I don't know where it's supposed to be. I'm on WSL2 Ubuntu 20.04, Coverage.py, version 7.5.4 with C extension. pip installed libraries are not installed in any virtual environment.

Others reported similar issue and answer with no explanation why the binary is not available after install or where it should be: https://stackoverflow.com/a/69630406/8621823

Similar request for using coverage as runtime inspection tool: https://stackoverflow.com/q/37979365/8621823