nv-legate / cupynumeric

An Aspiring Drop-In Replacement for NumPy at Scale
https://docs.nvidia.com/cupynumeric
Apache License 2.0
681 stars 73 forks source link

Discussion: Consolidated code coverage reporting #328

Open bryevdv opened 2 years ago

bryevdv commented 2 years ago

cc @magnatelee @marcinz @manopapad @m3vaz

There is an interest in providing code coverage reporting, and enforcing minimum code coverage levels. This can be a somewhat involved topic even for smaller and simpler projects. Our situation is substantially complicated by:

This issue is to attempt to get alignment on and understanding of top-level requirements, and secondarily, to connect those requirements to current "known unknowns" or risks. Once there is agreement, things can begin to be broken down into some granular concrete tasks.

Requirements

Collect line and branch coverage data for Python code

Collect line and branch coverage data for C++ code

Combine coverage data for Python and C++ code

GitHub integration

Local dev usage

Current capability

With a minimal .coveragerc configuration:

# .coveragerc to control coverage.py
[run]
branch = True
source = cunumeric,legate

Pytest can generate a line and branch coverage report including cunumeric and legate.core:

``` ---------- coverage: platform linux, python 3.7.12-final-0 ----------- Name Stmts Miss Branch BrPart Cover ----------------------------------------------------------------------------------------------------------------------------------------------- /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/__init__.py 3 0 0 0 100% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/__init__.py 18 5 4 1 64% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/__init__.py 13 0 0 0 100% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/env.py 6 0 0 0 100% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/future.py 84 28 24 7 64% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/geometry.py 165 37 76 9 76% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/operation.py 221 89 90 8 55% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/partition.py 87 15 38 13 76% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/partition_functor.py 80 43 28 2 53% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/pending.py 4 0 0 0 100% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/region.py 165 66 74 17 54% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/space.py 155 80 86 14 41% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/task.py 243 114 122 21 48% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/transform.py 99 36 38 5 60% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/_legion/util.py 236 109 86 9 50% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/communicator.py 57 29 14 1 46% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/constraints.py 123 28 36 6 79% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/context.py 104 23 10 3 77% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/corelib.py 33 3 4 0 86% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/install_info.py 3 0 0 0 100% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/io.py 62 4 18 2 92% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/launcher.py 537 73 166 27 84% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/legate.py 189 117 92 9 32% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/operation.py 494 133 176 23 71% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/partition.py 206 67 42 2 67% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/projection.py 69 21 33 4 62% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/resource.py 23 3 8 2 84% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py 704 189 233 42 69% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/shape.py 140 19 52 6 86% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/solver.py 284 83 148 17 69% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/store.py 615 76 195 46 84% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/transform.py 447 131 94 10 68% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/types.py 117 10 30 6 89% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/utils.py 26 1 10 0 97% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/timing/__init__.py 2 2 0 0 0% /home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/timing/timing.py 84 84 20 0 0% cunumeric/__init__.py 11 0 0 0 100% cunumeric/_ufunc/__init__.py 6 0 0 0 100% cunumeric/_ufunc/bit_twiddling.py 9 0 0 0 100% cunumeric/_ufunc/comparison.py 16 0 0 0 100% cunumeric/_ufunc/floating.py 13 0 0 0 100% cunumeric/_ufunc/math.py 35 0 4 0 100% cunumeric/_ufunc/trigonometric.py 20 0 0 0 100% cunumeric/_ufunc/ufunc.py 205 40 125 23 77% cunumeric/array.py 933 231 424 94 73% cunumeric/config.py 315 21 58 5 92% cunumeric/coverage.py 88 15 28 2 85% cunumeric/deferred.py 1066 62 407 38 92% cunumeric/eager.py 415 222 238 37 42% cunumeric/fft/__init__.py 6 0 0 0 100% cunumeric/fft/fft.py 112 2 40 2 97% cunumeric/install_info.py 2 0 0 0 100% cunumeric/linalg/__init__.py 6 0 0 0 100% cunumeric/linalg/cholesky.py 89 9 18 2 86% cunumeric/linalg/linalg.py 46 14 24 6 66% cunumeric/module.py 786 112 411 70 82% cunumeric/patch.py 4 4 0 0 0% cunumeric/random/__init__.py 6 0 0 0 100% cunumeric/random/random.py 45 8 22 8 76% cunumeric/runtime.py 285 102 114 15 59% cunumeric/sort.py 53 2 20 1 93% cunumeric/sphinxext/__init__.py 0 0 0 0 100% cunumeric/sphinxext/_comparison_config.py 39 39 2 0 0% cunumeric/sphinxext/_comparison_util.py 62 62 34 0 0% cunumeric/sphinxext/_cunumeric_directive.py 14 14 4 0 0% cunumeric/sphinxext/_templates.py 6 6 0 0 0% cunumeric/sphinxext/comparison_table.py 23 23 8 0 0% cunumeric/sphinxext/ufunc_formatter.py 10 10 2 0 0% cunumeric/thunk.py 99 40 6 0 62% cunumeric/utils.py 120 21 63 3 78% cunumeric/window.py 22 0 4 0 100% ----------------------------------------------------------------------------------------------------------------------------------------------- TOTAL 10865 2777 4103 618 71% ================================== 2888 passed, 4 skipped, 35 xfailed, 957 xpassed, 95 warnings in 77.86s (0:01:17) ================================== ```

via an invocation similar to

legate -c "import pytest; pytest.main('tests/integration --cov'.split())" --gpus 2 -cunumeric:test

This kind of report (possibly with the addition of missing line numbers) is useful and suitable for local dev use.

Additionally, HTML or XML reports may be generated, for example this invocation will generate an HTML report, including links to annotated source code

legate -c "import pytest; pytest.main('tests/integration --cov --cov-report html:cov_html'.split())" --gpus 2 -cunumeric:test

Speculation: It would not be too difficult, today, to add a CI Job that reports on a single configuration and saves HTML results as an artifact.

Unknowns

Combining Python data

The coverage project has documentation regarding combining coverage data from multiple runs:

https://coverage.readthedocs.io/en/6.3.2/cmd.html#cmd-combine

This seems fairly straightforward at a glance but some experimentation will be needed to gain practical experience. However, it seems that combining Python data is at least possible in principle.

Collecting and combining C++ data

Presumably the correct tool to use is gcov. Unknown (to me) are what build-time adjustments will be necessary to make use of it. Presumably at least a link to -lgcov but possibly more. C++ experts will have to weigh in on this.

Combining gcov results ("GCDA files") from multiple runs seems to be possible based on various Stack Overflow answers

It seems at least a little convoluted. It's possible a service (see Codecov, below) can handle this. If not, we will need to gain some practical experience to see how best to apply these techniques.

Performance vs Coverage testing

It seems plausible that coverage reporting could affect performance measurement. Should coverage reporting be integrated into the matrix of "standard" test jobs unconditionally, with dedicated jobs for performance measurement? Or the converse?

Report consolidation

While it seems possible that Python and C++ data can be separately combined, it is a completely open question whether or not those separate Python and C++ reports can be combined together in any way.

For any of the combined reports that may be possible, some amount of post-processing will be required. Will this happen on a GH action or elsewhere? It's possible a GH integration could rake on this burden (see Codecov, below).

GitHub integration

The most popular tool for coverage reporting on GtiHub is Codecov Although it is a commercial product with paid plans, they offer their service free to OSS projects:

Codecov will always be free for open source projects, educators, and students.

Ostensibly, Codecov immediately supports:

All that said, while things look promising, some experimentation will be required to see how well this works for our situation. There are several unknowns, for instance:

bryevdv commented 2 years ago

Any thoughts, corrections, or additions to the requirements or summary above is appreciated!

manopapad commented 2 years ago

Presumably the correct tool to use is gcov. Unknown (to me) are what build-time adjustments will be necessary to make use of it. Presumably at least a link to -lgcov but possibly more. C++ experts will have to weigh in on this.

Combining gcov results ("GCDA files") from multiple runs seems to be possible based on various Stack Overflow answers

@lightsighter did some code coverage investigations not long ago for Legion, so he should have a better idea of the challenges. I can imagine it's not trivial to record and combine code coverage information from multiple ranks, and multiple threads within those ranks.

It seems plausible that coverage reporting could affect performance measurement. Should coverage reporting be integrated into the matrix of "standard" test jobs unconditionally, with dedicated jobs for performance measurement? Or the converse?

There are no performance measurements happening at the moment in CI, so AFAIK we could conceivably incorporate code coverage measurements into regular CI runs. However, IMHO we should first get an idea of the overheads; if the slowdown is too large we may choose to run code coverage less often.

Of note, legate.core has no test suite of its own, so its coverage metrics will have to come from runs on libraries that exercise it (i.e. cunumeric at the moment).

bryevdv commented 2 years ago

I can imagine it's not trivial to record and combine code coverage information from multiple ranks, and multiple threads within those ranks.

This is the biggest unknown in my mind. But, just to make sure my understanding is correct: if Python cunumeric code is executed on a cluster, do both Python and execute on all the nodes? Or is the Python more like a "head node" with only C++ level working across the cluster? I think I had assumed the latter, but now I am not so sure.

Of note, legate.core has no test suite of its own, so its coverage metrics will have to come from runs on libraries that exercise it (i.e. cunumeric at the moment).

This was one of the prompting questions. Python coverage can collect for any modules specified in the configuration. I am not so sure about how gcov might work in this situation.

manopapad commented 2 years ago

if Python cunumeric code is executed on a cluster, do both Python and [C++] execute on all the nodes? Or is the Python more like a "head node" with only C++ level working across the cluster?

The python work will be replicated across all nodes (i.e. all the nodes will execute the same python script in full), and they will all "emit" the full amount of C++ work, but each node will execute a disjoint subset of the C++ work (each node will "mask-out" C++ work that corresponds to other nodes).

For example, for this code:

A = ones((20,))
A *= 3

assuming we have two nodes, and array A gets split into two pieces, all nodes will "emit" both sub-operations A[0:10] *= 3 and A[10:20] *= 3, but only node 0 will execute A[0:10] *= 3.

(Note that I've taken significant liberties with this example.)

m3vaz commented 2 years ago

1) I don't follow what you mean when you say the "combined" codebase of cunumeric and legate.core

Especially if this is to be part of the CI, these projects are maintained as separate repos. Whether or not legate-core is collecting data is irrelevant to whether or not cunumeric is collecting data. Additionally, I would be hesitant in conflating the two repos more than they already are.

2) Collecting C++ data

If we're limited to just free tools, then gcov may work, however I would be cautious of future support for windows and Mac. gcov is tied to gcc. clang/llvm has their own version, and that should allow us to work across multiple platforms, but that would require commitment to support compilation via llvm.

3) Combining coverage data

Yes. It is fiddly, but doable. gcov has gcov-tool merge, llvm-cov can export to lcov and then merge the lcov files. In theory the data files provided by llvm might be intelligible by gcov, but in practice gcov/llvm-cov tend to be strongly tied to the compilers they ship with, so it would be fragile. Complicating this is that the .gcda and .gcno files tend to need to be from the same build instance in order to be compatible with each other.

I've heard of codecov before, but I've never had the opportunity to use it.

4) Performance vs Coverage testing

This depends on if we want to have coverage based checks on PR's. Any coverage instrumentation will affect performance, and performance testing should be a standalone job. If the goal is to have coverage based checks (i.e. PR's don't get merged if they significantly drop the coverage numbers), then these should be run as a standalone check. If they're meant to be more informational, then we might consider stepping this back to something like a scheduled job.

bryevdv commented 2 years ago

Especially if this is to be part of the CI, these projects are maintained as separate repos. Whether or not legate-core is collecting data is irrelevant to whether or not cunumeric is collecting data. Additionally, I would be hesitant in conflating the two repos more than they already are.

Sorry I may not have expressed things clearly. I don't propose anything to combine or entangle the repos. I only meant that it should be possible to collect coverage for both cunumeric and legate.core at the same time as running the cunumeric tests (just as Python can do when asked). Or as @manopapad put it

Of note, legate.core has no test suite of its own, so its coverage metrics will have to come from runs on libraries that exercise it (i.e. cunumeric at the moment).

m3vaz commented 2 years ago

Sorry I may not have expressed things clearly. I don't propose anything to combine or entangle the repos. I only meant that it should be possible to collect coverage for both cunumeric and legate.core at the same time as running the cunumeric tests (just as Python can do when asked). Or as @manopapad put it

Of note, legate.core has no test suite of its own, so its coverage metrics will have to come from runs on libraries that exercise it (i.e. cunumeric at the moment).

That sounds like a good reason to get tests for legate.core. Until we get to that point, a scheduled job would probably be the way to go? Otherwise we'd need some way for legate PR's to pull up-to-date bits from cunumeric or we'd need to set up a cross-repo build (or freeze a target version of cunumeric to use for legate testing).