Open bryevdv opened 2 years ago
Any thoughts, corrections, or additions to the requirements or summary above are appreciated!
Presumably the correct tool to use is gcov. Unknown (to me) are what build-time adjustments will be necessary to make use of it. Presumably at least a link to -lgcov but possibly more. C++ experts will have to weigh in on this.
Combining gcov results ("GCDA files") from multiple runs seems to be possible based on various Stack Overflow answers
@lightsighter did some code coverage investigations not long ago for Legion, so he should have a better idea of the challenges. I can imagine it's not trivial to record and combine code coverage information from multiple ranks, and multiple threads within those ranks.
It seems plausible that coverage reporting could affect performance measurement. Should coverage reporting be integrated into the matrix of "standard" test jobs unconditionally, with dedicated jobs for performance measurement? Or the converse?
There are no performance measurements happening at the moment in CI, so AFAIK we could conceivably incorporate code coverage measurements into regular CI runs. However, IMHO we should first get an idea of the overheads; if the slowdown is too large we may choose to run code coverage less often.
Of note, legate.core has no test suite of its own, so its coverage metrics will have to come from runs on libraries that exercise it (i.e. cunumeric at the moment).
I can imagine it's not trivial to record and combine code coverage information from multiple ranks, and multiple threads within those ranks.
This is the biggest unknown in my mind. But, just to make sure my understanding is correct: if Python cunumeric code is executed on a cluster, do both Python and C++ execute on all the nodes? Or is the Python more like a "head node", with only the C++ level working across the cluster? I think I had assumed the latter, but now I am not so sure.
Of note, legate.core has no test suite of its own, so its coverage metrics will have to come from runs on libraries that exercise it (i.e. cunumeric at the moment).
This was one of the prompting questions. Python `coverage` can collect for any modules specified in the configuration. I am not so sure about how `gcov` might work in this situation.
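For illustration, a minimal `.coveragerc` along these lines could name both packages. This is only a sketch; the importable package names (`cunumeric`, `legate`) are assumptions on my part:

```sh
# Hedged sketch: a minimal .coveragerc telling coverage.py which packages to
# measure, with branch coverage enabled (package names are assumptions)
cat > .coveragerc <<'EOF'
[run]
branch = True
source =
    cunumeric
    legate
EOF
```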
if Python cunumeric code is executed on a cluster, do both Python and [C++] execute on all the nodes? Or is the Python more like a "head node" with only C++ level working across the cluster?
The python work will be replicated across all nodes (i.e. all the nodes will execute the same python script in full), and they will all "emit" the full amount of C++ work, but each node will execute a disjoint subset of the C++ work (each node will "mask-out" C++ work that corresponds to other nodes).
For example, for this code:
A = ones((20,))
A *= 3
assuming we have two nodes, and array `A` gets split into two pieces, all nodes will "emit" both sub-operations `A[0:10] *= 3` and `A[10:20] *= 3`, but only node 0 will execute `A[0:10] *= 3`.
(Note that I've taken significant liberties with this example.)
1) I don't follow what you mean when you say the "combined" codebase of cunumeric and legate.core
Especially if this is to be part of the CI, these projects are maintained as separate repos. Whether or not legate-core is collecting data is irrelevant to whether or not cunumeric is collecting data. Additionally, I would be hesitant in conflating the two repos more than they already are.
2) Collecting C++ data
If we're limited to just free tools, then gcov may work; however, I would be cautious about future support for Windows and macOS, since gcov is tied to gcc. clang/llvm has its own equivalent, which should allow us to work across multiple platforms, but that would require a commitment to supporting compilation via llvm.
3) Combining coverage data
Yes. It is fiddly, but doable. gcov has gcov-tool merge, and llvm-cov can export to lcov and then merge the lcov files (see the sketch after item 4 below). In theory the data files produced by llvm might be intelligible to gcov, but in practice gcov/llvm-cov tend to be strongly tied to the compilers they ship with, so relying on that would be fragile. Complicating this is that the .gcda and .gcno files tend to need to come from the same build instance in order to be compatible with each other.
I've heard of codecov before, but I've never had the opportunity to use it.
4) Performance vs Coverage testing
This depends on whether we want to have coverage-based checks on PRs. Any coverage instrumentation will affect performance, and performance testing should be a standalone job. If the goal is to have coverage-based checks (i.e. PRs don't get merged if they significantly drop the coverage numbers), then these should be run as a standalone check. If they're meant to be more informational, then we might consider stepping this back to something like a scheduled job.
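For reference, here is a hedged sketch of the clang/llvm-cov path mentioned in items 2 and 3. The flags and tools are standard LLVM ones, but how they would be threaded through our actual build (and across ranks) is an assumption:

```sh
# Build with clang's source-based coverage instrumentation
clang++ -fprofile-instr-generate -fcoverage-mapping example.cc -o example

# Each run emits a raw profile; LLVM_PROFILE_FILE lets separate runs/ranks
# write distinct files (%p expands to the process id)
LLVM_PROFILE_FILE="example-%p.profraw" ./example

# Merge the raw profiles, then export in lcov format for downstream merging/reporting
llvm-profdata merge -sparse example-*.profraw -o example.profdata
llvm-cov export -format=lcov ./example -instr-profile=example.profdata > coverage.lcov
```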
Especially if this is to be part of the CI, these projects are maintained as separate repos. Whether or not legate-core is collecting data is irrelevant to whether or not cunumeric is collecting data. Additionally, I would be hesitant in conflating the two repos more than they already are.
Sorry I may not have expressed things clearly. I don't propose anything to combine or entangle the repos. I only meant that it should be possible to collect coverage for both cunumeric and legate.core at the same time as running the cunumeric tests (just as Python can do when asked). Or as @manopapad put it
Of note, legate.core has no test suite of its own, so its coverage metrics will have to come from runs on libraries that exercise it (i.e. cunumeric at the moment).
That sounds like a good reason to get tests for legate.core. Until we get to that point, a scheduled job would probably be the way to go? Otherwise we'd need some way for legate PRs to pull up-to-date bits from cunumeric, or we'd need to set up a cross-repo build (or freeze a target version of cunumeric to use for legate testing).
cc @magnatelee @marcinz @manopapad @m3vaz
There is an interest in providing code coverage reporting, and enforcing minimum code coverage levels. This can be a somewhat involved topic even for smaller and simpler projects. Our situation is substantially complicated by:

- the `cunumeric` and `legate.core` […]
- […] environment
- `legate` […] Python […]

This issue is to attempt to get alignment on and understanding of top-level requirements, and secondarily, to connect those requirements to current "known unknowns" or risks. Once there is agreement, things can begin to be broken down into some granular concrete tasks.
Requirements

- Collect line and branch coverage data for Python code
  - `cunumeric` and `legate.core` codebases (combined)
- Collect line and branch coverage data for C++ code
  - `cunumeric` and `legate.core` codebases (combined)
- Combine coverage data for Python and C++ code
- GitHub integration
- Local dev usage

Current capability

With a minimal `.coveragerc` configuration, Pytest can generate a line and branch coverage report including `cunumeric` and `legate.core`, via an invocation similar to:
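(Hedged sketch; the module names, test paths, and options below are assumptions on my part.) Such an invocation would print a terminal summary report:

```sh
# Hedged sketch of a pytest-cov invocation covering both Python packages
pytest --cov=cunumeric --cov=legate --cov-branch --cov-report=term tests/
```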
This kind of report (possibly with the addition of missing line numbers) is useful and suitable for local dev use.
Additionally, HTML or XML reports may be generated. For example, an invocation along these lines will generate an HTML report, including links to annotated source code:
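(Again a hedged sketch; the exact options and paths are assumptions.)

```sh
# Hedged sketch: produce an annotated HTML report (written to htmlcov/ by default)
pytest --cov=cunumeric --cov=legate --cov-branch --cov-report=html tests/
```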
Speculation: It would not be too difficult, today, to add a CI Job that reports on a single configuration and saves HTML results as an artifact.
Unknowns
Combining Python data
The coverage project has documentation regarding combining coverage data from multiple runs:
https://coverage.readthedocs.io/en/6.3.2/cmd.html#cmd-combine
This seems fairly straightforward at a glance but some experimentation will be needed to gain practical experience. However, it seems that combining Python data is at least possible in principle.
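As a rough sketch of what the docs describe (the test paths here are illustrative, not our actual layout):

```sh
# Run the suite in pieces with parallel mode; -p gives each run its own
# uniquely suffixed .coverage.* data file
coverage run -p -m pytest tests/unit
coverage run -p -m pytest tests/integration
# Merge all .coverage.* files in the current directory into a single .coverage
coverage combine
coverage report
```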
Collecting and combining C++ data
Presumably the correct tool to use is `gcov`. Unknown (to me) are what build-time adjustments will be necessary to make use of it. Presumably at least a link to `-lgcov`, but possibly more. C++ experts will have to weigh in on this.

Combining `gcov` results ("GCDA files") from multiple runs seems to be possible based on various Stack Overflow answers. It seems at least a little convoluted. It's possible a service (see Codecov, below) can handle this. If not, we will need to gain some practical experience to see how best to apply these techniques.
Performance vs Coverage testing
It seems plausible that coverage reporting could affect performance measurement. Should coverage reporting be integrated into the matrix of "standard" test jobs unconditionally, with dedicated jobs for performance measurement? Or the converse?
Report consolidation
While it seems possible that Python and C++ data can be separately combined, it is a completely open question whether or not those separate Python and C++ reports can be combined together in any way.
For any of the combined reports that may be possible, some amount of post-processing will be required. Will this happen in a GH Action or elsewhere? It's possible a GH integration could take on this burden (see Codecov, below).
GitHub integration
The most popular tool for coverage reporting on GitHub is Codecov. Although it is a commercial product with paid plans, they offer their service free to OSS projects.
Ostensibly, Codecov immediately supports:
All that said, while things look promising, some experimentation will be required to see how well this works for our situation. There are several unknowns, for instance: