intel / p3-analysis-library

A library simplifying the collection and interpretation of P3 data.
https://intel.github.io/p3-analysis-library/
MIT License

Update coverage schema to 0.3.0 #31

Closed: Pennycook closed this 4 months ago

Pennycook commented 5 months ago

Experiments with the old schemas (0.1.0 and 0.2.0) have uncovered two issues:

1) A file hash is not sufficient to identify a file when computing divergence, because a code base may contain duplicates of a file that SHOULD count towards divergence. Accounting for this requires the coverage format to store both the path and a hash.

2) The region format is too closely tied to the operation of specific tools. Each region in the output corresponds to a region of source identified by the producer of the JSON, with no way to recover what the region represents, or which lines within the region are used. As a result, it is impossible to use coverage information from multiple tools (or tool versions).
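To illustrate the first issue, here is a minimal sketch (not code from this library) of how keying files by hash alone collapses duplicates. The paths, file contents, and the `file_id` helper are hypothetical, and SHA-256 is only an assumed choice of hash.

```python
import hashlib


def file_id(contents: bytes) -> str:
    """Hypothetical helper: identify a file by a hash of its contents."""
    return hashlib.sha256(contents).hexdigest()


# Hypothetical code base containing two byte-identical copies of a file.
files = {
    "cpu/kernel.cpp": b"void kernel() {}\n",
    "gpu/kernel.cpp": b"void kernel() {}\n",
}

# Keyed by hash alone, the two copies become indistinguishable.
by_hash = {file_id(contents) for contents in files.values()}

# Keyed by (path, hash), each copy remains a separate file.
by_path_and_hash = {(path, file_id(contents)) for path, contents in files.items()}

print(len(by_hash), len(by_path_and_hash))
```

With hash-only identification, the duplicate copy disappears from any per-file calculation, which is why it cannot contribute to divergence.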

The new coverage format attempts to address both of these issues by requiring both a file name and some sort of unique ID (in practice, a hash), and storing an explicit list of line numbers.
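As a rough sketch of why explicit line numbers help, the snippet below merges coverage reported by two tools. The field names `file`, `id`, and `lines`, and the `merge_coverage` function, are assumptions made for illustration. They are not taken from the actual 0.3.0 schema or from this library's API.

```python
def merge_coverage(*reports):
    """Merge coverage entries from several reports (hypothetical layout).

    Each entry is assumed to be a dict with a file name, a unique ID
    (in practice, a hash), and an explicit list of used line numbers.
    """
    merged = {}
    for report in reports:
        for entry in report:
            # Identify a file by both its name and its ID.
            key = (entry["file"], entry["id"])
            merged.setdefault(key, set()).update(entry["lines"])
    return [
        {"file": name, "id": uid, "lines": sorted(lines)}
        for (name, uid), lines in sorted(merged.items())
    ]


# Hypothetical output from two different tools for the same code base.
tool_a = [{"file": "src/foo.cpp", "id": "abc", "lines": [1, 2, 3]}]
tool_b = [
    {"file": "src/foo.cpp", "id": "abc", "lines": [3, 4]},
    {"file": "copy/foo.cpp", "id": "abc", "lines": [1]},
]

merged = merge_coverage(tool_a, tool_b)
```

Because each entry lists lines rather than tool-specific regions, combining reports reduces to a set union per file, and the duplicate `copy/foo.cpp` stays distinct even though it shares an ID with `src/foo.cpp`.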

Related issues

This is tied to https://github.com/intel/code-base-investigator/issues/72. When we designed the original coverage format, we assumed that CBI's handling of duplicate files was correct and that it would be sufficient to base coverage calculations on files identified only by a hash value.

Proposed changes

I haven't yet rewritten the documentation, because I want to make sure we agree on the functionality first.