nedbat / coveragepy

The code coverage tool for Python
https://coverage.readthedocs.io
Apache License 2.0

Measure / report the callstack distance from coverage-context for each LoC #1133

Open elemakil opened 3 years ago

elemakil commented 3 years ago

Is your feature request related to a problem? Please describe.

At $DAYJOB we use code coverage to help us improve the code quality of both new and existing (sometimes poorly tested) codebases. The addition of the measurement contexts feature has been extremely useful in helping us better understand which code is well tested and which isn't. It allows us to make sense of the plain numeric coverage value.

However, one recurring problem not solved by the measurement contexts is that there's no straightforward method to determine the "stack distance" between a test and a unit of code. What I mean by "stack distance" is essentially the number of stack frames between the point of origin (e.g. the (unit)test) and the source line in question. For each line of source code this could be recorded for each coverage context separately.

Among other things, this would allow us to evaluate whether a certain unit of code is really covered by a test explicitly, or only implicitly by being used as a dependency of some other unit of code.
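
To make the notion concrete: in principle the distance can be read off the interpreter stack at runtime. A minimal sketch, assuming a "test_" naming heuristic for spotting the test frame (the helper below is purely illustrative and not something coverage.py provides):

import inspect

def distance_from_test():
    # Walk outward from the caller of this helper and count the frames
    # until we reach a function whose name looks like a test.  The "test_"
    # prefix is an illustrative convention only.
    for depth, frame_info in enumerate(inspect.stack()[1:]):
        if frame_info.function.startswith("test_"):
            return depth
    return None  # not reached from a test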

Describe the solution you'd like

I'll illustrate this with a simple example. Given the following code with its test:

def prepare():
    return 42

def squared(value):
    return value ** 2

def do_actual_work(data):
    a = squared(data)
    b = data / 2
    return a + b

def run():
    data = prepare()
    return do_actual_work(data)

def test_run():
    assert run() == 1785.0

A coverage analysis reports 100% coverage from the unit test execution of a single test / context: "test_run". However, of the four involved functions (run, do_actual_work, squared, prepare), only run is tested explicitly. The feature I am proposing would record, for each covered line, its callstack distance from the test: run's body at distance 1, do_actual_work's and prepare's at distance 2, and squared's at distance 3.

This information would, for example, help identify code that needs more / better testing. The total code coverage could also be evaluated at different cutoff values of the "callstack distance": for a cutoff >= 3 the coverage is 100%, for a cutoff of 2 it is 9/11 = 81%, and for a cutoff of 1 it is 3/11 = 27%.
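
A minimal sketch of that post-processing step, assuming coverage data existed that maps each executable line to the smallest callstack distance at which any test reached it (no such report exists today; the names are illustrative):

def coverage_at_cutoff(min_distance_by_line, total_executable_lines, cutoff):
    # A line only counts as covered if some test reached it within
    # `cutoff` stack frames of the originating test.
    covered = sum(1 for d in min_distance_by_line.values() if d <= cutoff)
    return covered / total_executable_lines

Sweeping the cutoff from 1 upward would then yield the 27% / 81% / 100% progression from the example.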

Describe alternatives you've considered

I have not really considered any alternative(s) other than asking developers to identify poorly tested code manually, e.g. during code reviews.

However, I am open to investigating other options if any suggestions are raised. :smiley:

Additional context

nedbat commented 3 years ago

I see what you are getting at. I haven't thought about what it would take to collect that data, but are you sure it would map as nicely as you want? It's not clear to me that you would always have call-distance==1 tests for all your code. Test helpers would get in the way, as would the layers of your own product code. How would you assess the results?

sondrelg commented 3 years ago

I have an almost identical problem, where I would like to map all downstream calls (with callstack distances) made from an individual test.

If I were to use the example code given above, this is the minimal amount of data I would need to collect:

{
    # test name
    'test_run': {
    # called function: callstack distance
        'run': 1,
        'prepare': 2,
        'do_actual_work': 2,
        'squared': 3,
    }
}

I don't want to change the focus of the issue in any way, but I suppose this is relevant as a first step of figuring out the feasibility of this feature:

Have either of you got any ideas for how one might go about collecting this data in the first place, ideally statically?
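
As a rough feasibility sketch only: a dynamic approach (rather than the static one asked about) could trace calls while the test runs. Nothing below uses coverage.py internals, and the names are illustrative assumptions:

import sys

def collect_call_distances(test_func):
    # Trace Python function calls while the test runs and record, for each
    # function reached, its minimum callstack distance from the test frame.
    distances = {}
    depth = 0

    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            if depth > 0:  # skip the test function's own frame
                name = frame.f_code.co_name
                distances[name] = min(distances.get(name, depth), depth)
            depth += 1
        elif event == "return":
            depth -= 1
        return tracer

    sys.settrace(tracer)
    try:
        test_func()
    finally:
        sys.settrace(None)
    return distances

For the example above, collect_call_distances(test_run) would return {'run': 1, 'prepare': 2, 'do_actual_work': 2, 'squared': 3}.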

I also wanted to quickly say that this sounds like a cool feature 👏