Lambdaris commented 4 months ago

Let's consider such code:

constexpr bool constant_bool() { return false; }

int mcdc_with_constant(bool a) {
  if (a && constant_bool()) {
    return 1;
  } else {
    return 0;
  }
}

Because the decision cannot be evaluated to true, no condition could be shown to be covered by MCDC.

The constant may be a template parameter or a platform-dependent variable. As a result, one of the two branches might be optimized out and there is no branch in practice.

So could we eliminate such conditions and decisions from mcdc? For example,

For a && CONST, if CONST==false, condition a is treated as folded too.
For a || CONST, if CONST==true, a will be taken as folded. (Just in mcdc, normal branch coverage could still count it)
For a && (b || CONST), if CONST==true, b is taken as folded.

Also if a decision can not be evaluated to the other value, we mark it as folded.

We can check value of CONST in mcdc as long as the front end preserves the non-zero counter. For now clang assign Zero to both Counter and FalseCounter if the condition was constant. By only setting the never reached counter to Zero, we can ensure the constant is true if its FalseCounter is Zero and vice versa. Then by analyzing the decision tree, conditions that would be never covered in meaning of mcdc can be found out and marked as folded too.

Lambdaris commented 4 months ago

As discussion on rust, @RenjiSann shows the expected output of llvm:

  |---> MC/DC Decision Region (LL:CC) to (LL:CC)
  |
  |  Number of Conditions: 2
  |     Condition C1 --> (LL:CC)
  |     Condition C2 --> (LL:CC)
  |
  |  Executed MC/DC Test Vectors:
  |
  |     C1, C2    Result
  |  1 { F,  C  = F      }
  |  2 { T,  C  = F      }
  |
  |  C1-Pair: Skipped: uncoverable (folded by C2)
  |  C2-Pair: Folded
  |  MC/DC Coverage for Decision: 100%
  |
  ------------------

I have a premature thought as this comment suggests:

For now clang set both Counter and FalseCounter to Zero if the condition is folded. We can only set the counter of folded branch to Zero (like rustc does now).

On llvm-cov side, mark a condition as folded if either of Counter and FalseCounter is Zero.

Pick up all practical test vectors considering folded conditions, and analyze which conditions are unrecoverble or folded.

This method does not invade instruments but only effects how llvm presents for users.

For instance, suppose we have a decision a && (b || CONSTANT_TRUE || c).

With the scenario, we still have 4 mcdc branch mappings: a, b, CONSTANT_TRUE, c, in which clang and rustc mark FalseCounter of CONSTANT_TRUE as Zero. Then llvm-cov knows CONSTANT_TRUE is folded and always evaluated to true because it knows the FalseCounter is Zero.

After llvm-cov collects all test vectors, it can show practical test vectors by pick up whose folded conditions bits are expected. Say, in our example, only test vectors in this table are practical: (D means DontCare in TestVector) a b CONSTANT_TRUE c Decision

0 D D D 0

1 1 D D 1

1 0 1 D 1

As we see, the practical test vectors are whose bit for CONSTANT_TRUE is 1 or DontCare.

Next let's find which conditions are uncoverable or folded:

Partition all practical test vectors by value of a. We show there are different decision values for the two groups: {(0,D,D,D)} and {(1,1,D,D), (1,0,1,D)}, thus a is coverable and not folded.

Partition all practical test vectors by value of b (exclude DontCare), both the two group, {(1,1,D,D)} and {(1,0,1,D)} share same decision value. Thus we know b is uncoverable due to the constant because it can not change value of the decision.

CONSTANT_TRUE is marked as folded just as llvm-cov does now.

Partition all pracitcal test vectors by value of c ...... No, c is DontCare in all practical test vectors, so it's Folded by constant.

Briefly the scheme is:

Only set counter of the folded branch to Zero if the condition is folded in front end compilers.

In llvm-cov, mark a condition as folded if either of Counter or FalseCounter was Zero.

Pick up all "practical" test vectors, whose bit of folded condition is the practical value or DontCare.

For each condition, partition practical test vectors by its value to three group: true, false or DontCare

If either of true group or false group is empty, the condition is folded.

If exist tv1 in true group and tv2 in false group, tv1 and tv2 have different decision value, the condition is normal.

If all test vectors in true group and false group have same decision value, the condition is uncoverable.

After llvm-cov collects all test vectors, it can show practical test vectors by pick up whose folded conditions bits are expected. Say, in our example, only test vectors in this table are practical: (`D` means `DontCare` in `TestVector`)	a	b	CONSTANT_TRUE	c	Decision
0	D	D	D	0
1	1	D	D	1
1	0	1	D	1

RenjiSann commented 4 months ago

I am not sure I understand this correctly.

Pick up all "practical" test vectors, whose bit of folded condition is the practical value or DontCare. I don't understand your definition of practical test vector. What would be an example of a non-practical test vector ? IIUC, the bit of the folded condition can't be something else than DontCare or the constant value, so isn't it all the test vectors ?

If all test vectors in true group and false group have same decision value, the condition is uncoverable. Here, how can we know that the condition is uncoverable and not just uncovered ?

From what I understand, with the following example, a would be detected as uncoverable :

void foo(bool a, bool b) {
    if (a || (b && true)) {
        print("success");
    }
}

int main() {
    foo(true, true);
    foo(false, true);
}

Here, if I look at a, true group is {(1, 1, 1)} and false group is {(0, 1, 1)}. Both decisions are True, so does it mean a would be flagged as uncoverable, even though (false, false, true) and (true, false, true) would cover the condition ?

Lambdaris commented 4 months ago

Here, if I look at a, true group is {(1, 1, 1)} and false group is {(0, 1, 1)}. Both decisions are True, so does it mean a would be flagged as uncoverable, even though (false, false, true) and (true, false, true) would cover the condition ?

a is coverable. Let's write all possible test vectors: (closer to test vectors llvm reporting, not exactly TestVector in llvm code)

`a`	`b`	`true`	decision
1	D	D	1
0	1	1	1
0	1	0	0
0	0	D	0

Still D means this condition is short-circuited (marked as DontCare in code), shown as - by llvm-cov.

Practical test vectors are whose bit of true is 1 or D: (1,D,D), (0,1,1), (0,0,D). So for a, true group is {(1,D,D)}, false group is {(0,1,1),(0,0,D)}, where (0,0,D)->false and (1,D,D)->true can show a is covered.

RenjiSann commented 4 months ago

I see, so practical test vectors is the subset of all "actually possible" test vectors, which excludes hypothetical test vectors with a False as the evaluation of a constant folded to True and vice-versa.

Still, I'm a bit confused. Doesn't this approach need us to eagerly generate all test vectors to check if MCDC is achievable for a given condition ?

I feel like it would be easier to use the Binary decision diagram representation to propagate the folding to conditions and deduce which one are indeed folded.

Lambdaris commented 4 months ago

Still, I'm a bit confused. Doesn't this approach need us to eagerly generate all test vectors to check if MCDC is achievable for a given condition ?

I feel like it would be easier to use the Binary decision diagram representation to propagate the folding to conditions and deduce which one are indeed folded.

LLVM has already generated all test vectors by looking up the binary decision diagram, see buildTestVector.

Indeed folded conditions can be easily detected upon traversing the decision diagram. Nevertheless I feel a bit troublesome on detecting uncoverable conditions, so I went over to test vectors. But now thanks to your remind, I think we could find folded conditions as building test vectors and distinguish uncoverable conditions later.

Edit: It seems that the most cost is brought by detecting uncoverable conditions, so maybe it's still better to detect folded conditions with test vectors for clean (thus we can do them all in one function)? Or is there any good algorithm to find uncoverable conditions?

Lambdaris commented 4 months ago

Ok I have found it can be cleaner if we just partition practical test vectors by value of the decision. Then we get true group and false group, in which all test vectors are evaluated to true and false respectively in meaning of decision. Hence,

If all test vectors of exact one group show a condition is DontCare, the condition is uncoverable
If all test vectors of both two groups show a condition is DontCare, the condition is unreachable

Considering all constants have been marked as folded in advance, the rest conditions are normal.

Lambdaris commented 4 months ago

94137 is drafted to solve this issue.

evodius96 commented 4 months ago

Right, a folded constant condition does impact whether other conditions are reachable or coverable. However, I am of two minds about whether they ought to be excluded from MC/DC, although I could be persuaded.

What do other coverage tool vendors, like LDRA or vectorcast, do for these cases?

When I originally implemented MC/DC, my (perhaps erroneous) conclusion was that, while it didn't make sense to include actual constant conditions in MC/DC, an uncoverable or unreachable condition should be included and a lack of coverage would indicate a case where perhaps the code should be refactored to be testable, or in the case of templates, a more complete test written that provided different constants in different instances.

It seems to me this is where MC/DC is useful since, like branch coverage and other coverage criteria, it's ultimately a measure of your source code coverage.

Lambdaris commented 4 months ago

an uncoverable or unreachable condition should be included and a lack of coverage would indicate a case where perhaps the code should be refactored to be testable, or in the case of templates, a more complete test written that provided different constants in different instances.

That's a quite convincing argument. Upon implementing this feature I'm also indecisive about how we should treat these results. Though it seems to be responsibility of some linters to warn users (not sure about clang, but rust linters can easily do such stuff), coverage itself could do some thing too.

I thought at least unreachable conditions are something coder should avoid. However we can not distinguish constants from unreachable conditions only by counters (Rust marks all unreachable counters as Zero regardless of whether they are constants), so I also excluded them.

Could we add an additional index like "Uncoverable" after Miss to indicate how many conditions in Miss are uncoverable?

evodius96 commented 4 months ago

Thanks!

I would be concerned that an additional index or category would be missed by those (mainly in embedded) who just want to see a code coverage metric for MC/DC at the end of the day, though maybe it's ok. I would be ok with adding an option to control whether uncoverable conditions are included or excluded, but I'd like the default setting to correspond to what other vendors have done, if possible. My recollection is that uncoverable/unreachable conditions are included by default with tools like LDRA, but I would need to do some additional research to be sure. Not saying that's a hard requirement, but I want to keep expectations aligned with what users may already be used to.

I'm traveling this week but can look into this more hopefully soon.

Thanks for looking into this!

evodius96 commented 4 months ago

Sorry for the delay here -- I plan to run some experiments next week that hopefully will inform the decision. Thanks!

Lambdaris commented 3 months ago

I've added an option --mcdc-exclude to control which kinds of conditions can be excluded from coverage. By default it's constant. Users can pass arguments --mcdc-exclude=uncoverable,constant,unreachable to exclude all or --mcdc-exclude=none to include them all.

llvm / llvm-project

[Coverage][MC/DC] Constant folding might make conditions never covered #93560

94137 is drafted to solve this issue.