hcoles / pitest

State of the art mutation testing system for the JVM
Apache License 2.0
1.69k stars 357 forks source link

Fewer mutations generated with version 1.15.4 #1291

Open Stephan202 opened 9 months ago

Stephan202 commented 9 months ago

While reviewing PicnicSupermarket/error-prone-support#984, I noticed that version 1.15.4 of Pitest reports fewer mutations than version 1.15.3. On cursory inspection it appears that version 1.15.4 no longer mutates deferred code that is referenced by static initialization expressions, such as:

The following reproduction steps show the issue (the above are just some examples; quite a lot of other classes are impacted as well):

# Clone the repo.
git clone git@github.com:PicnicSupermarket/error-prone-support.git
cd error-prone-support

# Collect mutation coverage before the upgrade.
./run-mutation-tests.sh
mv error-prone-contrib/target/pit-reports /tmp/pit-reports-before

# Collect mutation coverage after the upgrade.
git checkout origin/renovate/pitest-maven-plugin-1.x
./run-mutation-tests.sh
mv error-prone-contrib/target/pit-reports /tmp/pit-reports-after

# Compare the reports.
firefox /tmp/pit-reports-before/index.html
firefox /tmp/pit-reports-after/index.html

An example of a dramatic differences for the class tech.picnic.errorprone.bugpatterns.CanonicalAnnotationSyntax:

hcoles commented 9 months ago

Thanks for the report.

I've just taken a quick look and I suspect the issue is that the release notes missed out the inclusion of #1274

This change expands pitest's search for code that is only called from static initializers. Although we'd ideally like to mutate this, because the code is only executed once within a JVM, it is only possible to kill these mutants if pitest happens to have launched a fresh JVM just before the mutant is inserted.

From a very quick scan of your example class, this looks to be working as intended since dropRedundantParentheses etc are called only when initializing a static field.

Stephan202 commented 9 months ago

@hcoles thanks for the quick response!

Perhaps I misunderstand, but since execution of the impacted code is deferred, I would expect it to be excluded from the search. After all, the version 1.15.3 report shows that the mutants could indeed be killed prior to this change (without restarting the JVM).

Put another way, given a method foo that is referenced only by a static field, I would would expect it to be excluded when used as

private static final String CONST = foo();

but not when used as

private static final Supplier<String> CONST = () -> foo();
private static final Supplier<String> CONST_2 = ThisClass::foo;

Because in the latter cases any mutation of foo will impact use of the static Suppliers.

hcoles commented 9 months ago

The mutants are not completely unkillable, but they are not reliably killable.

If they happen to be the first mutant in one of the JVMs that pitest launches, they will be killed by an effective test. Unfortunately, they may also cause other mutants to be falsely killed as any side effects from their execution (e.g bad state stored in a static variable) will endure in the JVM from that point on.

However, I think you are correct that there is an issue here. The change has not considered delayed execution from Suppliers etc.

I'll most likely re-relase 1.15.5 tomorrow with the change rolled back until this is reexamined.

Thanks again for the report.

hcoles commented 2 months ago

@Stephan202 Pitest 1.16.2 has just been released with #1274 reintroduced, but with some additional analysis to pick up Suppliers etc.

Let me know if it causes any problems for your codebase.

Stephan202 commented 2 months ago

Very cool! The automated upgrade PR should land tomorrow; I'll keep an eye on it :)

Stephan202 commented 2 months ago

Alright, now that PicnicSupermarket/error-prone-support#1313 landed I had a look. On our code base the number of generated mutants went down again, causing a lower "test strength" to be reported. The reproduction case in this issue's first message applies again: for CanonicalAnnotationSyntax the number of mutants went back down to three. I guess for this case it may be hard to infer that the methods are BiFunctions? :thinking:

If "functional types" such as Supplier are special-cased, it may be nice to make that set extensible. For example, in our code base we define a lot of static com.google.errorprone.matchers.Matcher (subtype) instances.

Alternatively: perhaps the feature as a whole could be toggled? (But I can see that's not nice from a UX/API point of view.)

hcoles commented 1 month ago

Yes, it can't currently cope with collections of function types as the contained type is lost due to type erasure on call. Thinking about this again, it ought to be possible to fix that with a pre-scan of class fields to collect the declarations.

That would just leave the problem of the special-cased types. In theory that could be (largely) solved by scanning the classpath for anything annotated with FunctionalInterface, although that may cause some issues.

I'll take another look at this and see if things can be improved with a 2nd iteration.

hcoles commented 1 month ago

I had a play with this yesterday and made some significant improvements. Unfortunately, scanning the project classpath (ie all dependencies) for function classes looks like it would cause a performance hit of several seconds. This is insignificant for a full mutation analysis, but is not acceptable when running against diffs.

So, it looks like it will need to stick to a fixed list of classes.

I'm still undecided on whether to scan the user code. This is less of a performance hit, but may be confusing when the same class acts differently when used as a dependency vs directly.

hcoles commented 1 month ago

@Stephan202 1.16.3 is releasing now and ought to renable the majority of mutants.

After some consideration I went with the simplest possible approach, with no scanning and a fixed list of classes.

Stephan202 commented 1 month ago

Tnx for working on this @hcoles! I can report that version 1.16.3 generates almost the same set of mutants as version 1.16.1. I documented a small regression here. IIUC the first point documented there is "expected", give the trade-off you describe. Not sure about the second point.