sixty-north / cosmic-ray

Mutation testing for Python
MIT License

Could a fail-threshold be configured? #535

Closed pythonclimber closed 1 year ago

pythonclimber commented 1 year ago

I'm working on adding mutation testing to my team's build and deploy pipelines using cosmic-ray. Have you ever considered adding a fail-threshold property to the configuration so that the 'cosmic-ray exec' command could be made to fail if the number of surviving mutants is above the threshold? Right now my team will be using the generated HTML reports for informational purposes when we run mutation tests, but I'd be interested in letting a predetermined number of surviving mutants fail a deployment.

abingham commented 1 year ago

I haven't thought about that, but it seems like a really reasonable and useful thing to be able to do. I'd like to avoid adding that (or really any) feature directly to the exec command per se, though. So I see two options for doing what you want to do.

The first is something I think you could do today. You can periodically check the status of the execution using e.g. cr-rate (or perhaps some similar tool of your own devising). If the run has passed whatever threshold you've defined, you can kill the exec command. If everything works as I've designed it to, this should "just work", though I recognize that it adds a bit of complication to your pipeline.
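The polling approach described above can be sketched roughly as follows. This is a minimal illustration, not part of cosmic-ray: `run_with_watchdog` and `threshold_exceeded` are hypothetical names, and the callback is assumed to wrap whatever check you choose (for example, running cr-rate against the session file and inspecting its result).

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def run_with_watchdog(
    cmd: Sequence[str],
    threshold_exceeded: Callable[[], bool],
    poll_seconds: float = 5.0,
) -> bool:
    """Run cmd and terminate it early once threshold_exceeded() returns True.

    Returns True if the process was stopped early, False if it finished
    on its own. Both names are hypothetical helpers for this sketch.
    """
    proc = subprocess.Popen(cmd)
    try:
        while proc.poll() is None:
            if threshold_exceeded():
                # Threshold passed: stop the long-running command.
                proc.terminate()
                proc.wait()
                return True
            time.sleep(poll_seconds)
        return False
    finally:
        # Make sure no child process is left behind on unexpected errors.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    # Stand-in for the real command: a child process that sleeps.
    demo_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_watchdog(demo_cmd, lambda: True, poll_seconds=0.1))
```

In a pipeline, `cmd` would be your `cosmic-ray exec` invocation; the exact arguments depend on your configuration and are deliberately left out here.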

The second option is a bit more speculative. We could define a plugin system where plugins get triggered at various phases in the execution, e.g. pre-test, post-test, etc. This isn't a huge amount of work I think (famous last words!), at least for a minimal version, but it is some work which I may not be able to get to any time soon.

Does the first option sound practical for you?

pythonclimber commented 1 year ago

I think option 1 will work for us. I need to tinker with cr-rate and make sure I understand it fully, but this definitely seems like a way to achieve our goal.

I'm intrigued by option 2. It kind of brings to mind the phases of the maven lifecycle in Java which I've always found to be a very useful system.

pythonclimber commented 1 year ago

The cr-rate tool worked perfectly (especially with the --fail-over option). The only (extremely minor) criticism I have of this approach is that it requires me to specify the fail threshold by percentage rather than count. Thank you for the help on this.
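For anyone who wants a count-based limit while still using a percentage threshold, the conversion is simple arithmetic. The helper below is a hypothetical sketch; it assumes you know the total number of mutants in the session, and it does not model how cr-rate treats a rate exactly equal to the threshold.

```python
def count_to_percent_threshold(max_survivors: int, total_mutants: int) -> float:
    """Convert an allowed number of surviving mutants into a percentage.

    Hypothetical helper: the result is the survival rate (in percent)
    that corresponds to exactly max_survivors out of total_mutants.
    """
    if total_mutants <= 0:
        raise ValueError("total_mutants must be positive")
    if max_survivors < 0:
        raise ValueError("max_survivors must not be negative")
    return 100.0 * max_survivors / total_mutants


if __name__ == "__main__":
    # Allowing 5 survivors out of 200 mutants corresponds to 2.5 percent.
    print(count_to_percent_threshold(5, 200))
```

Note that the total changes as the code base grows, so the percentage has to be recomputed for each run to keep the same count.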

mbj commented 1 year ago

@rubyclimber While I'm not a cosmic-ray user, I'm also doing mutation testing tooling and would like to add another perspective.

If you were to specify a partial threshold of, let's say, 99% coverage, you allow 1% to go uncovered. The problem is that this 1% can "wander around".

Today it may be method A, tomorrow method B. And in the worst case it's always where you are currently working, giving you a false sense of security.

What I found to be superior is to request 100% from the tool, and then exclude from analysis the methods you know are not covered.

This way the uncovered portion cannot silently drift, and you have a clear to-do list: minimizing the excludes.

pythonclimber commented 1 year ago

@mbj I think there is a misunderstanding about exactly what I'm trying to accomplish. I'm not looking to create a wandering uncovered portion of my code. I'm trying to find a way to make the mutation test run fail when new surviving mutants are discovered, so that my team can address them quickly.
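One way to express "fail only on new survivors" is to compare the current run against a stored baseline. The sketch below is an assumption-laden illustration: the `MutantId` shape is hypothetical, and it does not read cosmic-ray's session format.

```python
from typing import Iterable, Set, Tuple

# Hypothetical identifier for a mutant: (module path, operator name, occurrence).
MutantId = Tuple[str, str, int]


def new_survivors(
    current: Iterable[MutantId], baseline: Iterable[MutantId]
) -> Set[MutantId]:
    """Return survivors present in the current run but not in the baseline."""
    return set(current) - set(baseline)


def should_fail(current: Iterable[MutantId], baseline: Iterable[MutantId]) -> bool:
    """Fail the run only when at least one previously unseen survivor appears."""
    return bool(new_survivors(current, baseline))


if __name__ == "__main__":
    baseline = [("pkg/a.py", "ReplaceTrueWithFalse", 0)]
    current = baseline + [("pkg/b.py", "NumberReplacer", 2)]
    print(sorted(new_survivors(current, baseline)))
    print(should_fail(current, baseline))
```

Known survivors in the baseline do not fail the run, so the baseline doubles as an explicit list of accepted gaps that can be shrunk over time.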

abingham commented 1 year ago

> I'm intrigued by option 2. It kind of brings to mind the phases of the maven lifecycle in Java which I've always found to be a very useful system.

Yeah, I'm definitely taking inspiration from lifecycle plugin systems like pytest, vscode, etc. I want to think about other use cases...defining APIs like this can be like shooting in the dark unless you know what you're trying to support. At the same time, defining a rough initial API may inspire people to come up with those use cases.

abingham commented 1 year ago

> The cr-rate tool worked perfectly (especially with the --fail-over option). The only (extremely minor) criticism I have of this approach is that it requires me to specify the fail threshold by percentage rather than count.

If you look at the implementation of cr-rate in src/cosmic_ray/tools/survival_rate.py, you'll see that it's pretty dead simple. It should be very straightforward to create your own tool which does precisely what you need. I'd be happy to give whatever guidance or feedback I can if you want to give that a try.
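A count-based variant of such a tool could look roughly like the sketch below. It is not the code from survival_rate.py: the `Outcome` type is a hypothetical stand-in for results read from a session, and loading them from cosmic-ray's database is intentionally left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Outcome:
    """Hypothetical stand-in for one mutation result from a session."""

    job_id: str
    killed: bool


def count_survivors(outcomes: Iterable[Outcome]) -> int:
    """Count the mutants that were not killed by the test suite."""
    return sum(1 for outcome in outcomes if not outcome.killed)


def gate_exit_code(outcomes: Iterable[Outcome], max_survivors: int) -> int:
    """Return 1 when more than max_survivors mutants survived, else 0."""
    return 1 if count_survivors(outcomes) > max_survivors else 0


if __name__ == "__main__":
    results = [Outcome("a", True), Outcome("b", False), Outcome("c", False)]
    # Two survivors against a limit of one: the gate fails with exit code 1.
    raise SystemExit(gate_exit_code(results, max_survivors=1))
```

Returning an exit code keeps the tool easy to use as a pipeline step, in the same spirit as the --fail-over option mentioned above.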