Use coverage data to decide which functions to mutate and which tests to run

sourcefrog commented 2 years ago

Discussed in https://github.com/sourcefrog/cargo-mutants/discussions/23

Originally posted by **xd009642** February 13, 2022

So this may be worth creating an issue for, but this is largely an idle thought I had yesterday. Mutation testing improves on things like coverage by making sure the tests are actually useful, rather than just hitting a ton of lines/conditions without checking any of the values. Now if we already have code coverage results for our tests and can see some functions aren't tested at all, we could save time by not applying mutations to them; after all, none of the mutations would be caught. For maximum usability this should probably take the form of an optional argument accepting some open coverage format like an lcov report or cobertura.xml, which a number of coverage tools already output.

Yeah, good idea. mutagen optionally does something like this according to its documentation, but I have not looked at the implementation.

We could take it a step further by understanding which tests run which function under test. Functions not reached by any test are, as far as we can tell, simply not tested. Functions that are reached by some tests can be mutated, and then we run only the relevant tests. This would potentially be dramatically faster on some trees.

That said, I think there are a few things that might make this annoying to implement reliably, but perhaps my preconceptions are out of date. In my experience, getting coverage files out of Rust has historically been a bit platform-dependent and fiddly to set up. And, historically, the output was in a platform-dependent format that required external preprocessing. Both of these are in tension with my goal of a very easy start with cargo-mutants.

However there is now https://blog.rust-lang.org/inside-rust/2020/11/12/source-based-code-coverage.html providing -Z instrument-coverage, which is moving towards stabilization as -C instrument-coverage.

So if this ends up with a way to just directly get a platform-independent coverage representation out of cargo this might be pretty feasible.

Coverage may still raise some edge cases if the test suite starts subprocesses, potentially in different directories, as both cargo-mutants and cargo-tarpaulin seem to do. Will we still collect all the aggregate coverage info? But, we could still offer it for trees where it does work well. And maybe it will be fine.

There might also be a hairy bit about mapping from a function name back to the right cargo test invocation to hit it. But that also can probably be done: if nothing else perhaps by just running the test binary directly...

Possibly this could be done with https://github.com/taiki-e/cargo-llvm-cov

xd009642 commented 2 years ago

So some thoughts:

Coverage stats done per test would currently need a run per test, which would likely be slower. There might be a way to do it by implementing a custom test runner and implementing the profiler built-ins to get the stats with a single-threaded runner, but that feels like a lot of extra work and another tool entirely. It could be worked out statically to a degree, but things like dynamic dispatch and generics make this pretty complex too.

With -C instrument-coverage and processes being spawned you'll end up with multiple profraw files that need merging. Unless the naming pattern for the output file includes something like the pid or a timestamp, spawned processes will overwrite each other's results files.
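As a rough illustration of the naming-pattern point: the LLVM profiling runtime reads the `LLVM_PROFILE_FILE` environment variable, and in that pattern `%p` expands to the process ID and `%m` to a signature of the instrumented binary. The sketch below only builds a `cargo test` command with those variables set; the function name `coverage_test_command`, the `mutants-` file prefix, and the directory argument are hypothetical, and this is not how cargo-mutants or tarpaulin actually invoke the tests.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `cargo test` command whose instrumented
/// processes each write their own raw profile file.
///
/// `profile_dir` and the `mutants-` prefix are illustrative assumptions.
/// An absolute `profile_dir` is safer, because test subprocesses may run
/// in different working directories.
pub fn coverage_test_command(profile_dir: &Path) -> Command {
    // %p = process ID, %m = signature of the instrumented binary, so
    // spawned subprocesses do not write to the same file.
    let pattern = profile_dir.join("mutants-%p-%m.profraw");
    let mut cmd = Command::new("cargo");
    cmd.arg("test")
        .env("RUSTFLAGS", "-C instrument-coverage")
        .env("LLVM_PROFILE_FILE", &pattern);
    cmd
}
```

The resulting raw profiles would still need to be merged (for example with `llvm-profdata merge`) before any report can be produced.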

Tarpaulin is working on adding -C instrument-coverage support as an alternative collector. This will handle some of these difficulties, which users may not be aware of, as well as generating different report formats and working with third-party reporting tools like coveralls and codecov.

Generally, I think leaving the collection of coverage stats to the users and using that to filter the mutants is a good first step. That means it can be plugged into existing setups that may use grcov, kcov, tarpaulin, cargo-llvm-cov, -C instrument-coverage or GNATcoverage. The majority of these tools produce lcov reports, cobertura reports, or both (I think GNATcoverage is an outlier here, but I haven't used it personally). And for lcov there is already a parsing library: https://crates.io/crates/lcov

Unfortunately, I think the option to make it an easy "always on" thing won't work for the large number of projects that would need bespoke setup. As things stabilise and mature this should become possible, but I think there'll always be a selection of users with less conventional coverage needs. For these users, being able to provide a pre-generated coverage report to cargo-mutants would probably be the preferred UX.

One example of something I've been planning is on-device embedded coverage using probe-rs or an embedded no_std llvm profiler runtime and the embedded rust defmt tools. I wouldn't expect this to make it into the rust compiler at any point as it feels too bespoke, but if I got this working then using cargo-mutants on embedded projects would be pretty cool.

Just my 2¢ :grin:

sourcefrog commented 2 years ago

Thanks @xd009642.

I can't currently think of any practical way to do this, so I'm going to close the bug for now.

sourcefrog commented 10 months ago

I thought about this some more after adding nextest support (#85), which does run one test at a time (more or less) and so would be a foundation for collecting coverage one test at a time.

I agree that getting coverage working well on any tree seems a bit fiddly today, so this might be hard to make work out of the box.

For the case originally suggested, of just entirely skipping uncovered code, it seems like the best thing would be for users to either add tests for that code, or manually mark it skipped in cargo-mutants. However, perhaps they want to parallelize working towards better tests using both coverage and mutants, rather than one after the other.

Skipping spans

I think it could make sense to have an option like --skip-spans, that avoids generating any mutants for the specified line-col ranges. (It's basically the inverse of --in-diff.) Then you could potentially feed that from coverage output. It seems like sometimes the mapping of coverage to source location is a bit noisy and heuristic, but this would at least approximately suppress most mutants from uncovered code.

Also, if this just accepted a format-independent list of {file, start: (line, col), end: (line, col)} then people could convert from whatever coverage or other format they have. We could later, as a convenience, accept some well-known formats.
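To make the format-independent span list concrete, here is a minimal sketch of what such a record and the filtering step could look like. `--skip-spans` is only a proposal in this thread, so everything here (the `Span` type, `retain_unskipped`, inclusive 1-based positions) is an assumption for illustration, not cargo-mutants' actual data model.

```rust
/// A 1-based (line, column) position in a source file.
pub type LineCol = (usize, usize);

/// One skipped region: {file, start: (line, col), end: (line, col)}.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Span {
    pub file: String,
    pub start: LineCol,
    pub end: LineCol,
}

impl Span {
    /// True if `pos` in `file` lies inside this span (both ends inclusive).
    pub fn contains(&self, file: &str, pos: LineCol) -> bool {
        // Tuples compare lexicographically: line first, then column.
        self.file == file && self.start <= pos && pos <= self.end
    }
}

/// Keep only the candidate mutant positions that fall outside every
/// skipped span.
pub fn retain_unskipped<'a>(
    candidates: &[(&'a str, LineCol)],
    skip: &[Span],
) -> Vec<(&'a str, LineCol)> {
    candidates
        .iter()
        .copied()
        .filter(|(file, pos)| !skip.iter().any(|s| s.contains(file, *pos)))
        .collect()
}
```

A real implementation would probably compare whole mutant spans rather than single positions, and would need to decide what to do with mutants that only partly overlap a skipped range.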

Accepting test->span maps

If we do run one test at a time, perhaps using nextest, and they emit coverage, then we can collect a map from test name to lines covered by that test. (Again, with the caveat that the coverage data is not 100% exact, and that some kinds of test might not collect coverage well.)

By inverting this map we could see which tests could potentially catch a bug in some given line, and then run only those tests. For very large crates this might give a significant improvement in performance, especially if they already expect to be tested under Nextest and so already pay the one-test-at-a-time performance cost.
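The inversion described here can be sketched in a few lines. This assumes per-test coverage has already been collected into a map from test name to covered (file, line) pairs; the type alias `Line` and the function names are hypothetical, and how that map would actually be produced from nextest runs is left open.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A covered source location: (file path, 1-based line number).
pub type Line = (String, u32);

/// Invert a map of test name -> covered lines into
/// line -> tests that reach it.
pub fn invert_coverage(
    per_test: &BTreeMap<String, BTreeSet<Line>>,
) -> BTreeMap<Line, BTreeSet<String>> {
    let mut by_line: BTreeMap<Line, BTreeSet<String>> = BTreeMap::new();
    for (test, lines) in per_test {
        for line in lines {
            by_line.entry(line.clone()).or_default().insert(test.clone());
        }
    }
    by_line
}

/// Tests worth running for a mutant on `line`, in sorted order.
/// An empty result means no test reaches that line at all.
pub fn tests_for_line(
    by_line: &BTreeMap<Line, BTreeSet<String>>,
    line: &Line,
) -> Vec<String> {
    by_line
        .get(line)
        .map(|tests| tests.iter().cloned().collect())
        .unwrap_or_default()
}
```

Because the coverage data is only approximate, a cautious design might fall back to running the full suite whenever the lookup comes back empty, rather than treating the mutant as uncatchable.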

xd009642 commented 5 months ago

So just a small comment on some ideas in this area that I've been playing around with. I recently overhauled tarpaulin's reporting to better get function/method names, and to do so in a way that matches cargo-mutants. I then generated an lcov coverage report (as that currently has function names), grabbed all the functions with 0 hits, and put them in the exclude_re field of .cargo/mutants.toml. That successfully filtered out mutations for functions that were untested. So I do have a workable version of this feature, given a bit of script glue between tarpaulin and cargo-mutants.

Abridged version of the lcov coverage report

TN:
SF:/home/xd009642/personal/tarpaulin/meta/mutants_tester/src/lib.rs
FN:4,add
FN:19,Foo::five
FN:33,<impl Shiterator for Marker>::next
FN:39,<impl Foo for Marker>::four
FN:45,<impl Foo2 for Marker>::five
FN:51,<impl Display for Wrapper<T>>::fmt
FN:57,<impl Display for Wrapper<T>>::boo
FN:64,Wrapper<T>::unwrap
FN:70,Marker::marked
FN:76,nonsense
FN:77,nonsense::inner
FN:91,tests::it_works
FNF:12
FNDA:1,add
FNDA:0,Foo::five
FNDA:0,<impl Shiterator for Marker>::next
FNDA:0,<impl Foo for Marker>::four
FNDA:0,<impl Foo2 for Marker>::five
FNDA:0,<impl Display for Wrapper<T>>::fmt
FNDA:0,<impl Display for Wrapper<T>>::boo
FNDA:0,Wrapper<T>::unwrap
FNDA:0,Marker::marked
FNDA:0,nonsense
FNDA:0,nonsense::inner
FNDA:1,tests::it_works

Generated mutants.toml

exclude_re = [
    "Foo::five",
    "<impl Shiterator for Marker>::next",
    "<impl Foo for Marker>::four",
    "<impl Foo2 for Marker>::five",
    "<impl Display for Wrapper<T>>::fmt",
    "<impl Display for Wrapper<T>>::boo",
    "Wrapper<T>::unwrap",
    "Marker::marked",
    "nonsense",
    "nonsense::inner"
]
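The "script glue" itself is not shown in the thread. As a hedged sketch of what it might do, the following reads the `FNDA:<hits>,<name>` records of an lcov report, keeps the names with zero hits, and prints them as an `exclude_re` array. The function names are hypothetical, and escaping regex metacharacters is an assumption added here (the names in the report above contain none, so the output has the same shape as the generated file).

```rust
/// Collect names of functions that an lcov report records with zero hits.
/// Only `FNDA:<hits>,<name>` records are read; other lines are ignored.
pub fn uncovered_functions(lcov: &str) -> Vec<String> {
    let mut names = Vec::new();
    for line in lcov.lines() {
        if let Some(rest) = line.trim().strip_prefix("FNDA:") {
            // Split on the first comma only: names may contain commas.
            if let Some((hits, name)) = rest.split_once(',') {
                if hits.trim().parse::<u64>() == Ok(0) {
                    names.push(name.to_string());
                }
            }
        }
    }
    names
}

/// Backslash-escape regex metacharacters so a name matches literally.
fn escape_regex(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for c in name.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Render the names as an `exclude_re = [...]` TOML array.
pub fn exclude_re_toml(names: &[String]) -> String {
    let mut out = String::from("exclude_re = [\n");
    for name in names {
        // Escape for regex first, then for a TOML basic string.
        let quoted = escape_regex(name)
            .replace('\\', "\\\\")
            .replace('"', "\\\"");
        out.push_str(&format!("    \"{}\",\n", quoted));
    }
    out.push_str("]\n");
    out
}
```

Note that the patterns are not anchored, so a short name such as `nonsense` would also exclude any mutant whose name merely contains that string; whether that is acceptable depends on the tree.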
