sixty-north / cosmic-ray

Mutation testing for Python
MIT License
565 stars 57 forks

System for exceptions #97

Closed abingham closed 3 years ago

abingham commented 9 years ago

In some cases we'll find that surviving mutations are completely acceptable. Consider ways to allow users to add exceptions.


abingham commented 9 years ago

A reasonable approach might be to let users provide an exceptions list of some sort. They would specify the line number or something (though this is brittle). Then we would simply ignore survival results for that location.

This isn't as robust as embedding exceptions in the code itself, but it also doesn't force people to pollute their code with exception notes.


Original comment by: austin_bingham

ghost commented 8 years ago

Specifying whitelisted code statements would be helpful; it always likes to find survivors in __name__ == "__main__".

Also, reusing comments designed for other systems may work as well; commonly ignored code often already has 'pragma: no cover' annotations, for instance.

abingham commented 8 years ago

Hehehe...that's a really interesting one I hadn't thought about! That really would be nearly impossible to test for.

The main technical challenge with a code-based whitelist is that, for the most part, cosmic ray doesn't actually deal with the code at all. It works on the AST, so I'm not immediately sure how we could correlate code snippets with AST nodes in a general way. Maybe it's easier than I think, at least for some specific cases like this one.

ghost commented 8 years ago

For this case at least, it works pretty well. ast.parse('__name__ == "__main__"') just gives back an AST expression with name, eq, "__main__", since it's a valid Python expression on its own.
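The structural comparison described above can be sketched directly with the standard library. This is an illustrative sketch, not Cosmic Ray's actual API: it whitelists an expression by comparing normalized AST dumps.

```python
import ast

# Hypothetical whitelist of expressions that should never be mutated.
# ast.dump() omits positions by default, so structurally identical
# expressions compare equal regardless of where they appear in a file.
WHITELIST = [ast.parse('__name__ == "__main__"', mode="eval").body]

def is_whitelisted(node):
    """Return True if `node` structurally matches a whitelisted expression."""
    return any(ast.dump(node) == ast.dump(entry) for entry in WHITELIST)

# The guard expression parsed from a real module matches the whitelist:
guard = ast.parse('__name__ == "__main__"', mode="eval").body
print(is_whitelisted(guard))  # True
```

Because the comparison is on the AST rather than source text, quoting style and whitespace differences don't matter.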

As a first effort, only allowing valid Python in the blacklist would still help a lot, and it would allow direct comparisons to the AST before modifying it.

If you can somehow make "pass" generic, you could go even further and accept things like if __name__ == '__main__': pass, but it would be a more complicated AST comparison. It kind of reminds me of overriding functions to help pylint work around metaprogramming. I know it uses astroid heavily, so maybe that can help with some of the harder problems with ASTs.

abingham commented 8 years ago

Astroid is definitely on my "things to look at" list, but it hasn't risen to the top yet. Perhaps we can completely reuse pylint's black/whitelisting system; that would certainly be very convenient!

In all honesty, this isn't going to be my focus for quite a while. My main goal, when and if I have serious time to work on CR, is to get concurrency working better. So if you've got time and/or ideas for this area, I'd be more than happy to help you get them implemented.

ghost commented 8 years ago

I actually had a few ideas around that, but wasn't sure where to put them.

Most of the test cases for me seem to be incompetent (timeout), which isn't too helpful, but for the killed ones, I suspect many will fail the same tests. It would probably require a rethink of the test plugin system, but hinting which tests to run first could make killed mutants fail fast, since they tend to fail for the same reasons.

Maybe we could force the same AST location to run on a single process and abuse the pytest flag --ff (although I'm not sure where it stores that state).

Another idea is tracking runtime per test: if a single test completes in 5ms during the healthy test run, the mutant's run can be killed once it takes 15ms (3x the baseline) rather than waiting the several seconds that the whole-suite timeout allows. I'm not sure unittest or py.test expose that much information easily, but it could greatly speed up timeouts.
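The baseline-multiple timeout idea sketches out naturally with subprocess. Everything here is hypothetical (the function name, the 3x factor, and the result labels are illustrative, not Cosmic Ray's implementation); the point is just killing a mutant's run once it exceeds a multiple of the healthy runtime.

```python
import subprocess
import sys

BASELINE_FACTOR = 3  # the "3x the baseline" multiplier from the comment above

def run_with_adaptive_timeout(cmd, healthy_seconds):
    """Run a test command, killing it at BASELINE_FACTOR times the healthy runtime.

    Returns a rough mutation-testing verdict: 'killed' (tests failed),
    'survived' (tests passed), or 'incompetent' (mutant caused a hang).
    """
    timeout = healthy_seconds * BASELINE_FACTOR
    try:
        result = subprocess.run(cmd, timeout=timeout, capture_output=True)
    except subprocess.TimeoutExpired:
        return "incompetent"  # mutant made the run exceed 3x the baseline
    return "killed" if result.returncode != 0 else "survived"

# A trivially passing "test suite" survives:
print(run_with_adaptive_timeout([sys.executable, "-c", "pass"], 5.0))  # survived
```

The same guard applied per test (rather than per run) would need the runner to report individual test timings, which is the part the comment is unsure unittest or py.test expose.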

I'm happy to help with the speed improvements and reporting when I find some time, but I'm not a great programmer, so I probably won't be joining you in the AST stuff.

abingham commented 8 years ago

After a discussion with @rob-smallshire and @DRMacIver at ACCU, it might not be so difficult to have an external file containing processing instructions, exceptions, etc. The gist of the idea was that each entry would include the filename, line number, and the line of code in question, along with one or two surrounding lines for context. With this information we could scan the code and reliably tell when and how the exception file had become stale, prompting the user to update things. The hope is that it would become stale infrequently enough not to be a real burden.

Based on this info we could construct some table of exceptions, and that would be used to trim the work order constructed during initialization.
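The staleness check described above can be sketched simply. The entry format here is invented for illustration (not the actual file format): each exception stores the flagged line plus its surrounding context, and the check just verifies that the same lines still appear at the recorded location.

```python
def is_stale(entry, source_lines):
    """Check whether an exception entry still matches the source.

    `entry` is a hypothetical record with:
      line_number -- 1-based line of the excepted code
      line        -- the text of that line
      before/after -- lists of surrounding context lines
    """
    i = entry["line_number"] - 1
    expected = entry["before"] + [entry["line"]] + entry["after"]
    start = i - len(entry["before"])
    if start < 0:
        return True  # context can't fit at this position: file has changed
    actual = source_lines[start:start + len(expected)]
    return actual != expected

entry = {"line_number": 2, "line": "    return x + y",
         "before": ["def add(x, y):"], "after": [""]}
src = ["def add(x, y):", "    return x + y", ""]
print(is_stale(entry, src))  # False: entry still matches the code
```

When the check fails, the tool would prompt the user to re-anchor the entry rather than silently applying a stale exception.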

abingham commented 7 years ago

I've started some work on this in the spor branch. This is based on the spor project which is going to be an interesting technical challenge in its own right.

abingham commented 6 years ago

@rob-smallshire had the excellent idea that we could anchor exceptions using language constructs rather than source lines. So you could say "don't run operator X for module.submodule.funcA" and so forth.

The same idea is used in pytest where you can select your tests based on "node id". A good first approximation of this technique for CR would be to steal their syntax, etc.

abingham commented 6 years ago

anchor exceptions using language constructs

After thinking about this a bit, it has some pretty substantial shortcomings. The biggest one is that it means we can't anchor exceptions to things with no obvious language construct. For example, suppose I had a class-level "constant":

class Foo:
    CONST = 42

How could I ask CR not to mutate the 42? I could imagine an address like "module/Foo.py::Foo::CONST", but I worry that the logic for this would get pretty hairy. In this case, we'd have to know to look for assignment AST nodes and check their targets for the name "CONST".
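The lookup the comment worries about is at least mechanically straightforward for this one case. Here is an illustrative sketch (the "Foo::CONST" addressing is hypothetical, not an implemented CR feature) of resolving a class-level constant by walking the AST for assignments:

```python
import ast

source = """
class Foo:
    CONST = 42
"""

def find_class_constant(tree, class_name, const_name):
    """Return the value node assigned to `const_name` inside `class_name`."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if isinstance(target, ast.Name) and target.id == const_name:
                            return stmt.value  # the 42 we'd exclude from mutation
    return None

value = find_class_constant(ast.parse(source), "Foo", "CONST")
print(ast.literal_eval(value))  # 42
```

The hairiness comes from generalizing this: every kind of addressable construct (assignments, annotated assignments, nested classes, tuple targets, ...) needs its own resolution rule.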

boxed commented 5 years ago

mutmut supports the "# pragma: no mutate" option that @xiaclo alludes to above. I think it would be great if Cosmic Ray and Mutpy supported this too so we can have a common standard.
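The pragma convention is easy to honor in any tool. A minimal sketch (illustrative, not mutmut's implementation): scan the source for the marker and collect the line numbers a mutation tool should skip.

```python
PRAGMA = "# pragma: no mutate"

def pragma_lines(source):
    """Return the set of 1-based line numbers carrying the no-mutate pragma."""
    return {
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if PRAGMA in line
    }

code = 'x = 1\ny = 2  # pragma: no mutate\nz = 3\n'
print(pragma_lines(code))  # {2}
```

A tool would then drop any mutation whose target line appears in this set before building its work order.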

boxed commented 5 years ago

Also see the discussion on more advanced exclusion rules here: https://github.com/boxed/mutmut/issues/47 and https://github.com/boxed/mutmut/issues/28

abingham commented 5 years ago

I'm not satisfied with that "embedded comment" style of directives. Getting it to do anything beyond a simple "don't mutate this line" is going to get ugly fast, and makes me think of people trying to use XML as a programming language. The approach we're working on with spor will, I think, be much more powerful, less intrusive, and (selfishly) far more interesting to work on.

boxed commented 5 years ago

I agree, but for the simple case it's pretty OK and very straightforward. I hadn't seen spor before (I saw you talk about it but didn't manage to find the repo). This seems pretty interesting!

Do you have support for mapping some metadata to the entire directory with spor? Because that's mostly what we've been discussing with mutmut: to have global patterns for exclude.

boxed commented 5 years ago

Ah, I should also mention that we've been using the pragma in practice with mutmut in quite a few projects where we've achieved zero surviving mutants (not counting excludes), and we've found it works surprisingly well. So while I agree with the theoretical approach, in practice you can probably get 90% of the way with just the pragma.

boxed commented 5 years ago

Hmm.. I had an idea just now: what if spor could import from pragmas? Then you could have a good argument for using spor for coverage.py. Seems like it should be simple to implement and gives a logical migration path.

abingham commented 5 years ago

Do you have support for mapping some metadata to the entire directory with spor?

Not right now, and I've only really given idle thought to it. I think it would mean introducing a new kind of "anchor" since the current anchors use file content as context for updating. I've added an issue to spor for this.

abingham commented 5 years ago

what if spor could import from pragmas?

I imagine it would be straightforward to write a pragma-to-spor converter. I'd even consider adding something like this to the spor repository, but my instinct is that it should be a standalone tool.

boxed commented 5 years ago

I'd say you're overthinking it :)

abingham commented 5 years ago

How so?

boxed commented 5 years ago

It's like a 15-line script; better to include it than to require users to install yet another dependency.

abingham commented 5 years ago

Right, but my objection is that, currently, spor doesn't really know anything about "Python code", so to speak. It's written in Python (though there's an experimental Rust implementation), but beyond that it's intended to be language agnostic. So while it likely won't hurt anyone to have a Python-specific tool in spor, I want to avoid any inclination to turn spor into a "Python tool".

boxed commented 5 years ago

Oh. Yea ok. Still though, you could include various mini tools to try to drive adoption...

boxed commented 4 years ago

I implemented a (quite hacky) way to plug into mutmut. You can see a practical use at https://github.com/TriOptima/iommi/blob/master/mutmut_config.py It has been very successful: this little config makes mutation testing of iommi pretty fast, whereas it was infeasible without it.

It would be nice if we would agree on a common API (maybe with specific extensions namespaced so we can still have differentiation before we standardize). I would love your feedback on this.

abingham commented 4 years ago

If I understand what you've got, you're skipping mutations based on whether they've got tests associated with them or if certain decorators are applied. Is that right?

I think we could find an easy standardization around a sort of ad hoc predicate API. Users could specify a function that answers "mutate" or "do not mutate" for a given (source file, line number). CR and mutmut could then provide facilities for using such functions to filter mutations in a cross-tool way.

It's interesting to think about other parameters we could pass to this predicate. An obvious one would be the name of the mutation operator, so that users could prevent certain kinds of mutations. Another would be the range of text being mutated, supporting even more fine-grained control. That might be overkill for a simple, broadly applicable API, though; I suspect most users would get what they need from a (file, line) based approach.
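The proposed predicate API might look like the sketch below. The function name, signature, and the example rules are all hypothetical; nothing here is an existing CR or mutmut interface.

```python
def should_mutate(source_file, line_number, operator=None):
    """User-supplied predicate: return True to apply a mutation, False to skip.

    A tool would call this once per candidate mutation before running tests.
    The rules below are purely illustrative.
    """
    if source_file.endswith("generated.py"):
        return False  # skip machine-generated code entirely
    if operator == "number-replacer" and line_number < 10:
        return False  # e.g. protect module-level constants from one operator
    return True

print(should_mutate("app/models.py", 42))    # True
print(should_mutate("app/generated.py", 7))  # False
```

The optional `operator` parameter shows how the richer variant discussed above would slot in without breaking tools that only pass (file, line).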

boxed commented 4 years ago

If I understand what you've got, you're skipping mutations based on whether they've got tests associated with them or if certain decorators are applied. Is that right?

Sort of. I am skipping certain decorators (or trying to, see below) and I am setting the entire test command based on the file I'm in. It's pretty rough, but it's a HUGE improvement in speed and the accuracy is pretty ok.

Users could specify a function that answers "mutate" or "do not mutate" for a given (source file, line number)

I think the more important point is that this API should be able to change what test command is run. This is the big improvement here. It takes our test suite of ~9s and cuts it down to ~1s (of which almost all is the pytest overhead, again more on that later!).

CR and mutmut could then provide facilities for using such functions to filter mutations in a cross-tool way.

I think this would be awesome for both our projects and would make mutation testing seem much more production ready :)

An obvious one would be the name of the mutation operator so that user could prevent certain kinds of mutations.

That is a good idea!

Another would be the range of text being mutated, supporting even more fine-grained control. This might be overkill for a simple, broadly applicable API, though. I suspect most users would get what they need from a (file, line) based approach.

I assumed so too, but I've already hit the limit of the line-based approach! I will definitely implement a way to get the relevant section of code, which might be smaller or bigger than a line. And the range too, I guess...

All this work with mutmut has made it even more obvious that pytest really is a liability for mutation testing. This applies to you and me. Even if I select the perfect subset of tests in zero time, I am still absolutely destroyed by the pytest overhead. This is largely irrelevant for normal use, but for mutation testing it is unacceptable.

I've been thinking about this for several days now, and I'm starting to think I'm going to have to write a new test runner. This seems a bit extreme, but I don't see another option if I want to move forward :( I will not accept going back to unittest; it's just not nice enough to work with, imo.

abingham commented 4 years ago

I am setting the entire test command based on the file I'm in

That's an interesting idea. CR would sorta support this by using multiple sessions, I think, but it's not a feature I've really thought about. I can imagine ways to support it more directly, but I need to think about it.

I am still absolutely destroyed by the pytest overhead.

I wonder if this is related to pkg_resources. Importing it is notoriously slow; this is a problem we face at Sixty North with a number of our programs that use plugins. So many people are experiencing this problem that I suspect it'll be fixed at some point.
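One quick way to check whether a slow import dominates startup is to time it directly. A sketch (not a real profiler; note that re-importing understates a truly cold import, since submodules and dependencies stay cached):

```python
import importlib
import sys
import time

def import_time(module_name):
    """Rough wall-clock cost of (re-)importing a module."""
    sys.modules.pop(module_name, None)  # force the import machinery to run again
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

print(f"json: {import_time('json'):.4f}s")
```

For a proper breakdown of where startup time goes, CPython's `python -X importtime` flag reports per-module import timings.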

write a new test runner.

That is a big step! It's not a step I will take for CR, I think...I really want it to work non-intrusively for as many projects as possible.

boxed commented 4 years ago

Yea, I've done some work on speeding up pytest, but the plugin system is suuuper slow, as you say. I'm not sure anyone is looking into it, though.

It is a big step, I agree, but I don't see another way to make mutation testing actually good. And there are some big advantages to having a test runner that prioritizes speed (and thus enables mutation testing). At the very least it can put some pressure on pytest so they have to take speed more seriously :)

abingham commented 3 years ago

I'm going to consider this solved, with the solution being filters. We have fairly sophisticated support for filtering/skipping tests based on essentially arbitrary criteria. See e.g. the cr-filter-pragma tool.