sixty-north / cosmic-ray

Mutation testing for Python
MIT License
565 stars 57 forks source link

Reduce runtime by running only module related tests #498

Open lvdstam opened 4 years ago

lvdstam commented 4 years ago

Assuming that there is some relation between the path and name of a code and test module, and assuming that only tests related to the code module being mutated need to be executed, a significant reduction in the runtime could be achieved. This behavior should be optional and configurable allowing for instance to associate multiple test modules with one code module or vice versa. Nothing to complex though. Simplicity should be key.

With some guidance, I would be willing to create a merge request for such a feature.

boxed commented 4 years ago

This seems like a test runner feature, not a feature of a mutation tester. Or so I thought when I implemented this exact feature into hammett (my test runner) to improve mutation testing speed for iommi (a lib we're writing at work), with mutmut (my mutation tester) :)

This feature should be usable via cosmic ray just as it is via mutmut.

lvdstam commented 4 years ago

Sure it can be pushed down the stack. It seems to me that this feature would be beneficial regardless of the testrunner and therefore well suited for the mutation test driver...

Groetjes,

Louis

On 16 Jun 2020, at 16:22, Anders Hovmöller notifications@github.com wrote:

 This seems like a test runner feature, not a feature of a mutation tester. Or so I thought when I implemented this exact feature into hammett (my test runner) to improve mutation testing speed for iommi (a lib we're writing at work), with mutmut (my mutation tester) :)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

boxed commented 4 years ago

The problem with putting it at the mutation tester level is that there really is no common API for test runners to get the required information from the runner, nor are there APIs to tell test runners what tests to run.

In fact pytest has made it intentionally harder to use it as a library!

lvdstam commented 4 years ago

I don’t know all test tools, but the one I know allow me to specify which file to run the tests from. That’s all that’s needed isn’t it?

Groetjes,

Louis

On 16 Jun 2020, at 17:00, Anders Hovmöller notifications@github.com wrote:

 The problem with putting it at the mutation tester level is that there really is no common API for test runners to get the required information from the runner, nor are there APIs to tell test runners what tests to run.

In fact pytest has made it intentionally harder to use it as a library!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

abingham commented 4 years ago

This is a good idea, and it's a subset of larger principles we've discussed in other issues, e.g. #85 and #484.

The best way to do it is with a filter. This gives you the ability to mark arbitrary mutations/tests as "skipped" prior to execution (or really, whenever you want). I suspect it would be very easy to write a filter that takes the name of the modified module and skips everything that isn't related to it.

You can find a number of filters, as well as a base class to simplify their creation in common cases, in the source.

boxed commented 4 years ago

Ah, yea you only mean on the entire test file level, then yes, I believe it's fairly safe/easy to implement. We used to do this on iommi (for mutmut but the principle is the same) in fact: https://github.com/TriOptima/iommi/blob/634bccf9e91bc5006ab2f95a368e1341a7bf5287/mutmut_config.py

Since then I've implemented this module-test coupling feature directly into my test runner hammett, and we now use that locally. The way we do it is we put the tests right next to the module, but with a __tests.py suffix (like this: https://github.com/TriOptima/iommi/tree/master/iommi). Works very well. I don't need to have any special behavior in mutmut to do this, so if you used hammett with CR you would get the same benefit.

Also, hammett has way lower overhead than pytest and is somewhat pytest compatible so you can get some pretty big wins doing mutation testing with it. Again, this applies just as much to CR or mutpy as to mutmut.

In mutmut I have an extra special code path for hammett to get even more speed, where hammett is written to be used as an API and unloads modules, and mutmut runs 100 mutants in a single process and then you get a new process. I couldn't get django to not leak memory like crazy so that's why it's not just one process for all mutants, but hitting the process creation overhead every 100 mutants makes the overhead pretty much disappear right now as there are other things that are more expensive anyway :(

lvdstam commented 4 years ago

I have just now looked at the mutmut configuration feature. It seems to me that what you describe there should work if in this pre test callback the driver also provides information about the mutation being tested (module file name and path...). For me mutmut and cosmic ray are on the same level in the stack.

I also thought about using coverage information for this purpose allowing even bigger reductions. It seemed to me that there might be a risk that due to a mutation the code path changes which in turn makes another test also relevant. And have therefor discarded this option.

Thinking a bit more about it, the mutmut way gives a lot of freedom to use whatever kind of naming convention to indicate the relation between the module and the tests.

I would very much like to add support for such a feature to cosmic ray, because it already supports utilizing all cores of the available hardware...

abingham commented 4 years ago

I would very much like to add support for such a feature to cosmic ray

So again, I would recommend a filter. You can define whatever relationship you want between modules and tests, implement that in a filter, and everything works. All the parts you need are there.

boxed commented 4 years ago

@lvdstam we don't have the same mutants though, so don't be fooled into thinking that's the only difference between the tools! Also, you can sometimes get 10x or 100x improvements in speed with mutmut plus the hammett special case, so just using more cores might not be faster at all. Don't assume it does.

Which reminds me.. we should discuss naming and classifying mutants so we can compare the tools. We have discussed this before a bit but never got anywhere.

lvdstam commented 4 years ago

@abingham Looking at the filter suggestion, I don't understand the idea. My understanding of the filter is that it runs after the init and removes some of the mutations from the pool of mutations to test. I'm not clear on how to use the filter for limiting the tests to execute for each mutation. Can you please elaborate on this.

lvdstam commented 4 years ago

@boxed I have noticed that the set of mutations is indeed different, but that seems solvable with some analysis and additional plugins.

Looking at hammett, I have difficulty in understanding what it does, what are limitations... Besides mutmut with hammett getting into a situation where no progress is made (process seem to have died) on our tests, the above (not understanding) is my main reason for looking at other tools/ways to get the job done.

abingham commented 4 years ago

I'm not clear on how to use the filter for limiting the tests to execute for each mutation

Ah, I see. I was reading this incorrectly. As you say, filters will not let you control which tests are run. The only way to do this is by changing the test command you use, assuming your test framework gives you the ability to do what you're looking for.

If you have a true correspondence between tests and modules under test, it might be simplest to split your mutation testing up along these lines. You could have a single session per module (or test, or whatever works best for you). I'd be curious to know how this approach works out in practice.

abingham commented 4 years ago

Related to this (and what I was initially thinking about) is that filters can let you skip mutations to locations that don't need to be mutated. That is, if you've only changed module X, you can filter out mutations to all other modules. This can have a huge impact on runtime, and should be done independent of whether you find a way to limit the tests being run.

lvdstam commented 4 years ago

@abingham I agree with you on that last bit. To get a complete verdict however, you need to have the results of the previous run available. I'm mainly thinking about running the mutation tests in the Ci/CD pipeline where a full run would make more sense. For a local developer experience, the feature you describe is very desirable.

lvdstam commented 4 years ago

So what are your thoughts on this feature and implementation? Does the feature make sense? And if so would a mutmut style interface with two callbacks make sense?

class TestsuiteSelector(abc.ABC):
    @abc.abstractmethod
    def initialize(self) -> None:
        """
        This function is called once during a exec before any mutation is made or test is run.
        """"
        pass

    @abc.abstractmethod
    def get_tests_to_execute(self, mutation_info: Dict[str, Any]) -> str:
        """
        This function returns the name(s) of the test modules for the current mutation.
        mutation_info: Describes the current mutation and as a minimum contains the filename and
        path where the current mutation is being made.
        """
        pass

For the mutation_info I have proposed a dictionary, but I would actually prefer a more typed interface like a class.

abingham commented 4 years ago

I'd be happier with a more extrinsic solution that didn't involve new APIs inside CR, something closer to what hammet sounds like. So instead of telling CR to invoke pytest directly, you'd have it invoke some new test runner (let's call it "tufnel" to keep the gag going). Tufnel would be given enough information to know how to drive pytest (or whatever test runner you really want to use). So the business of running tests would be outside of CR, and tufnel (and related tools) could be developed completely separate from CR.

As things stand, this would still require at least a minor change to CR. We'd have to communicate information about the module being mutated to tufnel. The most direct way to do this would be to let test runner commands have replacement fields that CR can fill in with that information. So your test runner command might look like:

tufnel {module_path} -x tests

CR would fill in the module_path, and tufnel would arrange for the correct pytest invocation. Would something like this do what you need?

boxed commented 4 years ago

With hammett there doesn't need to be any communication (you can modify the command like before if you want but you shouldn't need to). Hammett checks the modification times of files to know what modules are changed, and then it knows what test modules to run.

I'm sorry to hear there's a hang. There was some fix in mutmut a while back, I haven't released a new version since then. If it's not a big inconvenience it would be great if you could try the master branch directly.

But you could run the test suite with hammett?

lvdstam commented 4 years ago

@abingham I like that idea. Not sure that from a performance viewpoint creating another process in between is a smart thing. I guess one would like to batch this, but given that the driver mutates the code... Not so sure....

lvdstam commented 4 years ago

@boxed with mutmut I could run the tests. In my current environment which is clean from the repo, if I just pip install hammett and run hammett -x no tests are executed...

When time allowd, I will install the latest master version of mutmut and redo.

abingham commented 4 years ago

You're going to get a subprocess for every test run as it is; that's how CR runs tests. If this new test runner runs pytest in-process (which is entirely possible, and maybe ideal for something like this), the only new runtime hit will be in calculating the tests to run.

abingham commented 4 years ago

mutmut I could run the tests

Not to be rude, but could you move the mutmut support issues to mutmut?

lvdstam commented 4 years ago

@abingham Could you give me a pointer towards running pytest in-process. Seems a perfectly reasonable solution then...

Sure. Will not report on mutmut here.

abingham commented 4 years ago

I think this is what you need.

CR used to run tests in-process, and I'm 99% sure this is how we did it for pytest.

lvdstam commented 4 years ago

But using the pytest API makes it pytest only. Could be done, but I would prefer a more generic way?

abingham commented 4 years ago

Could be done, but I would prefer a more generic way?

I'm completely fine with a totally generic solution, but I don't know what that solution is, and I don't think it needs to be in CR. CR used to have a plugin-based system for test runners, and it ran everything in-process, so maybe you could do something like that.

lvdstam commented 4 years ago

Okay. Sounds like we need to take some time to think it over. Maybe I'll just fork and see what happens. Could you give me a global hint as to where to add the call to "tufnel" and where to get the module_path parameter. Making some changes to the code might just help me get a better understanding...

abingham commented 4 years ago

hint as to where to add the call to "tufnel"

You would specify that in the test command in your configuration, i.e. the file you pass to the init command.

where to get the module_path parameter.

I think the correct place to get this is in mutate_and_test(). Here you would construct your test command with whatever information you want, and then the command is passed to run_tests() to be executed.

Edit: Actually, Worker.execute() might be a better place. This is one step before mutate_and_test() and has access to more information.

boxed commented 4 years ago

In process is pretty dangerous though. I found out that the hard way. That's why I run just 100 runs before I cycle the process. At least pytest-django (probably django) isn't amenable to being reloaded, and pytest isn't really possible to unload cleanly either (another reason I gave up and wrote hammett).

Just wanted to put my experience here so you don't waste all that time I wasted!

boxed commented 4 years ago

@lvdstam could you open issues for hammett and/or mutmut so we can continue discussing that there? You don't have an email-address so can't contact you directly.

Sorry about this last off topic message Austin. (I wish there was private messages on github!)

abingham commented 4 years ago

In process is pretty dangerous though.

In-process is a non-starter if you expect to be able to run more than one test run. But in this case we'll be running only a single test run per process.

boxed commented 4 years ago

Well, I actually accomplish more than one test run per process. It was painful, I'm pretty sure I broke windows support and I had to run my own test runner, and cycle the process once in a while, but it's possible. But really we should be doing mutation schemata and then this wouldn't be such a big problem, and we'd get parallelization basically for free. But it's a huge rewrite... at some point I might get to it.

abingham commented 4 years ago

I actually accomplish more than one test run per process

I'm sure it's possible, but since a mutation testing run could cause arbitrary damage to a runtime environment, it seems impractical. CR takes the approach of using processes as firewalls against potential corruption.

lvdstam commented 4 years ago

@abingham Thanks for the hints. I'll go and run with it for a bit.