Add coverage analysis to offer improved performance

rouke-broersma commented 5 years ago

This issue follows from #183

Generate mutant coverage (which mutants is actually executed by any test). This could provide useful metrics as of now.

Offer to skip test runs for non covered mutants (hence assumed as survivors) ==> Improve performance for projects with low test coverage. Identify tests that do not cover any mutant, offer to skip them from each run. ==> Could improve performance for projects with many tests Optimise executed tests according to mutant coverage (i.e. run only the tests that cover the 'current' mutant) ==> Should dramatically improve performance Step 1 requires to able to access coverage results from Stryker Step 2 looks simple Step 3 requires to control which tests are run Step 4 require analysis of coverage data to build strategy

rouke-broersma commented 5 years ago

From dupdop:

Some update: I have namedpipes running smoothly and stable, after a few fights. I have reached maturity level 2: (my) Stryker now assumes non-covered mutants are survivors hence do not test them.

Code available here: https://github.com/dupdob/stryker-net/tree/mutant_coverage

dupdob commented 5 years ago

I have a branch ready, just waiting to finish with my current PR. (Branch currently address use cases 1 and 2)

rouke-broersma commented 5 years ago

@dupdob Yes I've seen it. The named pipe seems like a good way to implement this. Step 3 can be achieved with the vstest integration.

To achieve step 4 I was thinking we could create a named pipe from the mutated assembly to a vstest data collector (out of proc as well) and then a named pipe from the data collector to stryker. The datacollector can execute for every single test being run and while in the middle we can receive which mutants have been hit for that testrun from the mutated assembly. Then we can send that information including the test that was run to stryker through the other named pipe. And then we can use vstest to run only those unit tests that are relevant to the mutant.

dupdob commented 5 years ago

@Mobrockers thanks for the proposal, I have made an MVP using pipe communication between the assembly and Stryker, the objective reaching a proof of concept before considering improvement strategies. Here is what I have achieved and learned so far. Spoiler alert, it is a bit disappointing (at least for netcore environment, it may be better with classical .Net)..

Overall, VsTest architecture is complex, brittle and prone to failure . I often end up having vstest.console.exe processes eating a core in some endless loop, logging nothing.
I have yet to achieve a complete run with NFluent due to above issues. At some point, Stryker runs out of available vstest runners.
Performance are abysmal. To get coverage info per-test, I launch each test in isolation. It is really slow.
~1% of runs failed to provide coverage data. This is probably due to the fact I use appdomain tear down event to send the coverage data through the named pipe. As of now, this is not the most urgent issue; I assume those test covers ALL mutants and have to run each time

The biggest disappointment for me was the fast that vstest provided little more than a facade against dotnet test...

I have various leads and ideas on how to fix those issues, but the fact is it turned more difficult than anticipated... Coming back to the list:

Stability: No idea, but I assume my code and maybe the vstest stryker runner itself, still requires some clean up and bug fixing. Once code have been hardened and refactored, I hope those issue will disappear
Performance: there is an interesting option (AreTestCaseLevelEventsRequired) that may allow me to capture coverage in a single run (so I have yet to read the doc, if any, about it). There is also the option KeepAlive that looks promising, but this one is documented as not implemented/for future use.
Missing coverage: I hope I will be able to reduce the failure rate, plus I still can devise alternate communication strategies, including real time async, which would at least provide partial coverage in case of failure at tear down.

dupdob commented 5 years ago

OK, went through the code, AreTestCaseLevelEventsRequired does not serve the purpose I hoped fore (sequential notification)

rouke-broersma commented 5 years ago

The biggest disappointment for me was the fast that vstest provided little more than a facade against dotnet test...

How do you mean? Dotnet test internally uses vstest, not the other way around

rouke-broersma commented 5 years ago

Performance are abysmal. To get coverage info per-test, I launch each test in isolation. It is really slow.

~1% of runs failed to provide coverage data. This is probably due to the fact I use appdomain tear down event to send the coverage data through the named pipe. As of now, this is not the most urgent issue; I assume those test covers ALL mutants and have to run each time

Performance: there is an interesting option (AreTestCaseLevelEventsRequired) that may allow me to capture coverage in a single run (so I have yet to read the doc, if any, about it). There is also the option KeepAlive that looks promising, but this one is documented as not implemented/for future use.

I hope using a datacollector can specifically help really much with this problem because of:

https://github.com/Microsoft/vstest-docs/blob/master/docs/extensions/datacollector.md#datacollectionevents

Example: https://github.com/Microsoft/vstest/blob/master/test/TestAssets/SimpleDataCollector/Class1.cs

I understand that it's only an MVP of course, so I get that you're not immediately going for a double named pipe with all it's added complexity.

dupdob commented 5 years ago

Well, I see vstest.console.exe launching dotnet.exe. I guess dotnet integrates some VsTest glue and logic, but from an executable perspective, that's how it works

Command line is:

"C:\Program Files\dotnet\dotnet.exe" exec --runtimeconfig "C:\Users\dupdo\source\repos\NFluent\tests\NFluent.NetStandard.20.Tests\bin\Debug\netcoreapp2.1\NFluent.NetStandard.20.Tests.runtimeconfig.json" --depsfile "C:\Users\dupdo\source\repos\NFluent\tests\NFluent.NetStandard.20.Tests\bin\Debug\netcoreapp2.1\NFluent.NetStandard.20.Tests.deps.json" "C:\Users\dupdo\.nuget\packages\microsoft.testplatform.testhost/15.9.0\lib/netstandard1.5/testhost.dll" --port 55100 --endpoint 127.0.0.1:055100 --role client --parentprocessid 12420 --diag "C:\Users\dupdo\source\repos\NFluent\tests\NFluent.NetStandard.20.Tests\StrykerOutput\2019-03-15.14-48-13\vstest\vstest-log.host.19-03-15_15-29-25_91599_16.txt" --tracelevel 3 --telemetryoptedin false

Ok, I get it now, it launches a dotnet host to execute the test. thanks for challenging my comment

rouke-broersma commented 5 years ago

You are right in that vstest is internally a big can of worms :) It also internally spawns new processes for dotnet.exe, testhost.exe, and sometimes datacollector.exe and other collectors depending on configurations 💃

dupdob commented 5 years ago

Quick summary regarding various steps:

is pretty solid so far. It still fails sometimes, and I fixed a bunch of edge cases/timing issues regarding the pipe mechanism. There may still be some, but I also assume that AppDomain unloading event is unreliable
Is working good, albeit providing only a marginal gain from a performance perspective. But it also provides clear hindsight on non covered code
This where I am pretty stuck at. Launching test one by one is so slow: ~1 hour for the 1660 NFluent's tests. I had problem with coverage sometime failing, so I implemented a retry mechanism, so I should get results now.
Logic has been implemented: I have mutant to tests mapping informations, and I am able to run selected tests only. I need to stabilise step 3 before working on 4.

@Mobrockers : I missed your comment regarding DataCollector, will dig into this, indeed it looks what I need to address the issues I face with step 2!

dupdob commented 5 years ago

I just had an idea I find interesting: I am fascinated by the mutant->tests mapping information and I had a hunch there was value in those. And I may has just found it. We could use coverage information in reports to provide hints. E.G:

if a mutant is not covered, suggestion should be around adding a test or refactoring code
otherwise, suggestion could be to consider reviewing some of the tests that cover the survivor.
and we could even cross information for survivors to identify quick win, i.e. need to improve a single test to kill 5-6 mutants.

I am not sure I am clear enough and that it makes sense to you, but I add it as an idea to be considered at a later time.

rouke-broersma commented 5 years ago

Perfectly clear to me, sounds awesome!

On Sat, Mar 16, 2019, 15:57 Cyrille DUPUYDAUBY notifications@github.com wrote:

I just had an idea I find interesting: I am fascinated by the mutant->tests mapping information and I had a hunch there was value in those. And I may has just found it. We could use coverage information in reports to provide hints. E.G:

if a mutant is not covered, suggestion should be around adding a test or refactoring code

otherwise, suggestion could be to consider reviewing some of the tests that cover the survivor.

and we could even cross information for survivors to identify quick win, i.e. need to improve a single test to kill 5-6 mutants.

I am not sure I am clear enough and that it make sense to you, but I add it as an idea to be considered at a later time.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stryker-mutator/stryker-net/issues/388#issuecomment-473538512, or mute the thread https://github.com/notifications/unsubscribe-auth/ADbenDI0H1QABU89DV3UtwYEsHgj4UFIks5vXQZ1gaJpZM4bd_Np .

dupdob commented 5 years ago

Monday update: I had the opportunity to explore and I made a cool discovery: I can use VsTest data collection architecture to forward coverage results to Stryker, which should help avoid the 'double pipe' strategy proposed by @Mobrockers. Still far from a finished features, though

dupdob commented 5 years ago

Quick update: the vstest runner now uses environment variables to transfer coverage data between test assembly and host. It means we no longer have to run tests in isolation to capture coverage. Getting closer to a working version!

richardwerkman commented 5 years ago

Cool! Sounds awesome. Keep up the good work!

rouke-broersma commented 5 years ago

Sounds good!

dupdob commented 5 years ago

I made a run with NFluent, time is 55 minutes versus 135 minutes without optimisation. I just pushed the associated code. I would welcome feedbacks using other projects

dupdob commented 5 years ago

Monday morning status update: still working on this. As environment variables do not seem target to me (performance and available space concerns), I have migrated back to pipes, but this is a constant struggle to make them work properly. Tests deadlock without identifiable cause, debugging is difficult and I can't find a way to log from the InProcDataCollector

rouke-broersma commented 5 years ago

Can you not log to file?

dupdob commented 5 years ago

Thanks for the suggestion

The problem is that you have to coordinate the the consumption of the data: the mutated assembly side (mutantcontrol side) has no clue when tests start and stop, while the collector side knows but has no direct access to coverage data. I need to capture covered mutants and reset the list between tests The pipe is relatively good for that, as it allows synchronous request reply pattern. I can use a file, but it will be slower than environment variables., hence it is my fallback strategy for now.

Envoyé de mon iPhone

Le 1 avr. 2019 à 17:48, Rouke Broersma notifications@github.com a écrit :

Can you not log to file?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

rouke-broersma commented 5 years ago

Sorry for the confusion, I meant in relation to this part

Tests deadlock without identifiable cause, debugging is difficult and I can't find a way to log from the InProcDataCollector

You could debug-log to file from the datacollector no?

dupdob commented 5 years ago

Good news, finally spotted the deadlock issue: a bug in my pipe reading class when receiving an empty message. I can keep on improving my code.

rouke-broersma commented 5 years ago

Awesome :)

On Mon, Apr 1, 2019, 22:25 Cyrille DUPUYDAUBY notifications@github.com wrote:

Good news, finally spotted the deadlock issue: a bug in my pipe reading class when receiving an empty message. I can keep on improving my code.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stryker-mutator/stryker-net/issues/388#issuecomment-478733753, or mute the thread https://github.com/notifications/unsubscribe-auth/ADbenGY2ojb80pfhgr75p5TAsuP0wOaHks5vcmtDgaJpZM4bd_Np .

dupdob commented 5 years ago

Deadlock is gone, but I still have what looks like race conditions when capturing coverage. I think it may relates to this issue with Nunit. If I can confirm this is related to NUnit, I will probably add an option to workaround it until it is fixed on NUnitAdapter side

rouke-broersma commented 5 years ago

I think setting the option mentioned in that nunit issue in the runsettings by default might be an acceptable workaround assuming it fixes the issue. It should not interfere with other testrunners.

On Tue, Apr 2, 2019, 22:33 Cyrille DUPUYDAUBY notifications@github.com wrote:

Deadlock is gone, but I still have what looks like race conditions when capturing coverage. I think it may relates to this issue https://github.com/nunit/nunit3-vs-adapter/issues/560 with Nunit. If I can confirm this is related to NUnit, I will probably add an option to workaround it until it is fixed on NUnitAdapter side

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stryker-mutator/stryker-net/issues/388#issuecomment-479188843, or mute the thread https://github.com/notifications/unsubscribe-auth/ADbenKIFw8R4jZUQ03XjLIzm57jtHhv5ks5vc76vgaJpZM4bd_Np .

dupdob commented 5 years ago

I used it. (Kinda) Interestingly it triggered new deadlock issues. I guess I still have bugs. No worries, I love brainteasers and I am happy to find bugs BEFORE opening a PR.

Note that sequential execution slows test execution.

dupdob commented 5 years ago

see #506

rouke-broersma commented 5 years ago

Implemented by #506

stryker-mutator / stryker-net

Add coverage analysis to offer improved performance #388