stryker-mutator / stryker-net

Mutation testing for .NET core and .NET framework!
https://stryker-mutator.io
Apache License 2.0
1.75k stars 176 forks source link

VStest.console.exe hangs on test discovery when using sdk style csproj targeting net48 #2442

Closed MischaVreeburg closed 1 week ago

MischaVreeburg commented 1 year ago

I have some trouble running Stryker. It looks like the during/after the vstest test discovery phase, the separate vstest processes don't return . It looks like a deadlock situation. The solution on which I'm trying to use Stryker consists of 88 projects of which 15 are unit test projects.

The projects are all sdk style projects, using C# version 9+ and targeting net48.

Maybe it has something to do with the projects being sdk style, but targeting net48. I saw that when the testhost.net48.exe was started along side the vstest.console.exe, the test discovery when well, and the callback was within minutes, and most of the time in just a few seconds, but if the testhost.net48.exe process wasn't started the vstest.console.exe waited like forever after the message "vstest.console.exe, TestRequestManager.DiscoverTests: Discovery tests completed." in the TestDiscoverer_VsTest-log.txt files.

After some investigation, it seems that vstest.console.exe has some trouble with sdk style projects, that are targeted to net48. When I have a project which is reference by one test project, vstest.console.exe hangs after test discovery. It seems that the discovery complete message is never send/received. When a project is reference by 2 or more test projects, vstest.console.exe starts for each test project a testhost.net48.exe. These processes correctly end and signal completion.

Furthermore the WaitEnd method on DiscoveryEventHandler is only called after the discovery process has finished. Not during the discovery phase. The While loop in this method is never executed.

A workaround is making sure that all projects are referenced at least by 2 test projects. But this won't work when using the since option, as then only the changed test or the tests that cover the code changes are located in only one test project.

This occurs for instance when adding a new test project after the cutoff value for the since. Then stryker will see that the project is only reference by one test project.

MischaVreeburg commented 1 year ago

What would you say to the workaround solution, of testing the number of related test projects, if this is one and the build target is full framework, then test is the testproject is a SDK style build. If so pass the testproject twice to the vstest.console process. this would work for this bug.

Code sample: //Test if test projects count == 1 && target only .NET framework => then test if is sdk style var selectedTestProjects = relatedTestProjects.Select(x => x.ProjectFilePath).ToList(); if (relatedTestProjects.Count() == 1 && relatedTestProjects.First().TargetsFullFramework()) { var csProjFile = new XmlDocument(); csProjFile.Load(relatedTestProjects.First().ProjectFilePath); if (csProjFile.SelectSingleNode("/Project[@Sdk]") != null) { selectedTestProjects.AddRange(selectedTestProjects); } } in the method ProjectOrchestrator.MutateSolution

dupdob commented 1 year ago

hi I am currently working on completing 'solution mode' (see PR #2400 ), addressing identified shortcomings. While my PR is still in flux state, I have already addressed several issues (including ones not yet identified). If I understand your first message here, you have a small solution that reproduces the problem. Could you please share it somewhere/somehow ? Alternatively, you can build and use my branch to see if it fixes this problem, bearing in mind it is not in a fully working state as of today.

MischaVreeburg commented 1 year ago

Alas, I don't have a small solution with the problem. Just a 90 project production code solution. Sitecore, custom msbuild targets, targeting .NET framework 48, Sdk style projects, using microsoft.net.sdk.web to target 'old style' asp.net.

I have through careful debugging and observing a hanging process come to this conclusion.

I've tried your branch, it seems to fix the problems but I get a couple of errors during the test discovery process: System.InvalidOperationException: No process is associated with this object. at System.Diagnostics.Process.EnsureState(State state) at System.Diagnostics.Process.get_HasExited() at System.Diagnostics.Process.EnsureState(State state) at System.Diagnostics.Process.get_ExitCode() at Stryker.Core.TestRunners.VsTest.StrykerVsTestHostLauncher.ProcessExited(Object sender, EventArgs e) in C:\git\stryker-net_2\src\Stryker.Core\Stryker.Core\TestRunners\VsTest\StrykerVstestHostLauncher.cs:line 70 at System.Diagnostics.Process.RaiseOnExited() at System.Diagnostics.Process.CompletionCallback(Object waitHandleContext, Boolean wasSignaled) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) --- End of stack trace from previous location --- at System.Threading.PortableThreadPool.CompleteWait(RegisteredWaitHandle handle, Boolean timedOut)

and

private void ProcessExited(object sender, System.EventArgs e) => ErrorCode = _currentProcess.ExitCode; => no currentProcess available

MischaVreeburg commented 1 year ago

System.InvalidOperationException HResult=0x80131509 Message=No process is associated with this object. Source=System.Diagnostics.Process StackTrace: at System.Diagnostics.Process.EnsureState(State state) at System.Diagnostics.Process.get_HasExited() at System.Diagnostics.Process.EnsureState(State state) at System.Diagnostics.Process.get_ExitCode() at Stryker.Core.TestRunners.VsTest.StrykerVsTestHostLauncher.ProcessExited(Object sender, EventArgs e) in C:\git\stryker-net_2\src\Stryker.Core\Stryker.Core\TestRunners\VsTest\StrykerVstestHostLauncher.cs:line 70 at System.Diagnostics.Process.RaiseOnExited() at System.Diagnostics.Process.CompletionCallback(Object waitHandleContext, Boolean wasSignaled) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) --- End of stack trace from previous location --- at System.Threading.PortableThreadPool.CompleteWait(RegisteredWaitHandle handle, Boolean timedOut)

Last lines of runner 3_vsTest-log.host.xxx.txt

TpTrace Verbose: 0 : 30500, 5, 2023/03/20, 15:15:32.597, 229629953750, testhost.net48.exe, BaseRunTests.RunTests: Run is complete. TpTrace Verbose: 0 : 30500, 11, 2023/03/20, 15:15:32.598, 229629967926, testhost.net48.exe, TcpClientExtensions.MessageLoopAsync: NotifyDataAvailable remoteEndPoint: 127.0.0.1:52549 localEndPoint: 127.0.0.1:52550 TpTrace Information: 0 : 30500, 11, 2023/03/20, 15:15:32.598, 229629968969, testhost.net48.exe, TestRequestHandler.OnMessageReceived: received message: (TestSession.Terminate) -> null TpTrace Information: 0 : 30500, 11, 2023/03/20, 15:15:32.598, 229629969274, testhost.net48.exe, Session End message received from server. Closing the connection. TpTrace Information: 0 : 30500, 11, 2023/03/20, 15:15:32.599, 229629973483, testhost.net48.exe, SocketClient.Stop: Stop communication from server endpoint: 127.0.0.1:052549 TpTrace Information: 0 : 30500, 11, 2023/03/20, 15:15:32.599, 229629973922, testhost.net48.exe, SocketClient: Stop: Cancellation requested. Stopping message loop. TpTrace Information: 0 : 30500, 1, 2023/03/20, 15:15:32.599, 229629974917, testhost.net48.exe, SocketClient.Stop: Stop communication from server endpoint: 127.0.0.1:052549 TpTrace Verbose: 0 : 30500, 11, 2023/03/20, 15:15:32.599, 229629975517, testhost.net48.exe, LengthPrefixCommunicationChannel.Dispose: Dispose reader and writer. TpTrace Information: 0 : 30500, 11, 2023/03/20, 15:15:32.599, 229629976015, testhost.net48.exe, Closing the connection ! TpTrace Verbose: 0 : 30500, 11, 2023/03/20, 15:15:32.599, 229629976286, testhost.net48.exe, MulticastDelegateUtilities.SafeInvoke: LengthPrefixCommunicationChannel: MessageReceived: Invoking callback 1/1 for Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler., took 0 ms. TpTrace Information: 0 : 30500, 1, 2023/03/20, 15:15:32.599, 229629975555, testhost.net48.exe, SocketClient: Stop: Cancellation requested. Stopping message loop. TpTrace Verbose: 0 : 30500, 1, 2023/03/20, 15:15:32.599, 229629976867, testhost.net48.exe, LengthPrefixCommunicationChannel.Dispose: Dispose reader and writer. TpTrace Information: 0 : 30500, 11, 2023/03/20, 15:15:32.599, 229629980101, testhost.net48.exe, SocketClient.PrivateStop: Stop communication from server endpoint: 127.0.0.1:052549, error: TpTrace Verbose: 0 : 30500, 11, 2023/03/20, 15:15:32.603, 229630020661, testhost.net48.exe, LengthPrefixCommunicationChannel.Dispose: Dispose reader and writer. TpTrace Verbose: 0 : 30500, 11, 2023/03/20, 15:15:32.603, 229630021302, testhost.net48.exe, TcpClientExtensions.MessageLoopAsync: exiting MessageLoopAsync remoteEndPoint: 127.0.0.1:52549 localEndPoint: 127.0.0.1:52550

dupdob commented 1 year ago

thanks for the report

MischaVreeburg commented 1 year ago

Hope i helps. Plz let me know if I need to test something

rouke-broersma commented 1 year ago

@MischaVreeburg A new version has just been released which includes updates to vstest and improvements to solution runs, could you try it out and let us know if it fixes any if your issues?

MischaVreeburg commented 1 year ago

@rouke-broersma Thanks for the new version. For the 3.7 version, I had added a unittest project with a single unittest, linked to all the assemblies that also had other unittest projects in order to guarantee that vstest would use the testhost.net48.exe process. I haven't yet tested the new version without this project, this is still on my (very long) todo list.

My first indications are:

I haven't tested the since option yet. That is also on the todo list.

stack trace of recent crash 2023-06-06T16:24:08.4336707Z [18:24:08 INF] Time Elapsed 04:34:18.9757859 2023-06-06T16:24:09.5090794Z Unhandled exception. System.AggregateException: One or more errors occurred. (Stryker failed to connect to vstest.console) 2023-06-06T16:24:09.5091187Z ---> 2023-06-06T16:24:09.5091243Z 2023-06-06T16:24:09.5091523Z Stryker failed to connect to vstest.console 2023-06-06T16:24:09.5091604Z 2023-06-06T16:24:09.5091648Z 2023-06-06T16:24:09.5091770Z Inner Exception: 2023-06-06T16:24:09.5092104Z C:\Program Files\dotnet\dotnet.exe process failed to connect to vstest.console process after 90 seconds. This may occur due to machine slowness, please set environment variable VSTEST_CONNECTION_TIMEOUT to increase timeout. 2023-06-06T16:24:09.5093018Z at Microsoft.TestPlatform.VsTestConsole.TranslationLayer.VsTestConsoleWrapper.WaitForConnection() 2023-06-06T16:24:09.5093332Z at Microsoft.TestPlatform.VsTestConsole.TranslationLayer.VsTestConsoleWrapper.EnsureInitialized() 2023-06-06T16:24:09.5093668Z at Microsoft.TestPlatform.VsTestConsole.TranslationLayer.VsTestConsoleWrapper.InitializeExtensions(IEnumerable1 pathToAdditionalExtensions) 2023-06-06T16:24:09.5094113Z at Stryker.Core.TestRunners.VsTest.VsTestContextInformation.BuildVsTestWrapper(String runnerId) in //src/Stryker.Core/Stryker.Core/TestRunners/VsTest/VsTestContextInformation.cs:line 109 2023-06-06T16:24:09.5094326Z 2023-06-06T16:24:09.5094468Z --- End of inner exception stack trace --- 2023-06-06T16:24:09.5094752Z at System.Threading.Tasks.TaskReplicator.Run[TState](ReplicatableUserAction1 action, ParallelOptions options, Boolean stopOnFirstFailure) 2023-06-06T16:24:09.5095558Z at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action1 body, Action2 bodyWithState, Func4 bodyWithLocal, Func1 localInit, Action1 localFinally) 2023-06-06T16:24:09.5096177Z --- End of stack trace from previous location --- 2023-06-06T16:24:09.5096702Z at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException) 2023-06-06T16:24:09.5097500Z at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action1 body, Action2 bodyWithState, Func4 bodyWithLocal, Func1 localInit, Action1 localFinally) 2023-06-06T16:24:09.5098434Z at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IList1 list, ParallelOptions parallelOptions, Action1 body, Action2 bodyWithState, Action3 bodyWithStateAndIndex, Func4 bodyWithStateAndLocal, Func5 bodyWithEverything, Func1 localInit, Action1 localFinally) 2023-06-06T16:24:09.5099491Z at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable1 source, ParallelOptions parallelOptions, Action1 body, Action2 bodyWithState, Action3 bodyWithStateAndIndex, Func4 bodyWithStateAndLocal, Func5 bodyWithEverything, Func1 localInit, Action1 localFinally) 2023-06-06T16:24:09.5101811Z at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable1 source, ParallelOptions parallelOptions, Action1 body) 2023-06-06T16:24:09.5102592Z at Stryker.Core.MutationTest.MutationTestProcess.TestMutants(IEnumerable1 mutantsToTest) in //src/Stryker.Core/Stryker.Core/MutationTest/MutationTestProcess.cs:line 113 2023-06-06T16:24:09.5103379Z at Stryker.Core.MutationTest.MutationTestProcess.Test(IEnumerable1 mutantsToTest) in /_/src/Stryker.Core/Stryker.Core/MutationTest/MutationTestProcess.cs:line 100 2023-06-06T16:24:09.5104168Z at Stryker.Core.StrykerRunner.RunMutationTest(IStrykerInputs inputs, ILoggerFactory loggerFactory, IProjectOrchestrator projectOrchestrator) in /_/src/Stryker.Core/Stryker.Core/StrykerRunner.cs:line 114 2023-06-06T16:24:09.5104852Z at Stryker.CLI.StrykerCli.RunStryker(IStrykerInputs inputs) in /_/src/Stryker.CLI/Stryker.CLI/StrykerCLI.cs:line 93 2023-06-06T16:24:09.5105696Z at Stryker.CLI.StrykerCli.<>c__DisplayClass10_0.<Run>b__0() in /_/src/Stryker.CLI/Stryker.CLI/StrykerCLI.cs:line 68 2023-06-06T16:24:09.5106385Z at McMaster.Extensions.CommandLineUtils.CommandLineApplication.<>c__DisplayClass143_0.<OnExecute>b__0(CancellationToken _) 2023-06-06T16:24:09.5106957Z at McMaster.Extensions.CommandLineUtils.CommandLineApplication.ExecuteAsync(String[] args, CancellationToken cancellationToken) 2023-06-06T16:24:09.5107458Z at McMaster.Extensions.CommandLineUtils.CommandLineApplication.Execute(String[] args) 2023-06-06T16:24:09.5107926Z at Stryker.CLI.StrykerCli.Run(String[] args) in /_/src/Stryker.CLI/Stryker.CLI/StrykerCLI.cs:line 74 2023-06-06T16:24:09.5108403Z at Stryker.CLI.Program.Main(String[] args) in /_/src/Stryker.CLI/Stryker.CLI/Program.cs:line 14 2023-06-06T16:24:19.4011626Z ##[error]Cmd.exe exited with code '-532462766'. 2023-06-06T16:24:19.4789276Z ##[section]Finishing: Mutation testing with Stryker.NET

second crash [00:20:14 DBG] The type or namespace name 'System' could not be found (are you missing a using directive or an assembly reference?), C:\vstsagent\_work\65\s\server\xxx\xxx\code\GlobalSuppressions.cs [00:20:14 DBG] Predefined type 'System.String' is not defined or imported, C:\vstsagent\_work\65\s\server\xxx\xxx\code\GlobalSuppressions.cs [00:20:14 DBG] Predefined type 'System.String' is not defined or imported, C:\vstsagent\_work\65\s\server\xxxx\xxxx\code\GlobalSuppressions.cs [00:20:14 DBG] The type or namespace name 'Justification' could not be found (are you missing a using directive or an assembly reference?), C:\vstsagent\_work\65\s\server\xxx\Caxxxses\code\GlobalSuppressions.cs [00:20:14 DBG] Predefined type 'System.String' is not defined or imported, C:\vstsagent\_work\65\s\server\xxx\xxx\code\GlobalSuppressions.cs

My config: { "stryker-config": { "solution": "./MyBigSolution.sln", "language-version": "Csharp9", "target-framework":"net48", "mutation-level": "Complete", "coverage-analysis": "perTest", "disable-bail": false, "concurrency": 4, "reporters": [ "progress", "html", "json" ], "thresholds": { "high": 80, "low": 60, "break": 0 }, "additional-timeout": 3000, "verbosity": "trace", "ignore-mutations": [ ], "mutate": [ "!**/*GlobalSuppressions.cs" ] } }

dupdob commented 1 year ago

thanks for the report. Regarding 'slowness' for V3.9: solution have been significantly revised in V3.9. One of the key benefits is that Stryker now mutates every non test project in the solution; as such, one can expect V3.9 to take more time as it does more work. it should be visible in the report. Both exceptions you report here are related to the health of the Net build system. I suspect this may be related to ressources exhaustion, or something similar. How many projects are in your solution ? does it deal with a lof of files and/or IP ports ?

MischaVreeburg commented 1 year ago

@dupdob thanks for your answer. I've got around 90 projects in my solution. Not a lot of IP ports, but a lot of files. Most of the projects are web projects.

Also I noticed in the report, that there isn't a breakdown per project/namespace of each file. Just each subfolder for the files in all the projects. This makes reading the report kinda hard. In 3.7 this was still a breakdown per project.

Question: How does the new solution mode go about when using the 'since' option + baseline and only one project with one unittest project has been changed? Does is still run all the tests? Does stryker run the tests against only one unittest project and consequently start just vstest.console.exe for the one unittest project?

dupdob commented 1 year ago

Regardig the report structure, this appears to be an undesired regression. I am busy getting a PR validated, but I will look into this after. Regarding your question: as of now, the solution mode will test all project again, there is no project level optimization. So it will mutate every projects, but only tests the part that have changed. Note that a change to a unitest project will trigger significant testing for project tested (directly or indirectly) by said test project.

rouke-broersma commented 1 year ago

The report structure should be fixed as of 3.9.2

dupdob commented 1 week ago

Several changes regarding solution mode logic have happened in the past year. I assume this is no longer an issue, but feel free to reopen it if you are still facing problems