stryker-mutator / stryker-net

Mutation testing for .NET core and .NET framework!
https://stryker-mutator.io
Apache License 2.0
1.75k stars 176 forks source link

Concurrency issue with hashset on MutantPlacer #2899

Closed f-bourqui closed 2 months ago

f-bourqui commented 3 months ago

Describe the bug A concurrent update was performed on this collection and corrupted its state Since 4.0.3, it throw an exception, while no issue with 4.0.0 (4.0.1 and 4.0.2 was not working because of #2881). It seems that some parallel stuff was added/modified in 4.0.3 without looking if it was thread safe.

Logs

System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
   at System.Collections.Generic.HashSet`1.AddIfNotPresent(T value, Int32& location)
   at System.Collections.Generic.HashSet`1.Add(T item)
   at Stryker.Core.Mutants.MutantPlacer.RegisterEngine(IInstrumentCode engine, Boolean requireRecursive) in /_/src/Stryker.Core/Stryker.Core/Mutants/MutantPlacer.cs:line 53
   at Stryker.Core.Mutants.CsharpNodeOrchestrators.BaseFunctionOrchestrator`1..ctor() in /_/src/Stryker.Core/Stryker.Core/Mutants/CsharpNodeOrchestrators/BaseFunctionOrchestrator.cs:line 19
   at Stryker.Core.Mutants.CsharpNodeOrchestrators.BaseMethodDeclarationOrchestrator`1..ctor()
   at Stryker.Core.Mutants.CsharpMutantOrchestrator.BuildOrchestratorList() in /_/src/Stryker.Core/Stryker.Core/Mutants/CsharpMutantOrchestrator.cs:line 39
   at Stryker.Core.Mutants.CsharpMutantOrchestrator..ctor(MutantPlacer placer, IEnumerable`1 mutators, StrykerOptions options) in /_/src/Stryker.Core/Stryker.Core/Mutants/CsharpMutantOrchestrator.cs:line 35
   at Stryker.Core.MutationTest.CsharpMutationProcess.Mutate(MutationTestInput input) in /_/src/Stryker.Core/Stryker.Core/MutationTest/CsharpMutationProcess.cs:line 57
   at Stryker.Core.MutationTest.MutationTestProcess.Mutate() in /_/src/Stryker.Core/Stryker.Core/MutationTest/MutationTestProcess.cs:line 88
   at Stryker.Core.Initialisation.ProjectMutator.MutateProject(StrykerOptions options, MutationTestInput input, IReporter reporters) in /_/src/Stryker.Core/Stryker.Core/Initialisation/ProjectMutator.cs:line 38
   at Stryker.Core.Initialisation.ProjectOrchestrator.<>c__DisplayClass7_0.<MutateProjects>b__0(MutationTestInput mutationTestInput) in /_/src/Stryker.Core/Stryker.Core/Initialisation/ProjectOrchestrator.cs:line 70
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker& currentWorker, Int64 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker& currentWorker, Int64 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
   at System.Threading.Tasks.TaskReplicator.Replica.Execute()

Expected behavior Should not throw the exception.

Desktop:

Additional context It happens right after building and discovering the tests. [09:21:05 INF] Number of tests found: 1 for project C:\Dev\AnySolution\AnyProject.csproj. Initial test run started. [09:21:27 ERR] An error occurred during the mutation test run

f-bourqui commented 3 months ago

Seems to be since commit 0ab0ab34

rouke-broersma commented 3 months ago

@richardwerkman

richardwerkman commented 3 months ago

@f-bourqui This should be fixed in version 4.0.4 of stryker.

f-bourqui commented 2 months ago

@richardwerkman Nope problem is still present. Have a better look at the stack trace I provided.

It's a HashSet RequireRecursiveRemoval called by RegisterEngine in MutantPlacer class (title of the issue). If it's intended to call RegisterEngine concurrently, a simple lock surrounding access of the HashSet (add and contains) should solve the issue.

InriverMilleBo commented 2 months ago

Yeah, this is still an issue for us in 4.0.4 as well. We're trying to introduce mutation testing to our organization but right now half or more of our runs fail because of this issue and it makes it hard to enable anywhere (as you manually have to trigger re-runs).

So would be really grateful for a fix on it.

richardwerkman commented 2 months ago

I'm working on a fix right now! It seems I missed that we have some static elements in our mutation process. These are shared between the parallel mutation threads and seem to cause the issue. I'm turning them into non-static elements to resolve this.

Sorry for the inconvenience.

InriverMilleBo commented 2 months ago

I'm working on a fix right now! It seems I missed that we have some static elements in our mutation process. These are shared between the parallel mutation threads and seem to cause the issue. I'm turning them into non-static elements to resolve this.

Sorry for the inconvenience.

No worry and happy that you're working on it!

dupdob commented 2 months ago

I opened PR #2914 to illustrate how simple is that approach. Important side note (independent from implementation strategy): going full parallel for mutation generation makes mutation IDs random (at least, very unstable). I feel this is a regression, and we should devise a stable numeration logic: mutation IDs allows conversation around specific mutations within a team, such as 'why is mutation #123 not covered by any test?'.

f-bourqui commented 2 months ago

It seems that 4.0.5 solved the issue for me.