Rework mutation orchestrator for Null-conditional operators mutator - Githubissues

stryker-mutator / stryker-net

Mutation testing for .NET core and .NET framework!

https://stryker-mutator.io

Apache License 2.0

1.76k stars 175 forks source link

Rework mutation orchestrator for Null-conditional operators mutator #2740

Closed richardwerkman closed 9 months ago

richardwerkman commented 10 months ago

Is your feature request related to a problem? Please describe. After working for multiple hacktoberfest hackathons on #1031 we have to conclude the mutator can't (or shouldn't) be implemented in its current form. The issue is as follows:

Mutation engines currently call all mutators for every node in a syntax tree, but can't place the mutations on all of these nodes. This causes issues in the null coalecing mutator and the linq mutator. In the linq mutator this was solved by verifying inside the mutator that the current node is a top level node (and thus can be placed by the engine) and skips mutating all other nodes. So the mutator itself needs some kind of recursive loop to mutate all instances of the mutation in the node. For the linq mutator this works, it is however already very complex. For the null coalecing mutator the complexity is much higher. We've had some very smart people break their brains on this and they all concluded it's too complex to solve inside the mutator.

Describe the solution you'd like Instead of solving this inside the mutator. We should solve this inside the engine/orchestrator. The orchestrator should always be able to place mutations, regardless of where they are created (top level or deeper inside an expression). We can do this by checking the parent of the node until we find a node that supports placing a mutation and then replace the mutated node inside that node.

Describe alternatives you've considered We could continue working on the current solution but I believe this is a dead end.

Additional context This change would result in much simpler mutators. But it would add a bit of complexity to the current orchestrator logic (which is already complex).

rouke-broersma commented 10 months ago

We should check with strykerjs and stryker4s. They have implemented a slightly different orchestration method that helps with this. Iirc while walking the syntaxtree they mark all appropriate nodes as 'mutant placeable' and once they're ready to place a mutant they simply walk back the tree until the first appropriate 'mutant placeable' node.

dupdob commented 10 months ago

The orchestrator should always be able to place mutations, regardless of where they are created (top level or deeper inside an expression)

That is exactly what I tried to achieve with the current design. 1) BlockSyntax Level 2) Statement Level 3) Expression Level

As a quick tour of the current design: It works thanks to 3 components:

mutators: they are responsible to provide mutations for any syntax node they support
mutation context: this is an aggregate in charge of tracking mutation generations while walking the syntax tree. It stores:
1. mutations pending for injection: mutations that needs to be injected at some higher level construct (see below)
2. list of active mutators driven by initial configuration and Stryker comments
3. other contextual information, such as if we are in some static construct
4. it exposes mutation injection method
Node specific orchestrators. They are in charge of (these behaviors can be overridden for any node type)
1. Adjusting the context if necessary and restoring it afterward. By default parse Stryker comments and adjust the context if necessary.
2. Orchestrate mutation for children node. Default is to enumerate (direct) children and replace them by their mutated versioon
3. Generate mutations. Call every active mutators by default. Note that no existing ochestrator alters this.
4. Store generated mutations in the appropriate queue (note that expression level mutation are injected directly). By default does nothing

dupdob commented 10 months ago

when one wants or needs to create/change an existing orchestrator, it is advised to analyze one similar to the desired one. It is also important to be familiar with Roslyn syntax tree manipulation methods and the impact of having immutable syntax trees. Any mistake may lead to the loss of mutant (because they are never injected or because the mutated node has been lost in some operation) or issues during compilation/rollback phase, due to the loss or corruption of annotations.

dupdob commented 10 months ago

The new design also aimed at being an in place replacement for the previous logic with limited impact to existing mutations. The number of orchestrators have increased as time passed to improve construct for specific construct (such as post/pre unitary operations or local functions). They also evolved to cover more sophisticated constructs, such as the ability to convert expression bodied function to statement bodied one so they can be mutated.

dupdob commented 10 months ago

I had a hard look at the Linq mutator and indeed, it does not fit well with the current orchestrator design, as it does a lot of analysis and walk the syntax tree on its on.

By the design philosophy, this is a sure sign there is one or more orchestrator(s) missing. From a C# perspective, one cannot use ternary operator within one of those to deal with identifier swapping, such as collection(MutantAction?.All(x => true):.Any(x => true)); Would be great, but no dice.

It may also be a matter of how to inject expression mutation: the current logic is that expression mutations are injected on the spot, as they are the tiniest bit of syntax we (generally) mutate. This is not true for identifier swapping.

dupdob commented 10 months ago

The more I think about it, the more I think this is simply a problem for MemberAccess/Binding syntax constructs, which are not classical expressions. They act more like constants: they must be known at compile time. Hence no mutation switching technique is available for them, except at the higher level: we can switch between constants, but not within its definitions.

I think it could be achieved via a MemberAccess orchestrator and some logic inside for the general case, and then see how it must be adjusted for conditional access and null coalescing or this these require dedicated orchestrator as well.

This(ese) orchestrator(s) will be responsible to identify the top level member access expression and ensure mutation injection is properly delayed until the whole sub expression has been processed. This also imply adding some logic to delay mutation injection in expression: this is currently done synchrously by the base expression syntax orchestrator.

One last thing to bear in mind: for performance reason, orchestrators are stateless, the state is managed via mutation context.

Rationale: stateful orchestrator would be more convenient, but it would requires creating one for each syntax node, which would result in large and useless memory consumption, while most syntax node do not require special handling.

dupdob commented 9 months ago

Update: I added the notion of sub expressions (naming to be confirmed) to describe expressions where we cannot inject a ternary operator to control mutations. These mutations need to be controlled at a higher level, i.e. the expression enclosing the sub expression(s), such as an invocation expression. This allowed me to refactor the Linq Mutator to the core function, without any for checks or syntax tree walking.

        public override IEnumerable<Mutation> ApplyMutations(ExpressionSyntax node, SemanticModel semanticModel)
        {

            string memberName;
            SyntaxNode toReplace;
            switch (node)
            {
                case MemberAccessExpressionSyntax memberAccessExpression:
                    toReplace = memberAccessExpression.Name;
                    memberName = memberAccessExpression.Name.Identifier.ValueText;
                    break;
                case MemberBindingExpressionSyntax memberBindingExpression:
                    toReplace = memberBindingExpression.Name;
                    memberName = memberBindingExpression.Name.Identifier.ValueText;
                    break;
                default:
                    yield break;
            }

            if (Enum.TryParse(memberName, out LinqExpression expression) &&
                KindsToMutate.TryGetValue(expression, out var replacementExpression))
            {
                yield return new Mutation
                {
                    DisplayName =
                        $"Linq method mutation ({memberName}() to {SyntaxFactory.IdentifierName(replacementExpression.ToString())}())",
                    OriginalNode = node,
                    ReplacementNode = node.ReplaceNode(toReplace,
                        SyntaxFactory.IdentifierName(replacementExpression.ToString())),
                    Type = Mutator.Linq
                };
            }
        }

Only 5 tests turns red with this logic, but these tests are revealing of a long standing oversimplification of the current (MutationContext) design.

dupdob commented 9 months ago

The simplifying assumption is that the code can be described as a three level strategy: block, statement and expression. This is an oversimplification because it does track properly the fact that (in C# at least), method and functions/lambda can be declared within a method, or even within an expression. This is an issue for some edge case, mainly leading to the silent loss of some mutations in lambda expressions and such. But, now that I have added sub expressions, this is more of a problem because mutation can leak from a sub expression (the member access part of an invocation and the lambda expression parameters/body....

Long story short, Mutation Context has to change to capture this recursion, and this is not hard.

Once I have a working version with the updated MutationContext and LinqMutator classes, I will open a PR for further discussion.

One likely topic is that there is a lot of node specific orchestrators now and in the short term future. Reducing this number is a good acis of improvments. Note that this is a direct consequences of the existing Roslyn class hierarchy (for syntax node) which lacks consistency.

dupdob commented 9 months ago

I opened draft PR #2747 to share progress.

dupdob commented 9 months ago

The PR is still in draft because some refactoring is still due and there is a still a test failing. But it is close to being ready, so one can have a look to discover the changes. PS: the failing test appears to be caused by the need design resulting in more compact code, but careful review remains to be done to reach a definitive conclusion Update: my analysis confirms the previous assumptions. I am doing end to end tests now. Note that the sample codes used in some (already existing) mutation unit tests would not compile BEFORE being mutated, i.e. they are not fully legal C#: correct from a syntax perspective, but not from a semantic one.

richardwerkman commented 9 months ago

Closed by #2747