sixty-north / cosmic-ray

Mutation testing for Python
MIT License

Is it possible to define a mutation operator that takes arguments? #528

Open AndrewC19 opened 2 years ago

AndrewC19 commented 2 years ago

I would like to define a custom mutation operator that looks for particular variables in the program-under-test and replaces them with random constants. For example, I might want to replace all uses of a variable named x1 with some randomly sampled int.

Is there a straightforward way to achieve this using cosmic-ray? I have tried defining my own Operator sub-class but I need a way to pass the target variable to the mutation operator.

Thanks in advance.

abingham commented 2 years ago

There is not currently a way to do that. I guess broadly speaking you'd want to be able to do some per-operator configuration in the configuration file and have it take effect when you exec your session. This seems totally reasonable. A few things come to mind which we'd need to think about.

  1. How will this argument information get communicated through distributors? Right now, the WorkItems that we send over the distributor don't have configuration information in them, so the workers - which could be running anywhere - don't know about it. We could send the complete configuration over to each worker somehow, or - more directly addressing your issue, and I think my preference - each WorkItem could also include some kind of "arguments" struct that the operator could interpret as it sees fit. I don't think this would be too hard, really.

  2. Do we need to support multiple sets of arguments for any given operator? That is, suppose you wanted to replace all variables named x1 and, separately, all variables named x2. Do operators now become templates (in the C++ sense, sorta) where each set of arguments actually generates a new independent operator? Something like this could possibly be handled by our OperatorProvider system, but the providers would probably need to be handed the configuration. (And perhaps I'm just overthinking this.)
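
To make both points a bit more concrete, here is a minimal sketch of a work item that carries a free-form arguments struct and survives a trip through a serialising distributor. Everything in it is hypothetical: `MutationRequest`, its fields, and `variant_name` are illustrative names, not cosmic-ray's actual `WorkItem` or provider API. The `variant_name` helper shows one way each argument set could be treated as its own independent operator.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class MutationRequest:
    """Hypothetical stand-in for a work item (not cosmic-ray's WorkItem)."""

    module_path: str
    operator_name: str
    occurrence: int
    # Free-form keyword arguments that only the operator interprets.
    operator_args: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise for transport to a worker that has no config file."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "MutationRequest":
        """Rebuild the request on the worker side."""
        return cls(**json.loads(payload))


def variant_name(operator_name: str, operator_args: Dict[str, Any]) -> str:
    """Derive a distinct name per argument set (the 'template' idea)."""
    if not operator_args:
        return operator_name
    suffix = ",".join(f"{key}={value}" for key, value in sorted(operator_args.items()))
    return f"{operator_name}[{suffix}]"


request = MutationRequest(
    "calculator.py", "variable_replacer", 0, {"cause_variable": "x1"}
)
received = MutationRequest.from_json(request.to_json())
```

Keeping the arguments as a plain dict means the distributor never needs to understand them, at the cost of restricting argument values to JSON-friendly types.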

There are probably other angles to this that I'm not seeing right now. Does any of this sound reasonable to you, or along the lines of what you had in mind? I don't know when I'd be able to work on this, but it feels like a good "airplane project"...I just need a long trip somewhere!

AndrewC19 commented 2 years ago

An example might help with this decision.

I have implemented a VariableReplacer operator that can carry out two types of mutation: (1) Replace all usages of a named variable with a constant (e.g. replace x in y=2*x+1 --> y=2*10+1). (2) Replace usages of a named variable in the declaration of a second named variable (e.g. replacing x in statements of y, such that y=2*x+1 --> y=2*10+1 but j=2*x+1 does not change).

Here's the implementation:

"""Implementation of the variable-replacement operator."""
from .operator import Operator
from parso.python.tree import Name, Number
from random import randint

class VariableReplacer(Operator):
    """An operator that replaces usages of named variables."""

    def __init__(self, cause_variable, effect_variable=None):
        self.cause_variable = cause_variable
        self.effect_variable = effect_variable

    def mutation_positions(self, node):
        """Mutate usages of the specified cause variable. If an effect variable is also
        specified, then only mutate usages of the cause variable in definitions of the
        effect variable."""

        if isinstance(node, Name) and node.value == self.cause_variable:

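            # Confirm that the name node is used on the right-hand side of the expression.
            # Positions are compared (rather than checking direct children) so that nested
            # usages such as x in 2*x + 1, and a bare right-hand side such as y = x, match.
            expr_node = node.search_ancestor('expr_stmt')
            if expr_node:
                rhs = expr_node.get_rhs()
                if rhs.start_pos <= node.start_pos and node.end_pos <= rhs.end_pos:
                    mutation_position = (node.start_pos, node.end_pos)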

                    # If an effect variable is specified, confirm that it appears on left hand
                    # side of the expression
                    if self.effect_variable:
                        effect_variable_names = [v.value for v in expr_node.get_defined_names()]
                        if self.effect_variable in effect_variable_names:
                            yield mutation_position

                    # If no effect variable is specified, any occurrence of the cause variable
                    # on the right hand side of an expression can be mutated
                    else:
                        yield mutation_position

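    def mutate(self, node, index):
        """Replace cause variable with random constant."""
        assert isinstance(node, Name)

        # The new leaf is built without the original node's prefix (the whitespace
        # preceding the name), so that whitespace is dropped in the mutated source.
        return Number(start_pos=node.start_pos, value=str(randint(-100, 100)))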

    @classmethod
    def examples(cls):
        return (
            # for cause_variable='x'
            ('y = x + z', 'y = 10 + z'),
            # for cause_variable='x' and effect_variable='y'
            ('j = x + z\ny = x + z', 'j = x + z\ny = -2 + z'),
            # for cause_variable='x' and effect_variable='j',
            ('j = x + z\ny = x + z', 'j = 1 + z\ny = x + z'),
            # for cause_variable='x'
            ('y = 2*x + 10 + j + x**2', 'y=2*10 + 10 + j + -4**2'),
        )

The class works if I manually modify src/cosmic_ray/commands/init.py and src/cosmic_ray/mutating.py to instantiate the operator with arguments. It then creates mutations such as:

# cause_variable='x'
# effect_variable='y'
--- mutation diff ---
--- acalculator.py
+++ bcalculator.py
@@ -1,5 +1,5 @@
 def mul(x, z):
     j = x * z
-    y = x * j
+    y =28 * j
     return y
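
For anyone who wants to try the two modes outside cosmic-ray, here is a rough standalone sketch. It uses the standard-library `ast` module rather than parso, takes a fixed constant instead of a random one so the result is reproducible, and only handles plain assignments (not augmented or annotated ones). The names `replace_variable` and `_NameToConstant` are made up for this illustration. Note that `ast.unparse` needs Python 3.9+ and normalises whitespace.

```python
import ast
from typing import Optional


class _NameToConstant(ast.NodeTransformer):
    """Swap every read of one variable name for a constant."""

    def __init__(self, name: str, constant: int) -> None:
        self.name = name
        self.constant = constant

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.name and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(value=self.constant), node)
        return node


def replace_variable(
    source: str,
    cause_variable: str,
    constant: int,
    effect_variable: Optional[str] = None,
) -> str:
    """Replace cause_variable on the right-hand side of assignments.

    If effect_variable is given, only assignments that define it are touched.
    """
    tree = ast.parse(source)
    replacer = _NameToConstant(cause_variable, constant)
    for stmt in list(ast.walk(tree)):
        if not isinstance(stmt, ast.Assign):
            continue
        targets = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        if effect_variable is None or effect_variable in targets:
            stmt.value = replacer.visit(stmt.value)
    return ast.unparse(ast.fix_missing_locations(tree))


mode_one = replace_variable("y = 2*x + 1", "x", 10)
mode_two = replace_variable("j = 2*x + 1\ny = 2*x + 1", "x", 10, effect_variable="y")
```

Here `mode_one` rewrites the only assignment, while `mode_two` leaves the definition of j alone and only changes the definition of y.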

Now we need a way to pass two variables to instances of VariableReplacer. From a user perspective, I would suggest that this should be handled in the TOML config file under an array of tables, [[cosmic-ray.operators]]. This should enable me to specify a list of operators that I want to apply and, if they require arguments, to specify these. I imagine something like the following:

[[cosmic-ray.operators]]
name = "variable_replacer"
args = [{ cause_variable = "x", effect_variable = "y"},
        { cause_variable = "x", effect_variable = "j"}]

[[cosmic-ray.operators]]
name = "number_replacer"

Then, for every unique set of arguments defined for an operator, a WorkItem should be created that initialises and applies the mutation. If this table isn't specified, it would make sense to use the current behaviour (run all mutation operators that can be applied to the program).
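
As a rough illustration of that expansion, the sketch below takes the config as an already-parsed dict (the shape a TOML parser would produce for the tables above) and returns one (operator name, arguments) pair per unique argument set. `expand_operator_specs` and the `available` parameter are hypothetical names for this example and are not part of cosmic-ray.

```python
from typing import Any, Dict, List, Tuple

OperatorSpec = Tuple[str, Dict[str, Any]]


def expand_operator_specs(
    config: Dict[str, Any], available: List[str]
) -> List[OperatorSpec]:
    """Return one (name, args) pair per unique argument set."""
    entries = config.get("cosmic-ray", {}).get("operators")
    if entries is None:
        # Table absent: keep current behaviour, every operator with no arguments.
        return [(name, {}) for name in available]

    specs: List[OperatorSpec] = []
    for entry in entries:
        # An operator listed without "args" is applied once, with no arguments.
        for args in entry.get("args") or [{}]:
            spec = (entry["name"], dict(args))
            if spec not in specs:  # skip duplicated argument sets
                specs.append(spec)
    return specs


# Dict equivalent of the two [[cosmic-ray.operators]] tables above.
config = {
    "cosmic-ray": {
        "operators": [
            {
                "name": "variable_replacer",
                "args": [
                    {"cause_variable": "x", "effect_variable": "y"},
                    {"cause_variable": "x", "effect_variable": "j"},
                ],
            },
            {"name": "number_replacer"},
        ]
    }
}

specs = expand_operator_specs(config, available=["number_replacer"])
```

Each resulting pair could then seed its own work item, which also answers the "multiple sets of arguments" question without changing how argument-free operators behave.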

What are your thoughts?