trailofbits / differ

Detecting Inconsistencies in Feature or Function Evaluations of Requirements
GNU Affero General Public License v3.0

String variables #2

Closed ameily closed 1 year ago

ameily commented 1 year ago

Similar to the IntVariable, create a new StringVariable that generates values based on a static list of values, a regex pattern, or both.

For example, the following variable config should produce the string hello world! and then generate 3 strings that match the regex pattern provided:

templates:
  - variables:
      name:
        type: str
        values:
          - 'hello world!'
        regex:
          pattern: 'hello [a-zA-Z0-9]{1,10}'
          size: 3

Like the IntVariable, the values and regex configs can be used separately or together.

Use a library such as exrex to generate strings based on a regex pattern.
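A minimal sketch of the behavior described above, assuming the config arrives as a plain dict with the `values` and `regex` keys from the example. The function names `default_generator` and `generate_values` are hypothetical and not part of differ. The default generator assumes `exrex.getone`, imported lazily so that a different generator can be swapped in:

```python
from typing import Callable, Iterator


def default_generator(pattern: str) -> str:
    """Generate one string matching `pattern` using exrex (imported lazily)."""
    import exrex  # third-party dependency, assumed installed: pip install exrex
    return exrex.getone(pattern)


def generate_values(
    config: dict,
    generate_one: Callable[[str], str] = default_generator,
) -> Iterator[str]:
    """Yield the static values first, then `size` strings built from the regex."""
    # Both keys are optional, so either can be used alone or together.
    yield from config.get('values', [])

    regex = config.get('regex')
    if regex:
        for _ in range(regex['size']):
            yield generate_one(regex['pattern'])
```

With the config from the example, `list(generate_values(config))` would start with `'hello world!'` and continue with three generated strings. Passing `generate_one` explicitly keeps the regex library replaceable and makes the function easy to test without exrex.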

ameily commented 1 year ago

@ahussar-developer do you mind taking a look at this? This issue is essentially replicating the string generation you had in your prototype.

You would implement this as a new class in differ/variables/primitives.py. Let me know if you have any questions or want to jump on a quick call to walk through the code, since there isn't much documentation yet.

ameily commented 1 year ago

Here's the stub I wrote when sharing my screen that you can use as a base if needed. You can also use the IntVariable as an example to work from.

from typing import Iterator

# register, FuzzVariable, and TraceTemplate come from the differ package.


@register('str')
class StringVariable(FuzzVariable):

    def __init__(self, name: str, config: dict):
        super().__init__(name, config)
        # parse the config (values / regex pattern, both are optional).

    def generate_values(self, template: TraceTemplate) -> Iterator[str]:
        # Static values are yielded first, followed by the generated strings.
        if self.values:
            yield from self.values

        if self.regex:
            for _ in range(self.regex.count):
                yield self.generate_string()

    def generate_string(self) -> str:
        # use some regex magic to generate a new string
        raise NotImplementedError

ahussar-developer commented 1 year ago

@ameily Is it possible to use multiple regex patterns for an argument? And how would that look in the template?

ameily commented 1 year ago

@ahussar-developer If that's something we want, like "generate 5 strings for each of these 2 regex patterns", then you could do it:

- type: str
  regex:
    pattern:
      - '^first pattern$'
      - '^second pattern$'

    count: 3  # generate 3 strings for each pattern (6 strings will be generated in total)

Then, in your Python class I assume you would handle both a single pattern and a list:

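def __init__(self, name, config):
    # Default to no patterns so generate_values() works when regex is omitted.
    self.patterns = []
    self.count = 0
    regex = config.get('regex')
    if regex:
        patterns = regex['pattern']
        # Accept either a single pattern or a list of patterns.
        if not isinstance(patterns, list):
            patterns = [patterns]
        self.patterns = patterns
        self.count = regex['count']

def generate_values(self, template):
    # Yield `count` strings for each pattern, in order.
    for pattern in self.patterns:
        for _ in range(self.count):
            yield self.generate_from_pattern(pattern)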

We can always add support for multiple patterns later if you want to keep this initial work smaller. It's up to you.
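As a standalone check of the "3 strings for each pattern, 6 in total" arithmetic, the normalization can be sketched outside the class. The helper names `normalize_patterns` and `generate_for_patterns` are hypothetical, and the generator is passed in as a callable so the sketch does not depend on any particular regex library:

```python
from typing import Callable, Iterator, List, Union


def normalize_patterns(pattern: Union[str, List[str]]) -> List[str]:
    """Accept a single pattern or a list and always return a list."""
    return pattern if isinstance(pattern, list) else [pattern]


def generate_for_patterns(
    regex: dict,
    generate_one: Callable[[str], str],
) -> Iterator[str]:
    """Yield `count` strings per pattern, pattern by pattern."""
    for pattern in normalize_patterns(regex['pattern']):
        for _ in range(regex['count']):
            yield generate_one(pattern)


# Stand-in generator for illustration only: it strips the anchors instead of
# producing a random match.
regex = {'pattern': ['^first pattern$', '^second pattern$'], 'count': 3}
values = list(generate_for_patterns(regex, lambda p: p.strip('^$')))
print(len(values))  # 2 patterns x 3 strings each = 6
```

A config with a single string for `pattern` goes through the same code path, so single-pattern templates would keep working if list support is added later.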