qntm / greenery

Regular expression manipulation library
http://qntm.org/greenery
MIT License

fix incorrect combination of open charclass predicates #69

Closed. rwe closed this 1 year ago

rwe commented 1 year ago

Open/negated charclass predicates like \W and \S were combined incorrectly.

When several of them appeared in one character class, such as [\S\D], the class was interpreted as not (whitespace or digits), but the correct interpretation is (not whitespace) or (not digits).
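To make the difference concrete, here is a minimal sketch using plain Python sets over a four-character sample alphabet. It relies only on the built-in re module and does not touch greenery's internals; the names `ALPHABET`, `wrong`, `right` and `actual` are illustrative and not taken from the library.

```python
import re

# Sample alphabet (an assumption for illustration): two digits, a letter, a space.
ALPHABET = "12x "

everything = set(ALPHABET)
whitespace = {c for c in ALPHABET if c.isspace()}
digits = {c for c in ALPHABET if c.isdigit()}

# Incorrect reading of [\S\D]: not (whitespace or digits).
wrong = everything - (whitespace | digits)

# Correct reading of [\S\D]: (not whitespace) or (not digits).
right = (everything - whitespace) | (everything - digits)

# What the built-in re module accepts for the same character class.
actual = {c for c in ALPHABET if re.fullmatch(r"[\S\D]", c)}

print(sorted(wrong), sorted(right), sorted(actual))
```

Under the incorrect reading only "x" is accepted. Under the correct reading every character in the sample is accepted, since no character is both whitespace and a digit, and that agrees with what re reports.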

This fixes the misinterpretation and the test. It also includes a couple of minor nits that made the fix easier to rebase.

I didn't include this as a test in the PR, because there are plausible reasons why the test suite doesn't already do this, but I verified the behaviour against the built-in re module with:

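import re
from greenery import parse

def test_expr(expr, value):
    # Build the same pattern in greenery and in the built-in re module.
    pat = parse(expr)
    rx = re.compile(expr)
    # Compare whole-string matching in both engines.
    assert bool(rx.fullmatch(value)) == bool(pat.matches(value))

# Each expr is wrapped in [...] or [^...] and checked against single characters.
for expr in (r"\S\D", r"1\D", r"1\D\S"):
    for negation in (True, False):
        for value in "12x ":
            test_expr(f'[{"^" if negation else ""}{expr}]', value)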

I also confirmed consistency with some manual tinkering in JavaScript.