qntm / greenery

Regular expression manipulation library
http://qntm.org/greenery
MIT License

Cannot parse `[.-]`. #96

Closed: thomasahle closed this issue 1 year ago

thomasahle commented 1 year ago

As far as I know, when a hyphen (-) appears inside a character class ([...]) and is the last character in the class, it is treated as a literal hyphen.

However, greenery fails on this regex:

    def parse(string: str):
        '''
            Parse a full string and return a `Pattern` object. Fail if
            the whole string wasn't parsed
        '''
        obj, i = match_pattern(string, 0)
        if i != len(string):
>           raise Exception(
                f"Could not parse '{string}' beyond index {str(i)}"
            )
E           Exception: Could not parse '[.-]' beyond index 0

The standard re library works fine:

    >>> re.compile('[.-]')
    re.compile(r'[.-]', re.UNICODE)

If I escape the dash, as in `[.\-]`, greenery accepts it, but that shouldn't be necessary.
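
As a quick sanity check, here is a minimal sketch using only the standard `re` module to confirm that the unescaped and escaped spellings accept the same characters. The sample characters are arbitrary and chosen only for illustration.

```python
import re

# In Python's re module, a trailing '-' in a class is a literal hyphen,
# so both patterns should accept exactly '.' and '-'.
unescaped = re.compile(r'[.-]')
escaped = re.compile(r'[.\-]')

samples = ['.', '-', 'a', '/', ',']
results = {
    s: (bool(unescaped.fullmatch(s)), bool(escaped.fullmatch(s)))
    for s in samples
}

for s, (u, e) in results.items():
    print(repr(s), u, e)
```

Each sample gets the same verdict from both patterns, so the escaped form is a drop-in replacement wherever the unescaped one is rejected.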

qntm commented 1 year ago

This is intentional, to make parsing easier. It is documented in the README. I would probably look favourably on a PR which added this functionality but it's not something I'm planning to add currently.
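
Until such a PR exists, one possible workaround is to preprocess the pattern before handing it to greenery. The sketch below is not part of greenery: `escape_edge_hyphens` is a hypothetical helper, and it assumes simple character classes (no nested brackets, no literal `]` as the first class member). It only rewrites an unescaped hyphen at the start or end of a class as `\-`.

```python
def escape_edge_hyphens(pattern: str) -> str:
    # Hypothetical helper, not a greenery API. Assumes simple classes:
    # no nested brackets and no literal ']' as the first class member.
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == '\\' and i + 1 < n:
            # Escaped character outside a class: copy both characters as-is.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch != '[':
            out.append(ch)
            i += 1
            continue
        # Find the closing bracket, skipping escaped characters.
        j = i + 1
        while j < n and pattern[j] != ']':
            j += 2 if pattern[j] == '\\' else 1
        if j >= n:
            # Unterminated class: leave the remainder untouched.
            out.append(pattern[i:])
            break
        body = pattern[i + 1:j]
        negated = body.startswith('^')
        if negated:
            body = body[1:]
        if body.startswith('-'):
            # Leading hyphen is literal: escape it.
            body = '\\' + body
        if body.endswith('-'):
            # Escape the trailing hyphen only if it is not already escaped,
            # i.e. it is preceded by an even number of backslashes.
            backslashes = len(body) - 1 - len(body[:-1].rstrip('\\'))
            if backslashes % 2 == 0:
                body = body[:-1] + '\\-'
        out.append('[' + ('^' if negated else '') + body + ']')
        i = j + 1
    return ''.join(out)


print(escape_edge_hyphens(r'[.-]'))  # [.\-]
```

The rewritten string could then be passed to the `parse` function shown in the traceback above. Ranges such as `[a-z]` are left unchanged because their hyphen is neither first nor last in the class.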

thomasahle commented 1 year ago

You could leave it open as "enhancement" or "help wanted" if you wanted anyone to make a pull request in the future...