qntm / greenery

Regular expression manipulation library
http://qntm.org/greenery
MIT License

Support for lazy quantifiers #107

Closed by michaelmior 4 months ago

michaelmior commented 4 months ago

Greenery doesn't seem to support lazy quantifiers such as o{0,1}?. Currently, trying to parse one raises an error:

>>> greenery.parse('o{0,1}?')
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mmior/.local/lib/python3.8/site-packages/greenery/parse.py", line 371, in parse
    raise NoMatch(f"Could not parse {string!r} beyond index {i}")
greenery.parse.NoMatch: Could not parse 'o{0,1}?' beyond index 6

qntm commented 4 months ago

Yes, this is deliberate. A regular expression is a concise description of a regular language, which is a set of strings, and greenery provides tools for manipulating those expressions. However, lazy quantifiers do not alter the regular language. The set of strings matched by /o{0,1}?/ is identical to the set of strings matched by /o{0,1}/: they describe the same regular language. Lazy quantifiers do affect which substring gets matched, but greenery does not actually provide matching functionality; that isn't what it's for.
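The distinction can be checked with Python's standard `re` module (not greenery, which does no matching). This is a small illustrative sketch: it enumerates short strings over the alphabet {o, x} and confirms that `o{0,1}` and `o{0,1}?` fully match exactly the same ones, while a substring match at the start of "oo" differs.

```python
import re
from itertools import product

greedy = re.compile(r"o{0,1}")
lazy = re.compile(r"o{0,1}?")

# Same language: every short string over {o, x} is fully matched
# by both patterns or by neither.
candidates = ["".join(p) for n in range(4) for p in product("ox", repeat=n)]
assert all(
    bool(greedy.fullmatch(s)) == bool(lazy.fullmatch(s)) for s in candidates
)

# Different substring match: at the start of "oo", the greedy
# quantifier consumes one 'o', the lazy one consumes nothing.
print(repr(greedy.match("oo").group()))  # 'o'
print(repr(lazy.match("oo").group()))    # ''
```

Only the second kind of question (which substring is matched) depends on laziness, and that is outside greenery's scope.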

michaelmior commented 4 months ago

@qntm Got it, thanks for the response. I guess that means it doesn't support lazy/greedy modifiers at all, which makes sense. That said, it would be very helpful for my use case if the parser accepted such regexes anyway: I'm dealing with a scenario where I'd like to handle as broad a set of expressions as possible.

Of course, I could parse the expression some other way and remove lazy and greedy modifiers before feeding it to greenery, but it would be nice to be able to skip this step.
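The pre-processing workaround described above could look roughly like this. It is a minimal sketch, not part of greenery: `strip_lazy_modifiers` is a hypothetical helper, and it assumes Python `re` syntax. It copies escapes and character classes verbatim, treats `?` directly after `(` as group syntax, and drops a `?` only when it immediately follows a quantifier (`*`, `+`, `?` or a brace quantifier). It does not attempt possessive quantifiers, verbose mode or other exotic syntax.

```python
import re

# A brace quantifier in Python re syntax: {3}, {3,}, {3,5}, {,5}, {,}.
_BRACE_QUANTIFIER = re.compile(r"\{(?:\d+(?:,\d*)?|,\d*)\}")


def strip_lazy_modifiers(pattern: str) -> str:
    """Drop the lazy '?' after a quantifier, e.g. 'o{0,1}?' -> 'o{0,1}'.

    Hypothetical helper for illustration only.
    """
    out = []
    i = 0
    after_quantifier = False  # previous token was a quantifier
    after_open_paren = False  # previous char was an unescaped '('
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            # Copy an escape sequence such as '\?' verbatim.
            out.append(pattern[i:i + 2])
            i += 2
            after_quantifier = after_open_paren = False
            continue
        if c == "[":
            # Copy a whole character class verbatim; '?' inside it is literal.
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1  # a leading ']' is a literal member of the class
            while j < len(pattern) and pattern[j] != "]":
                j += 2 if pattern[j] == "\\" else 1
            out.append(pattern[i:j + 1])
            i = j + 1
            after_quantifier = after_open_paren = False
            continue
        if c == "?" and after_quantifier:
            # Lazy marker: it does not change the language, so drop it.
            i += 1
            after_quantifier = False
            continue
        brace = _BRACE_QUANTIFIER.match(pattern, i)
        if brace:
            out.append(brace.group())
            i = brace.end()
            after_quantifier, after_open_paren = True, False
            continue
        # '?' right after '(' starts group syntax like '(?:', not a quantifier.
        after_quantifier = c in "*+?" and not after_open_paren
        after_open_paren = c == "("
        out.append(c)
        i += 1
    return "".join(out)


print(strip_lazy_modifiers("o{0,1}?"))    # o{0,1}
print(strip_lazy_modifiers("a*?b+?c??"))  # a*b+c?
```

The cleaned string would then be passed to `greenery.parse`. A character-level scan is used rather than a single `re.sub` because a blanket substitution would also rewrite `?` inside character classes, after escapes, and in `(?:...)` groups.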

qntm commented 4 months ago

There is a small amount of precedent for allowing syntax in the parser even though it doesn't make a material difference. I will consider this.

michaelmior commented 4 months ago

@qntm Thanks! I appreciate it :)

michaelmior commented 4 months ago

Also, I apologize for missing that the lack of support for lazy modifiers is explicitly mentioned in the README…

qntm commented 4 months ago

Give greenery-4.2.1 a try.

michaelmior commented 4 months ago

@qntm Works great! Thanks :)