Closed MichaelSquires closed 7 months ago
I made a PR with a fix, although I would suggest not relying on automatically generated priories. Just adding an explicit priority would also fix this for your grammar.
Wow, thanks for the quick turnaround!
We'll look into setting explicit priorities too, thanks for the suggestion 😄 .
This issue is resolved with #1377 being merged in, I'll go ahead and close it. Thank you for your help/support!
@erezsh @MegaIng Any estimates on when we might see a release with this fix?
@MichaelSquires I'm currently on vacation, so no promises, but I'll try to do it next week.
@MichaelSquires I'm currently on vacation, so no promises, but I'll try to do it next week.
Thanks for letting me know. Enjoy your vacation!
Hi @erezsh, I hope you enjoyed the holidays! Wondering if you think we'll see a new release with this bugfix soon?
@MichaelSquires Thanks for the reminder. I'll try to do it this Monday.
Released version 1.1.9
Sorry for the delay!
Excellent news! Thank you for your support!
Describe the bug
This CPython change recently got released in CPython 3.11.7: https://github.com/python/cpython/pull/109859. You'll see in
Lib/re/_parser.py
thatMAXWIDTH
(1 << 64
) is now the maximum width of a regex pattern, notMAXREPEAT
(2**32 - 1
). Lark usesMAXREPEAT
(utils.py#L154) to assume the maximum width when it can't use there
module to calculate the width. This was all fine until CPython 3.11.7 was released earlier today and it started breaking our custom grammar.Our specific issue is that terminal ordering changed because the lengths of the regex patterns are used to determine priority. Since 3.11.7, regex patterns that are only compatible with the
regex
module have a much shorter length thanre
module compatible patterns.This issue (or very similar to it) was discussed just a few weeks ago on github: https://github.com/mrabarnett/mrab-regex/issues/513.
To Reproduce
Here's a minimized test. In this example,
NUM
uses a named capture group without the "P" ((?P<NUMVAL>\d)
) which is supported by theregex
module only. Because it's not supported by the builtinre
module, it's length is assumed to be (min, max) of(1, MAXREPEAT)
where MAXREPEAT =2^^32 -1
. ForSTR
, it isre
module compatible so theget_width()
call succeeds and its length is calculated as(1, MAXWIDTH)
where MAXWIDTH =1 << 64
.Please let me know if you have any questions about this.
Thanks!