we-like-parsers / cpython

Here we work on integrating pegen into CPython; use branch 'pegen'
https://github.com/gvanrossum/pegen

Incorrect parsing of multi-line fstrings #172

Closed pablogsal closed 2 years ago

pablogsal commented 2 years ago

Parsing Tools/c-analyzer/c_parser/parser/_regexes.py fails with:

  File "/home/pablogsal/github/python/f-string-grammar/Tools/c-analyzer/c_parser/parser/_regexes.py", line 28
    ['] [^'] [']
               ^
SyntaxError: closing parenthesis ']' does not match opening parenthesis '(' on line 24

isidentical commented 2 years ago

It seems we are wrongly closing the f-string on a single quote, even though it was opened with triple quotes:

 $ ./python testing.py t.py
TokenInfo(type=1 (NAME), string='STRING_LITERAL', start=(1, 0), end=(1, 14), line="STRING_LITERAL = textwrap.dedent(rf'''\n")
TokenInfo(type=22 (EQUAL), string='=', start=(1, 15), end=(1, 16), line="STRING_LITERAL = textwrap.dedent(rf'''\n")
TokenInfo(type=1 (NAME), string='textwrap', start=(1, 17), end=(1, 25), line="STRING_LITERAL = textwrap.dedent(rf'''\n")
TokenInfo(type=23 (DOT), string='.', start=(1, 25), end=(1, 26), line="STRING_LITERAL = textwrap.dedent(rf'''\n")
TokenInfo(type=1 (NAME), string='dedent', start=(1, 26), end=(1, 32), line="STRING_LITERAL = textwrap.dedent(rf'''\n")
TokenInfo(type=7 (LPAR), string='(', start=(1, 32), end=(1, 33), line="STRING_LITERAL = textwrap.dedent(rf'''\n")
TokenInfo(type=61 (FSTRING_START), string="rf'''", start=(1, 33), end=(1, 38), line="STRING_LITERAL = textwrap.dedent(rf'''\n")
TokenInfo(type=63 (FSTRING_END), string="\n    (?:\n        # character literal\n        (?:\n            ['] [^']", start=(5, -1), end=(5, 20), line="STRING_LITERAL = textwrap.dedent(rf'''\n    (?:\n        # character literal\n        (?:\n            ['] [^'] [']\n")
Traceback (most recent call last):
  File "/home/isidentical/projects/cpython/testing.py", line 20, in <module>
    for token in tokens:
    ^^^^^^^^^^^^^^^^^^^^
  File "/home/isidentical/projects/cpython/testing.py", line 7, in _tokenize
    for t in tok:
    ^^^^^^^^^^^^^
  File "<string>", line 5
    ['] [^'] [']
               ^
SyntaxError: closing parenthesis ']' does not match opening parenthesis '(' on line 1
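
For anyone who wants to check an interpreter against this input, here is a minimal sketch of a reduced reproduction using the standard `tokenize` module. The first five lines of `SOURCE` are taken from the token dump above; the closing `)` lines and the final `'''` are an assumption added only to make the snippet a complete statement, and `token_summary` is a hypothetical helper name. On an interpreter without this bug, the whole regex body should stay inside a single string (or f-string middle) token.

```python
import io
import tokenize

# First five lines come from the token dump in this issue; the closing
# parentheses and the final ''' are assumed, just to complete the statement.
SOURCE = (
    "STRING_LITERAL = textwrap.dedent(rf'''\n"
    "    (?:\n"
    "        # character literal\n"
    "        (?:\n"
    "            ['] [^'] [']\n"
    "        )\n"
    "    )\n"
    "''')\n"
)


def token_summary(source):
    """Return (token name, token string) pairs for the given source text."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    for name, string in token_summary(SOURCE):
        print(f"{name:15} {string!r}")
```

If the tokenizer is correct, no token ends in the middle of the `['] [^'] [']` line.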

isidentical commented 2 years ago

Ah, we do not check whether the quote characters are consecutive, so the tokenizer treats the three separate single quotes in ['] [^'] ['] as if they were the closing '''. https://github.com/we-like-parsers/cpython/blob/9e26cdca063de36bbe17943dca465dc803d09498/Parser/tokenizer.c#L2343-L2345
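
To illustrate the difference, here is a rough Python sketch of the two behaviours. It is not the code in `Parser/tokenizer.c`; both function names are hypothetical and the model ignores escapes and replacement fields. The first function counts quote characters without requiring them to be adjacent, which is roughly the mistake described above; the second resets its counter whenever a non-quote character appears.

```python
def find_closing_naive(body, quote, quote_size):
    """Buggy model: counts quote characters even when they are not adjacent."""
    seen = 0
    for index, char in enumerate(body):
        if char == quote:
            seen += 1
            if seen == quote_size:
                return index  # index of the quote taken as the end
    return None


def find_closing_consecutive(body, quote, quote_size):
    """Fixed model: only an unbroken run of quote_size quotes closes the string."""
    run = 0
    for index, char in enumerate(body):
        if char == quote:
            run += 1
            if run == quote_size:
                return index  # index of the last quote in the run
        else:
            run = 0  # any other character breaks the run
    return None


# Text following the opening rf''' (the tail after the [']-line is assumed).
BODY = (
    "\n    (?:\n        # character literal\n        (?:\n"
    "            ['] [^'] [']\n        )\n    )\n'''"
)

if __name__ == "__main__":
    naive_end = find_closing_naive(BODY, "'", 3)
    fixed_end = find_closing_consecutive(BODY, "'", 3)
    print(repr(BODY[: naive_end + 1]))  # stops inside the third [']
    print(fixed_end == len(BODY) - 1)   # closes on the real '''
```

In this model the naive version stops on the quote inside the third `[']`, leaving the following `]` to be tokenized as code, which would be consistent with the mismatched-parenthesis error reported above.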