Wrong column for invalid string prefix

gvanrossum commented 4 years ago

Example input:

def foo():
    '''

def bar():
    pass

def baz():
    '''quux'''

Expected output:

  File "/Users/guido/v39/lib/python3.9/site-packages/pyflakes/test/t.py", line 8

def bar():
    pass

def baz():
    '''quux'''
    ^
SyntaxError: invalid string prefix

Actual (with new parser):

  File "/Users/guido/v39/lib/python3.9/site-packages/pyflakes/test/t.py", line 8

                                        ^

ammaraskar commented 4 years ago

Seems to be a general problem with errors from the tokenizer.

Before:

>>> a = \!
  File "<stdin>", line 1
    a = \!
         ^
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
    😃 = 1
    ^
SyntaxError: invalid character in identifier

After:

>>> a = \!
  File "<stdin>", line 1
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
SyntaxError: invalid character in identifier

ammaraskar commented 4 years ago

So the root cause of this is that the offset field in SyntaxError can be either a column offset or an offset from the start of the file. The offset calculation code as taken from here:

https://github.com/we-like-parsers/cpython/blob/14ab84b2181d8f3b30915d569d6fafadfc507f1f/Parser/parsetok.c#L422-L428

is based on the file offset when it does tok->cur - tok->buf, consequently the err_ret->text it uses is the entire file up till where the error occurred. If we want to provide only the line the error occurs on to SyntaxError like we currently do:

https://github.com/we-like-parsers/cpython/blob/14ab84b2181d8f3b30915d569d6fafadfc507f1f/Parser/pegen/pegen.c#L258-L263

then the calculation needs to be adjusted so that it's relative to that line. Alternatively, we can pass the entire file up till the error to the SyntaxError to maintain consistency with what's been done so far.

gvanrossum commented 4 years ago

If the col offset is ever from the start of the file that's a mistake.

isidentical commented 4 years ago

Is there a special reason to use tokenizer_error_with_col_offset? I've made some changes to _PyPegen_raise_error and replaced all usages of tokenizer_error_with_col_offset calls with proper RAISE_SYNTAX_ERROR/RAISE_INDENTATION_ERROR and it works smoothly.

lysnikolaou commented 4 years ago

Is there a special reason to use tokenizer_error_with_col_offset? I've made some changes to _PyPegen_raise_error and replaced all usages of tokenizer_error_with_col_offset calls with proper RAISE_SYNTAX_ERROR/RAISE_INDENTATION_ERROR and it works smoothly.

No particular reason, except that last week everything needed to move fast and so I probably didn't think too hard about alternative ways to do things, actually I just did what seemed easier.

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

isidentical commented 4 years ago

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) in once, so still working on. I'll make it ready ASAP before the ~~alpha 6~~, beta 1.

lysnikolaou commented 4 years ago

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) in once, so still working on. I'll make it ready ASAP before the alpha 6.

That'd be great!

ammaraskar commented 4 years ago

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) in once, so still working on. I'll make it ready ASAP before the alpha 6.

Thanks for working on this! Just a tip, the existing column numbers are generally based off this rule.

https://github.com/we-like-parsers/cpython/blob/14ab84b2181d8f3b30915d569d6fafadfc507f1f/Parser/parsetok.c#L419-L421

lysnikolaou commented 4 years ago

Fixed in python/cpython#19782.

pablogsal commented 4 years ago

Fixed in python#19782.

Not completely, we are still missing the offsets that come from the AST

lysnikolaou commented 4 years ago

Are those related to invalid string prefixes?

pablogsal commented 4 years ago

Are those related to invalid string prefixes?

No, they are just some incorrect column offsets. Are we tracking those in a different issue?

pablogsal commented 4 years ago

Oh, we are: https://github.com/we-like-parsers/cpython/issues/65

Sorry, we can indeed close this issue :)

we-like-parsers / cpython

Wrong column for invalid string prefix #121