Closed gvanrossum closed 4 years ago
Seems to be a general problem with errors from the tokenizer.
Before:

```
>>> a = \!
  File "<stdin>", line 1
a = \!
^
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
😃 = 1
^
SyntaxError: invalid character in identifier
```
After:

```
>>> a = \!
  File "<stdin>", line 1
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
SyntaxError: invalid character in identifier
```
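The same attributes can be inspected programmatically; a minimal sketch using the standard `compile()` built-in (the exact `offset` value reported varies across CPython versions, so none is shown here):

```python
# Trigger the same tokenizer error via compile() and inspect the
# SyntaxError attributes under discussion: `offset` should be a column
# within `text` (the offending line), not an offset into the whole file.
try:
    compile("a = \\!", "<stdin>", "exec")
except SyntaxError as err:
    print(err.lineno, err.offset, repr(err.text))
```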
So the root cause of this is that the `offset` field in `SyntaxError` can be either a column offset or an offset from the start of the file. The offset calculation code, taken from here:
is based on the file offset, since it does `tok->cur - tok->buf`; consequently, the `err_ret->text` it uses is the entire file up to where the error occurred. If we want to provide only the line the error occurs on to `SyntaxError`, as we currently do:
then the calculation needs to be adjusted so that it is relative to that line. Alternatively, we can pass the entire file up to the error to the `SyntaxError`, to maintain consistency with what has been done so far.
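The adjustment described above can be sketched in Python (a hypothetical illustration of the arithmetic only, not the actual C code in the tokenizer):

```python
def file_offset_to_line_col(source: str, file_offset: int) -> tuple[int, int]:
    """Convert an offset from the start of the buffer (what
    tok->cur - tok->buf yields) into a (lineno, col) pair where col is
    relative to the start of the error line, as SyntaxError expects."""
    # Start of the line containing file_offset (0 if there is no prior newline).
    line_start = source.rfind("\n", 0, file_offset) + 1
    lineno = source.count("\n", 0, file_offset) + 1  # 1-based line number
    col = file_offset - line_start + 1               # 1-based column
    return lineno, col

src = "x = 1\ny = \\!\n"
print(file_offset_to_line_col(src, src.index("!")))  # → (2, 6)
```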
If the col offset is ever from the start of the file that's a mistake.
Is there a special reason to use `tokenizer_error_with_col_offset`? I've made some changes to `_PyPegen_raise_error` and replaced all usages of `tokenizer_error_with_col_offset` calls with proper `RAISE_SYNTAX_ERROR`/`RAISE_INDENTATION_ERROR`, and it works smoothly.
No particular reason, except that last week everything needed to move fast and so I probably didn't think too hard about alternative ways to do things, actually I just did what seemed easier.
Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.
Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) at once, so I'm still working on it. I'll make it ready ASAP, before alpha 6 / beta 1.
That'd be great!
Thanks for working on this! Just a tip: the existing column numbers are generally based on this rule.
Fixed in python/cpython#19782.
Not completely; we are still missing the offsets that come from the AST.
Are those related to invalid string prefixes?
No, they are just some incorrect column offsets. Are we tracking those in a different issue?
Oh, we are: https://github.com/we-like-parsers/cpython/issues/65
Sorry, we can indeed close this issue :)
Example input:
Expected output:
Actual (with new parser):