we-like-parsers / pegen_experiments

Experiments for the official PEG parser generator for Python
https://github.com/python/cpython/tree/master/Tools/peg_generator
Other
275 stars 29 forks source link

Correctly shift node position in multi-line fstrings #207

Closed lysnikolaou closed 4 years ago

lysnikolaou commented 4 years ago

I mark this as WIP, because I'm not confident enough that this covers all cases. Any ideas for more tests?

Fixes #203.

gvanrossum commented 4 years ago

I hope Pablo can review this... I'm not at all confident I understand the f-string code.

There's one thing that concerns me though -- why focus all attention on the column offset? Isn't the line number also subject to movement? E.g. in

f"""blah
{
expr)
}
"""

Don't we want the error to point to the ) rather than to the start of the f-string?

lysnikolaou commented 4 years ago

I hope Pablo can review this... I'm not at all confident I understand the f-string code.

There's one thing that concerns me though -- why focus all attention on the column offset? Isn't the line number also subject to movement? E.g. in

f"""blah
{
expr)
}
"""

Don't we want the error to point to the ) rather than to the start of the f-string?

That's pretty much impossible at the moment. All the shifting around only happens after the complete AST tree has been generated. Thus, if the parsing fails, these function don't even get called.

I think, that our best shot at getting this 100% right is rewriting the whole thing, which would be quite a big project. In my imagination, we could have a grammar for f-strings and in concatenate_strings we would just concatenate the string literals together and then call the f-string parser on the big concatenated string if needed. In case we decide this is important, I could slowly start trying things out in that direction.

Thoughts, @gvanrossum @pablogsal?

lysnikolaou commented 4 years ago

why focus all attention on the column offset? Isn't the line number also subject to movement?

Because it seems that line number movement is ok, at least in comparison to what CPython does.

pablogsal commented 4 years ago

That's pretty much impossible at the moment. All the shifting around only happens after the complete AST tree has been generated. Thus, if the parsing fails, these function don't even get called.

Check out the WIP in: https://github.com/gvanrossum/pegen/pull/208. I have tried to provide better error messages for f-strings there. Even if we don't merge it in its entirety, that PR fixes some errors when parsing strings and not files (the filename is <string>) that have syntax errors. This PR still needs some work to handle multi-line f-strings because now it does not point correctly to the line inside the f-string but it may be interesting.

pablogsal commented 4 years ago

I quickly checked the PR and the SHIFT_EXPR and SHIFT_ARG looks :+1: under a preliminary inspection but I need to certainly dedicate more time to understand what's going on here:

https://github.com/gvanrossum/pegen/pull/207/files#diff-e2161640949755a51e055a22cf4af0fdR489-R505

Could you maybe add some comments there for context meanwhile?

gvanrossum commented 4 years ago

I think we should strive for parity with CPython first.

I agree that in the long run we may have to rewrite f-string parsing from scratch -- and Eric V Smith seems to agree (see https://bugs.python.org/issue39564), but that's going to require changes to the tokenizer. If this gets done in 3.9 we'll have to track that, but my money is on 3.10 or later, given the complexity of the edge cases (things like ...{x:4d}... and ...{x = :.5g}... -- and note that currently the unparenthesized walrus doesn't work, e.g. ...{x := 1}...).

pablogsal commented 4 years ago

I think we should strive for parity with CPython first.

Good point.

but that's going to require changes to the tokenizer

I think it will also likely require us to maintain a stack of tokenizers (so you can alternate between parsing f-strings and "regular Python").

lysnikolaou commented 4 years ago

I quickly checked the PR and the SHIFT_EXPR and SHIFT_ARG looks 👍 under a preliminary inspection but I need to certainly dedicate more time to understand what's going on here:

https://github.com/gvanrossum/pegen/pull/207/files#diff-e2161640949755a51e055a22cf4af0fdR489-R505

Could you maybe add some comments there for context meanwhile?

Done.

lysnikolaou commented 4 years ago

Thanks a lot for the review @pablogsal!