neogeny / pygl

Python Grammar Language
GNU General Public License v3.0

Parse f-string components separately #2

Open apalala opened 5 years ago

apalala commented 5 years ago

Currently the parser retrieves f-strings as a single lexical element.
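A minimal sketch of what that means in practice, using the stdlib `tokenize` module (not pygl itself) on a CPython version before 3.12, where the single-token behavior holds:

```python
# Show that the whole f-string, {…} expressions included, comes back as one token.
# Assumes CPython < 3.12; newer versions emit FSTRING_* tokens instead.
import io
import tokenize

source = 'x = f"hello {name!r:>{width}}"\n'
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# The f"..." literal shows up as a single STRING token; the expressions inside
# it are only parsed later, by a separate pass over the token's text.
```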

apalala commented 5 years ago

Worth looking at the rules in the reference documentation.

flying-sheep commented 5 years ago

Hi! That’s my jam (though, due to dissertation-writing priorities, not something I can invest time in for the next 8 months). My arguments for why this is a good idea:

  1. Better code design. We don’t have to postprocess “strings” and loop back into the tokenizer when we discover that a “string” is actually an f-string whose expression parts still need tokenizing.
  2. Consistent language grammar. There’s no reason why r'\' should be a syntax error; it’s basically a limitation of the above-mentioned design that sneaked its way into the grammar (sketched below, together with the next point).
  3. Recursion. PEP 536 proposes that f-strings should be nestable. There’s no other place in Python where we’re limited to two nesting levels (f'Outer! {f"Inner {no_quotes_here}" if p else ""} !')
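A minimal sketch of both corner cases under current CPython behavior (pre-3.12; the exact error messages are version-dependent):

```python
# Point 2: a raw string may not end in an odd number of backslashes, because the
# tokenizer still treats backslash-quote as keeping the literal open.
try:
    compile(r"x = r'\'", "<example>", "exec")
except SyntaxError as exc:
    print("r'\\' is rejected:", exc.msg)

# Point 3: nesting depth is bounded by the available quote styles; reusing the
# outer quote character inside a replacement field is a syntax error (pre-3.12).
compile('x = f\'outer {f"inner"}\'', "<example>", "exec")    # different quotes: fine
try:
    compile("x = f'outer {f'inner'}'", "<example>", "exec")  # same quote reused
except SyntaxError as exc:
    print("same-quote nesting is rejected:", exc.msg)
```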

Guido seems to be open to this idea, even though he initially said he didn’t want to do this. A recap:

  1. Guido wrote Building a PEG Parser, in which he said:

    Tokenizing Python is complicated enough that I don’t want to reimplement it using PEG’s formalism. […] I have no beef with Python’s existing tokenizer, so I want to keep it.

  2. I replied that the different kinds of strings can specifically benefit from an update to the tokenizer (I assume your agreeing with me here is why you created this issue, @apalala :smile:)

  3. Guido replied that Medium is not good for conversation (agreed!) and directed me to discuss.python.org, where I found your thread, which led me here.

gvanrossum commented 5 years ago

FWIW, just because I redirected the conversation doesn’t mean I agree with you. In particular I disagree with points 2 and 3. --Guido (mobile)

flying-sheep commented 5 years ago

I didn’t say you’d agreed, just that you seem to be open to discussing it as opposed to shutting it down outright.

But in which way do you disagree with “There’s no reason why r'\' should be a syntax error”??

gvanrossum commented 5 years ago

It would prevent regular expressions containing quotes. --Guido (mobile)

flying-sheep commented 5 years ago

Huh? I’m talking about making raw strings able to end on an odd number of backslashes. I don’t get how that can possibly prevent anything. It’s nothing but the elimination of an inconsistent corner case of the grammar, therefore making Python slightly less surprising.

gvanrossum commented 5 years ago

Well, how would you write a raw string with a quote in it? (Not using the ‘other’ kind of quote.) --Guido (mobile)

flying-sheep commented 5 years ago

I’d much prefer being able to write r'\' and having to use triple quotes for things like r'''^["'\]+$''' over having an exception built into the grammar just to enable the sequence “backslash-quote” in raw strings. After all, special cases aren't special enough to break the rules :wink:
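A minimal sketch of that trade-off, under today’s semantics, using a regex that matches either kind of quote:

```python
import re

# Status quo: backslash-quote keeps a raw literal open, so the pattern can
# share its delimiter with the quote it matches.
with_escape = re.compile(r'["\']')

# The alternative argued for above: reach for triple quotes and skip the escape.
with_triple = re.compile(r'''["']''')

assert with_escape.match('"') and with_escape.match("'")
assert with_triple.match('"') and with_triple.match("'")
```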

Had I started coding >11 years ago instead of 10, I’d have had time to argue for getting rid of this behavior in Python 3000, but that ship has sailed, I assume.

gvanrossum commented 5 years ago

Let’s agree to disagree. --Guido (mobile)