leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
4.69k stars 421 forks source link

String interpolation by default #407

Open leodemoura opened 3 years ago

leodemoura commented 3 years ago

The notation "..." would now behave like s!"...", and we would have a notation for raw strings.

leodemoura commented 3 years ago

@Kha you were more active in the Zulip thread about this issue. Do you have further remarks, details, etc? BTW, what would be the notation for raw strings? Would we still have escaped characters for raw strings?

Kha commented 3 years ago

I don't think there was any consensus on raw strings, but personally my favorite syntax is Rust's where you can just add more #s if you still have a collision with the string contents: https://doc.rust-lang.org/reference/tokens.html#characters-and-strings. Not trivial to implement though. I think raw strings should also turn off escapes. I would hope that there is not sufficient need for a syntax that allows escapes but no interpolation.

DanielFabian commented 3 years ago

What makes the rust style raw strings more difficult to implement? It's not a regular grammar, but any grammar that can handle parentheses should be able to express it no? s(#^n).*?(#^n)

Kha commented 3 years ago

My concern was over the FirstTokens set, but it should probably be a new token parser (activated by r[#"]), in which case it can do whatever it wants.

eddyb commented 2 years ago

What makes the rust style raw strings more difficult to implement? It's not a regular grammar, but any grammar that can handle parentheses should be able to express it no? s(#^n).*?(#^n)

Note that your ? is a context-sensitive disambiguator: .* can include #^k for k both smaller and greater than n and it's simply impossible for a CFG to tell them apart. Even without the ? there's arguably still an implicit disambiguator in that most regex engines will only give you the "the longest match".

At best you would have an ambiguity-preserving parse forest (à la SPPF from some Earley/GLR/GLL/etc. general CFG parser), and then disambiguate raw strings on that. (amusingly, the disambiguation can be a noop if #^n doesn't repeat in the entire input, or if all the other choices cause parse errors after their presumed end of the raw string - not so funny if the correct choice has its own parse errors though, since you have to disambiguate to know what to report)

But if you already have more flexible tokenization (as seems to be the case here), you don't have to go through all that complication, and you can implement the counting directly. (there could be benefits to parse forests if you didn't already have a non-trivial recursive descent parser, but generally it's a messy tradeoff IME)

For more background on Rust's own raw strings context-sensitivity:

kmill commented 11 months ago

Sebastian and I were discussing this today, and I had a few thoughts and concerns that he thought I should document here:

(Re the raw string discussion, there is an implementation of them at #2929.)