Closed tamasfe closed 2 years ago
Currently all strings are processed as-is, and interpolations are ignored.
It would be dangerous, as there can be nested interpolations. If you just count braces, you may end up going wrong.
`This is first level ${let s = `this is second level ${x}`; s} so there.`
I don't know yet where to process these, as they will complicate the parse tree a whole lot.
What I do is, when I see a `` ` ``, I read a string until I get another `` ` `` (closing) or `${`. If `${`, then I return the previous string as a partial segment, and start parsing a statement block (assuming that I have seen a `{`). When the block ends with `}`, I start another partial string segment. Repeat.
You can simply make the `$` (when it is followed by `{`) into a `` ` `` that terminates the previous string literal, and then just start parsing a normal statements block.
Then the parse tree is simply an array of alternating string segments and statement blocks. Internally, I keep them all as `Expr` type, with the string segment simply mapping to a string literal.
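To make the scheme above concrete, here is a minimal sketch of the segment-splitting step. It is not the actual Rhai code: `Segment`, `read_string`, `read_block`, and `parse_interpolated` are hypothetical names, a block is kept as raw source text instead of being parsed into statements, and escapes and double-quoted strings containing braces are ignored.

```rust
/// One piece of an interpolated string (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Segment {
    /// A plain string piece between the delimiters.
    Literal(String),
    /// Raw source of a `${ ... }` block; a real parser would parse statements here.
    Block(String),
}

/// Reads a backtick string body, starting just after the opening backtick.
fn read_string(chars: &[char], pos: &mut usize) -> Result<Vec<Segment>, String> {
    let mut segments = Vec::new();
    let mut lit = String::new();
    while *pos < chars.len() {
        match chars[*pos] {
            // Closing backtick: emit the final segment (possibly empty) and stop.
            '`' => {
                *pos += 1;
                segments.push(Segment::Literal(lit));
                return Ok(segments);
            }
            // `${`: emit the partial segment, then read the block that follows.
            '$' if chars.get(*pos + 1) == Some(&'{') => {
                *pos += 2;
                segments.push(Segment::Literal(std::mem::take(&mut lit)));
                segments.push(Segment::Block(read_block(chars, pos)?));
            }
            c => {
                lit.push(c);
                *pos += 1;
            }
        }
    }
    Err("unterminated string".to_string())
}

/// Reads block source up to the matching `}`. Nested backtick strings are
/// skipped by recursion, so their braces are never counted by mistake.
fn read_block(chars: &[char], pos: &mut usize) -> Result<String, String> {
    let start = *pos;
    let mut depth = 0usize;
    while *pos < chars.len() {
        match chars[*pos] {
            '`' => {
                *pos += 1;
                read_string(chars, pos)?;
            }
            '{' => {
                depth += 1;
                *pos += 1;
            }
            '}' if depth == 0 => {
                let src: String = chars[start..*pos].iter().collect();
                *pos += 1;
                return Ok(src);
            }
            '}' => {
                depth -= 1;
                *pos += 1;
            }
            _ => *pos += 1,
        }
    }
    Err("unterminated interpolation".to_string())
}

/// Entry point: `src` must start with a backtick.
fn parse_interpolated(src: &str) -> Result<Vec<Segment>, String> {
    let chars: Vec<char> = src.chars().collect();
    if chars.first() != Some(&'`') {
        return Err("expected opening backtick".to_string());
    }
    let mut pos = 1;
    read_string(&chars, &mut pos)
}
```

For the nested example earlier in the thread, this should give three segments: the literal `This is first level `, one block holding the whole `let s = ...; s` source (inner string included), and the literal ` so there.`.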
After thinking about this some more, I have an idea to fit the parsing of interpolated strings into the current LSP structure. As I understand after Googling a bit, tokenizing an interpolated string is non-trivial.
The idea is to parse the interpolated string as a string of tokens, essentially like what is done with the current Rhai parser.
Upon seeing `` ` ``, the tokenizer should parse until it sees either:
- `` ` ``: for which it returns a `LIT_STR`
- `${`: for which it returns a `LIT_INTERP_STR` (new token type)

At the `}` that ends the interpolated block, manually push `` ` `` into the input stream and continue tokenizing.

In your grammar, you need a special rule for interpolated strings:
Lit =
  'lit_int'
| 'lit_float'
| String
| 'lit_bool'
| 'lit_char'

String =
  'lit_str'
| 'lit_interp_str' '${' Expr '}' 'lit_str'
| 'lit_interp_str' '${' Expr '}' 'lit_interp_str'
So, for example, the following:
`The answer is ${`an ${if answer.is_even() { "even number" } else { "odd number" }}` + answer}. QED.`
Probably gets parsed into:
lit_interp_str = "The answer is "
Expr: +
    String:
        lit_interp_str = "an "
        Expr: if
        lit_str = ""
    Expr: answer
lit_str = ". QED."
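The tokenizer rules above could be sketched roughly like this. It is only an illustration under assumptions: the `Token` enum and `tokenize` function are made up here, everything inside an interpolation that is not a string is lumped into a raw `Code` token, and re-entering string mode after the closing `}` stands in for pushing a `` ` `` back into the input stream. Escapes and braces inside double-quoted strings are not handled.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    LitStr(String),       // segment ended by the closing backtick
    LitInterpStr(String), // segment ended by `${`
    InterpStart,          // `${`
    InterpEnd,            // the `}` that closes an interpolation
    Code(String),         // raw non-string source inside an interpolation
}

/// Emits any pending raw code as a single token.
fn flush(code: &mut String, tokens: &mut Vec<Token>) {
    if !code.is_empty() {
        tokens.push(Token::Code(std::mem::take(code)));
    }
}

fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut pos = 0;
    let mut tokens = Vec::new();
    let mut code = String::new();
    // One counter per open `${`, counting plain `{` nested inside it.
    let mut open_braces: Vec<usize> = Vec::new();
    let mut in_string = false;

    while pos < chars.len() {
        if in_string {
            let mut seg = String::new();
            loop {
                match chars.get(pos).copied() {
                    None => return Err("unterminated string".to_string()),
                    Some('`') => {
                        pos += 1;
                        tokens.push(Token::LitStr(std::mem::take(&mut seg)));
                        in_string = false;
                        break;
                    }
                    Some('$') if chars.get(pos + 1) == Some(&'{') => {
                        pos += 2;
                        tokens.push(Token::LitInterpStr(std::mem::take(&mut seg)));
                        tokens.push(Token::InterpStart);
                        open_braces.push(0);
                        in_string = false;
                        break;
                    }
                    Some(c) => {
                        seg.push(c);
                        pos += 1;
                    }
                }
            }
        } else {
            let c = chars[pos];
            pos += 1;
            match c {
                '`' => {
                    flush(&mut code, &mut tokens);
                    in_string = true;
                }
                '{' if !open_braces.is_empty() => {
                    *open_braces.last_mut().unwrap() += 1;
                    code.push(c);
                }
                // End of the interpolated block: behave as if a backtick had
                // been pushed back, i.e. resume reading a string segment.
                '}' if open_braces.last() == Some(&0) => {
                    flush(&mut code, &mut tokens);
                    open_braces.pop();
                    tokens.push(Token::InterpEnd);
                    in_string = true;
                }
                '}' if !open_braces.is_empty() => {
                    *open_braces.last_mut().unwrap() -= 1;
                    code.push(c);
                }
                _ => code.push(c),
            }
        }
    }
    if in_string || !open_braces.is_empty() {
        return Err("unterminated interpolated string".to_string());
    }
    flush(&mut code, &mut tokens);
    Ok(tokens)
}
```

Run on the example string, this should yield the same string tokens as the tree above: `LitInterpStr("The answer is ")`, `LitInterpStr("an ")`, an empty `LitStr("")` after the inner block, and a final `LitStr(". QED.")`.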
Yeah looks like I'll incorporate it into the parse tree anyway.
I was hoping I could just parse it as a string literal between the `` ` `` delimiters, then lazily process it further without disturbing the existing grammar, but yeah, as you mentioned, there can be a lot of edge cases as everything can be arbitrarily nested.
I'll tag this as hard instead as it'll need special care in the HIR as well, will get back to this once everything else works.
That's correct. The trick, it seems, is to convert the nested embedded expressions into structured syntax that can be parsed simply with a grammar.
So it seems like an interpolated string is nothing but a fancy function call, sort of. You simply have literal strings instead of commas separating expressions, and the first piece from `` ` `` till `${` as the function name.
So, just to think aloud along that idea... take the following interpolated string:
`The answer is ${`an ${if answer.is_even() { "even number" } else { "odd number" }}` + answer}. QED.`
We parse it as if we have:
str_interp_the_answer_is(
    { str_interp_an( if answer.is_even() { "even number" } else { "odd number" } , "" ) + answer } ,
    // the outer { ... } must be parsed as a statements block, as there may be multiple statements inside
    // the trailing "" is the last segment of the inner string, which is empty
    ". QED."
)
In other words, the grammar rules for an interpolated string should be exactly the same as a function call with arguments, and function calls can naturally occur within arguments.
You tokenize the stream in this form:
"The answer is " $ {
    "an " $ {
        if answer.is_even() { "even number" } else { "odd number" }
    }
    ""
    +
    answer
}
". QED."
That would require you to manually push a `` ` `` into the input stream once you finish parsing an interpolated block (i.e. after ending the block with the last `}`). This seems to be the only requirement.