rhaiscript / rhai

Rhai - An embedded scripting language for Rust.
https://crates.io/crates/rhai
Apache License 2.0

String interpolation #121

Closed schungx closed 3 years ago

schungx commented 4 years ago

Instead of writing:

print("The answer to " + num + " x 2 = " + (num * 2)+ "!");

We can write:

print("The answer to ${num} x 2 = ${num * 2}!");

with the results converted to string via the to_string function, which can also be custom-defined for custom types.

Any ideas as to whether we should use a separate string literal character, à la JS's `xxx ${...} xxx`, vs. "xxxxxxx" for non-interpolated strings? Two separate string literal styles typically appear where there is historical baggage -- string interpolation was added later.

In Rhai, we can simply support "xxxxx ${...} xxxxx" and escaping it for a normal string: "xxxxx \${...} xxxx".
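A minimal Rust sketch of the proposal above, assuming that placeholders hold only plain variable names (the issue proposes full expressions) and that values have already been converted to strings. The function name `interpolate` and the variable map are hypothetical and are not part of Rhai.

```rust
use std::collections::HashMap;

/// Expands `${name}` placeholders using `vars`; `\${` yields a literal `${`.
/// Assumption: only plain variable names are supported, not expressions.
fn interpolate(template: &str, vars: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find("${") {
        // A backslash directly before `${` escapes the placeholder.
        if rest[..pos].ends_with('\\') {
            out.push_str(&rest[..pos - 1]);
            out.push_str("${");
            rest = &rest[pos + 2..];
            continue;
        }
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 2..];
        // This sketch stops at the first `}`, so nested braces are not handled.
        let end = after.find('}').ok_or("unterminated ${")?;
        let name = after[..end].trim();
        let value = vars
            .get(name)
            .ok_or_else(|| format!("unknown variable: {name}"))?;
        out.push_str(value);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

With this escaping rule a double-quoted string needs no prefix, at the cost that every literal `${` has to be written as `\${`.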

profan commented 4 years ago

C#, FWIW, also uses a different form for interpolated literals, à la $"interpolated stuff goes here: {some_thing.thinger}", which might be simpler to parse as well? (I personally think what JS did is a mistake that reads badly to new people and experienced devs alike. Python also uses a string prefix for interpolation: https://www.python.org/dev/peps/pep-0498/)

I don't think the exact syntax is that important, but having a clearly separate form for interpolated literals is good both for structure and for reading. It also makes escaping the interpolation syntax unnecessary if you want to actually print {} in a normal string literal.

schungx commented 4 years ago

Many scripting languages I see have a tendency to use ${...} to wrap interpolated stuff for some reason. I hesitate to use a simple {} because it is just too easy to overlook a {} pair -- I know as I write lots of C# myself.

Also, in Rust and C#, {} in a string is primarily used for enclosing formatting commands -- actually C# uses it for both interpolation and formatting.

What about $"interpolated stuff goes here: ${some_thing.thinger}"? Too wordy?

profan commented 4 years ago

I wouldn't mind that last form either. I'm just generally against having only a magic form of quotes that makes a string interpolated, so as long as there's a clear prefix that marks a string literal as interpolated, I'm good :eyes:

I see the point about potentially overlooking a pair :thinking:

jhwgh1968 commented 4 years ago

Ironically, Ruby uses #{} for interpolation -- the syntax that is currently in master for hashes! :smile:

I keep thinking about Rhai as a language that is a mixture of Rust and Python, so I am partial to {}. This doesn't seem to be a problem in those languages.

In Python 3, I should note, it's not a problem because they attach string interpolation to the string type itself as a method. That causes the special handling to occur, and without it, nothing happens.

Example:

x = 1
y = 2
z = 3
output = "The sum of {} and {} is {}".format(x, y, z)
print(output)

I am open to other options, but if I were to run off and implement it, that is what I would do.

schungx commented 4 years ago

> Ironically, Ruby uses #{} for interpolation -- the syntax that is currently in master for hashes! 😄

LOL!

> I keep thinking about Rhai as a language that is a mixture of Rust and Python, so I am partial to {}. This doesn't seem to be a problem in those languages.

The problem with {} is that you need more look-ahead for a predictive top-down parser to distinguish between a statement expression and a hash literal. Or we disable statement expressions, which may actually be heaps better for the evaluator (because a lot of hairy borrow-checker issues will be avoided when an expression cannot mutate state). However, statement expressions are quite powerful and handy when you need them...

schungx commented 4 years ago

Anything inside quotes is handled in a separate inner loop, so we are not limited to syntax that doesn't conflict with other language elements.

jhwgh1968 commented 4 years ago

If you wanted to parse all in-references as a block or an expression, that is true. Python -- again, my inspiration here -- limits you to variable names and a set of formatting operators.

Here is my previous example, this time in binary. Note that the f"" syntax accepts the same format specifiers as "".format(...):

x = 1
y = 2
z = 17
print(f"The sum of {x:02} and {y:b} is {z:x}")  # Get it? Get it!?

What's interesting is that this Python syntax almost lines up with our current definition of a hash. The only differences are, (1) you don't quote the printing options, and (2) they can be entirely omitted, whereas hash values cannot.

schungx commented 4 years ago

I'm personally more inclined to stick to JS/Rust styles because that's what most of Rhai looks like. And that's what most users will expect. Python and Ruby are a bit off from C-style syntax.

Some form of prefix quote:

$"...... { xxxxx } ......."
$"...... ${ xxxxx } ......."
#"....... { xxxxx } ......."
f"....... { xxxxx } ......."

or a new character (like JS):

`......... { xxxxxx } .......`
`..........${ xxxxxx } .......`

One of the reasons to use a compound delimiter e.g. ${ ... } instead of a simple {...} pair is that it may be common for string interpolation to be used for generating code - to run under eval perhaps? Having to constantly escape braces in string interpolation is a pain in such a use case.

jhwgh1968 commented 4 years ago

Having made a couple suggestions I was not very invested in, just to give perspective, I will conclude with my fairly broad personal opinion. (I now have other areas of the code to make improvements to :wink: ).

So long as formatting is explicitly invoked by f"..." or "...".fmt() or similar, I personally do not have a strong opinion about exactly what the "format this" marker is, or how powerful it is (or isn't).

schungx commented 4 years ago

Great! I am currently leaning towards $" .... ${ xxxx } .... " as unambiguous, with the further benefit of being able to format-print Rhai code snippets for (God forbid) eval.

schungx commented 4 years ago

Well, from my personal usage experience, it seems that print("hello: " + x + " worlds!") is not too bad compared with print($"hello: ${x} worlds!").

So closing this.

schungx commented 3 years ago

Reopening this as #380 brings up the need for literal strings, which now have an implementation à la JavaScript: `this is a literal string`.

Now do we start supporting JavaScript-like string interpolation?

`this is a literal string... the answer is ${answer}!`

Or do we choose another syntax for literal strings to avoid using yet another ASCII character, such as @" or $"?

nikita-skobov commented 3 years ago

I think any form of string interpolation is good:

`this string has an answer ${computed_answer}`
$"This is my string with variable: {something.something}"

I think any form of something like the above is good, but something like this is too cumbersome:

mystring = "something here {}, and here {}".format(1, 2)

schungx commented 3 years ago

Personally I like the back-tick and ${...} for interpolation, but it may be just me.

Interpolation can be simply implemented by re-parsing the string token:

`the answer to the question is: ${get_answer(42)} !!!`

into

("the answer to the question is: " + (get_answer(42)) + " !!!")
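The rewrite above can be sketched in Rust as follows, assuming the string has already been split into text and expression segments by an earlier step. The `Segment` type and the `desugar` function are hypothetical names for illustration and are not Rhai's internals.

```rust
/// One piece of an interpolated string, already split by an earlier step.
/// Hypothetical type for illustration only.
enum Segment {
    Text(String),
    Expr(String),
}

/// Rewrites the segments as a parenthesized `+` chain of source text.
fn desugar(segments: &[Segment]) -> String {
    let parts: Vec<String> = segments
        .iter()
        .map(|seg| match seg {
            // `{:?}` quotes and escapes the text like a Rust string literal,
            // which is assumed to be close enough to a double-quoted literal here.
            Segment::Text(text) => format!("{text:?}"),
            // Each expression is wrapped in parentheses to keep its precedence.
            Segment::Expr(code) => format!("({})", code.trim()),
        })
        .collect();
    if parts.is_empty() {
        // An empty template becomes an empty string literal.
        "\"\"".to_string()
    } else {
        format!("({})", parts.join(" + "))
    }
}
```

Emitting source text keeps the sketch short. A real implementation would more likely build the concatenation directly as expression nodes rather than re-parse generated text.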

Eliah-Lakhin commented 3 years ago

I would vote for JavaScript/TypeScript-like syntax too, mostly out of personal preference as a professional JavaScript developer. But Rhai's design already draws strong inspiration from the JavaScript world, so keeping this new feature in sync could be a reason too.

schungx commented 3 years ago

It's a pity: the interpolation feature would be greatly simplified by generators (basically turning a token stream into a recursive generator), but they are nightly-only at this point...

patrickelectric commented 3 years ago

From my experience, I believe that using backticks to create template literal strings is a clear and popular choice from a scripting perspective. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals

schungx commented 3 years ago

Now that Rhai has back-ticks for literal strings, all that's remaining is to implement ${ ... } interpolation.

Not a simple matter without using generators (nightly-only) to turn a recursive function into an iterator stream.

The alternative is to build a stack-based parsing stream that hops in and out of string and interpolation modes, but that would essentially be writing the generator by hand. Might as well wait for generators to land in stable?

Generators tracking issue: https://github.com/rust-lang/rust/issues/43122

schungx commented 3 years ago

On the other hand, I just realized that generators are still very much experimental and unlikely to show up in stable any time soon. So, maybe I'll spend some time over this coming Easter to try and implement it...

I don't want to do it via recursive parsing, because that would defeat the ability of the Playground to syntax-color the interpolation code. It really needs to be parsed to one flattened token stream.

schungx commented 3 years ago

String interpolation via `... ${ ... } ...` has now landed: https://github.com/rhaiscript/rhai/pull/388

Unfortunately, it still uses recursive parsing, meaning that the Playground is likely not able to recognize interpolated strings because it uses only the tokenizer. Without a parser, the tokenizer has no way to tell when an interpolated string ends.

@alvinhochun do you have any ideas how to resolve this?

Maybe expose a parser method for this purpose?

alvinhochun commented 3 years ago

> Unfortunately, it still uses recursive parsing, meaning that the Playground is likely not able to recognize interpolated strings because it uses only the tokenizer. Without a parser, the tokenizer has no way to tell when an interpolated string ends.

> @alvinhochun do you have any ideas how to resolve this?

The tokenizer is stateful, so if you store the nesting level into the state it should be able to handle it, similar to how nested comments are handled. However, I haven't looked at how easy it is to adapt your implementation to do this.
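As a rough illustration of keeping a nesting level in tokenizer state, here is a Rust sketch for nested block comments that carries the depth from one line to the next. `LineState` and `strip_comments` are hypothetical names; this is not the Rhai tokenizer or the Playground code.

```rust
/// Tokenizer state that survives between lines (hypothetical, for illustration).
#[derive(Clone, Debug, Default, PartialEq)]
struct LineState {
    /// How many `/* ... */` comments are currently open.
    comment_depth: usize,
}

/// Scans one line, updating `state`, and returns the code found outside comments.
fn strip_comments(line: &str, state: &mut LineState) -> String {
    let mut code = String::new();
    let mut chars = line.chars().peekable();
    while let Some(ch) = chars.next() {
        match (ch, chars.peek().copied()) {
            // `/*` opens one more comment level.
            ('/', Some('*')) => {
                chars.next();
                state.comment_depth += 1;
            }
            // `*/` closes the innermost open comment.
            ('*', Some('/')) if state.comment_depth > 0 => {
                chars.next();
                state.comment_depth -= 1;
            }
            // Outside any comment, the character is ordinary code.
            _ if state.comment_depth == 0 => code.push(ch),
            // Inside a comment, the character is skipped.
            _ => {}
        }
    }
    code
}
```

Because the state is a small value, it can be cloned and cached at the start of each line. An interpolation nesting level could in principle be stored the same way.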

schungx commented 3 years ago

The problem is with providing the correct nesting level. The tokenizer by itself only looks at a stream of characters; it has no knowledge of Rhai syntax. Therefore, once it encounters ${, it has no mechanism for knowing which } it should stop at before treating everything as string text again.

It cannot stop at the first } because of nesting:

let x = `hello ${ if some_flag { 42 } else { 999 } } worlds!`;

It is possible to force the string interpolation to end with a magic symbol pair, such as }$ - this idea is similar to ASP's usage of <% .. %>. In this case, the tokenizer can handle it easily.

let x = `hello ${ if some_flag { 42 } else { 999 } }$ worlds!`;

However, this syntax is quite unergonomic.

Technically speaking, we can count the number of brace pairs to decide when the interpolation has ended, but then we would run afoul of custom syntax - although that may be a small price to pay.
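A minimal Rust sketch of the brace-counting idea, assuming standard Rhai syntax only (no custom syntax with unbalanced braces) and ignoring comments. The function name `find_interpolation_end` is hypothetical.

```rust
/// Returns the byte index of the `}` that closes an interpolation, given the
/// text right after its opening `${`. Braces inside "..." literals are skipped.
/// Assumption: comments and custom syntax are not handled in this sketch.
fn find_interpolation_end(code: &str) -> Option<usize> {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, ch) in code.char_indices() {
        if in_string {
            match ch {
                // The character after a backslash never ends the literal.
                _ if escaped => escaped = false,
                '\\' => escaped = true,
                '"' => in_string = false,
                _ => {}
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            // An unmatched `}` at depth zero closes the interpolation.
            '}' if depth == 0 => return Some(i),
            '}' => depth -= 1,
            _ => {}
        }
    }
    None
}
```

For the nested `if`/`else` example above, the two inner brace pairs raise and lower the depth, so the scan stops only at the final `}` before ` worlds!`.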

alvinhochun commented 3 years ago

It may be possible to plug the parser into the CodeMirror tokenizing code, but this is quite heavy, and since CodeMirror caches the tokenizer state at the start of every line, this would mean storing the whole AST from the start up to a point for every single line, not to mention that I will have to rewrite the tokenizing code... I would rather not have the parser involved here.

I would think that tracking brace pairs would be the simplest solution here. The Playground does not support custom syntax and will probably never support it by default.

Does custom syntax allow overriding the syntax to the point that the number of left and right braces does not have to match (except when inside normal string literals or comments)?

schungx commented 3 years ago

> The Playground does not support custom syntax and will probably never support it by default.

Ah. That's a good point. This is the key. We can probably count braces.

> Does custom syntax allow overriding the syntax to the point that the number of left and right braces does not have to match (except when inside normal string literals or comments)?

Yes, technically it does. It allows any stream of symbols behind a unique custom keyword.

schungx commented 3 years ago

There is a new Token variant called InterpolatedString which means that the string is terminated by ${.

It is further facilitated by reading off the $ character, leaving the { opening brace so that what follows is parsed as a complete Rhai statement block. This means that Token::InterpolatedString is always followed by {.

So theoretically, we push a layer when we read a Token::InterpolatedString, and then count braces. When we read another Token::InterpolatedString, we push another layer, and so on.

However, after the closing brace for interpolation, there is no way for the tokenizer to know that it should switch back to text mode. In Rhai right now I hack it by pushing a ` character into the stream. I believe you can do the same (but beware to rewind the position by one character if you do that).

schungx commented 3 years ago

OK, I've added a new field to TokenizeState called is_within_text_terminated_by which is Option<char>. Set it to Some('`') to switch back to text mode. This seems to be the simplest way.
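The layer-and-brace-count scheme can be sketched in Rust as a small mode-switching scanner. This is only an illustration: `Span`, `Layer`, `flush`, and `flatten` are hypothetical names, the code does not use Rhai's `Token` or `TokenizeState` types, and it ignores escapes, comments, and braces inside double-quoted strings.

```rust
/// A classified run of characters, as a syntax highlighter might want it.
#[derive(Debug, PartialEq)]
enum Span {
    Text(String),
    Code(String),
}

/// One open layer: a back-tick string, or a `${ ... }` with its brace depth.
#[derive(Clone, Copy)]
enum Layer {
    Text,
    Code { depth: usize },
}

/// Moves any buffered characters into `spans` as a text or code span.
fn flush(spans: &mut Vec<Span>, buf: &mut String, is_text: bool) {
    if buf.is_empty() {
        return;
    }
    let s = std::mem::take(buf);
    spans.push(if is_text { Span::Text(s) } else { Span::Code(s) });
}

/// Splits the body of a back-tick string (without its outer back-ticks)
/// into a flat sequence of text and code spans.
fn flatten(body: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut stack = vec![Layer::Text];
    let mut buf = String::new();
    let mut chars = body.chars().peekable();

    while let Some(ch) = chars.next() {
        // The stack is never empty: its bottom layer is the outermost string.
        let top = stack.len() - 1;
        match (stack[top], ch) {
            // `${` in text mode: push a code layer.
            (Layer::Text, '$') if chars.peek() == Some(&'{') => {
                chars.next();
                flush(&mut spans, &mut buf, true);
                stack.push(Layer::Code { depth: 0 });
            }
            // Closing back-tick of a nested string: back to the code layer.
            (Layer::Text, '`') if top > 0 => {
                flush(&mut spans, &mut buf, true);
                stack.pop();
            }
            // `}` at depth zero ends the interpolation: back to text mode.
            (Layer::Code { depth: 0 }, '}') => {
                flush(&mut spans, &mut buf, false);
                stack.pop();
            }
            (Layer::Code { depth }, '{') => {
                stack[top] = Layer::Code { depth: depth + 1 };
                buf.push(ch);
            }
            (Layer::Code { depth }, '}') => {
                stack[top] = Layer::Code { depth: depth - 1 };
                buf.push(ch);
            }
            // A back-tick in code mode opens a nested string layer.
            (Layer::Code { .. }, '`') => {
                flush(&mut spans, &mut buf, false);
                stack.push(Layer::Text);
            }
            _ => buf.push(ch),
        }
    }
    let ends_in_text = matches!(stack[stack.len() - 1], Layer::Text);
    flush(&mut spans, &mut buf, ends_in_text);
    spans
}
```

The output is one flat list of spans rather than a tree, which is the shape a tokenizer-only syntax highlighter needs. The explicit stack plays the role that recursion would play in a parser.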

schungx commented 3 years ago

OK, closing this for the time being. Interpolation will be included for the next release.