tc39 / proposal-improve-template-literals

JavaScript language proposal of raw string literals that can contain any arbitrary text.
MIT License

Investigate other syntax options #2

Open hax opened 9 months ago

hax commented 9 months ago

1. here doc

This is probably the most popular option, thanks to its adoption by shells and related languages (Perl, Ruby, etc.).

The here doc syntax has two parts. The first is the leading symbol, normally <<; some languages also have <<- and/or <<~ for tab/indentation processing.

The big problem with these symbols is that tag << blahblah, tag <<- blahblah, and tag <<~ blahblah are already valid syntax in JS. Maybe this is why PHP uses <<< instead of <<.
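To see the clash concretely, here is a small sketch showing that each here doc opener already parses as a left-shift expression in today's JavaScript. The names tag and blahblah simply mirror the wording above, and the numeric values are arbitrary.

```javascript
// Arbitrary values; the names mirror the prose above.
const tag = 1;
const blahblah = 2;

// `<<` is the left-shift operator.
const plain = tag << blahblah; // 1 << 2

// `<<-` parses as `<<` followed by unary minus.
const withMinus = tag <<- blahblah; // 1 << (-2)

// `<<~` parses as `<<` followed by bitwise NOT.
const withTilde = tag <<~ blahblah; // 1 << (~2)

console.log(plain, withMinus, withTilde);
```

None of these lines is a syntax error, so a parser could not reinterpret them as the start of a here doc without changing the meaning of existing code.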

The second part is an identifier used as the delimiter; many languages support 'ident' for a non-interpolating version. C++ and D drop the << symbol but still use a custom identifier as the delimiter.

Note: it seems no language supports a custom interpolation delimiter with here doc style syntax.

My feeling is that it's hard to use the here doc style in JS.

2. Swift / Rust style

They surround the string with # characters. If we adopt this style, the syntax would look like this:

let x = #` string can contain ` or #.`#

If we follow Swift's interpolation design, #` ${foo} `# would have no interpolation effect. To interpolate, use the same number of # characters: #` $#{foo} `#. The same mechanism could also enable escaping case by case: ###` \n <-- raw, \###n <-- line break here `###
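A rough sketch of how a scanner might find the end of such a literal, assuming the closing delimiter is simply a backtick followed by the same number of # characters as the opener. The function name scanHashLiteral and its return shape are made up for illustration; the syntax itself is not valid JavaScript today, so the literal is passed in as ordinary string data.

```javascript
// Hypothetical scanner for the sketched syntax: N hashes, a backtick,
// the body, a backtick, then N hashes.
function scanHashLiteral(source) {
  // Count the leading hashes of the opening delimiter.
  let hashes = 0;
  while (source[hashes] === "#") hashes++;
  if (hashes === 0 || source[hashes] !== "`") {
    throw new SyntaxError("expected one or more # followed by a backtick");
  }

  // The closer mirrors the opener: backtick first, then the hashes.
  const closer = "`" + "#".repeat(hashes);
  const bodyStart = hashes + 1;
  const bodyEnd = source.indexOf(closer, bodyStart);
  if (bodyEnd === -1) {
    throw new SyntaxError("unterminated literal");
  }
  return {
    hashes,
    body: source.slice(bodyStart, bodyEnd),
    rest: source.slice(bodyEnd + closer.length),
  };
}

const example = "#` string can contain ` or #.`#";
console.log(scanHashLiteral(example));
```

A lone backtick or a lone # inside the body does not end the literal; only the two together do.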

I feel the Swift-like design is very concise and attractive. It could be a good option for JS.

The only drawback is it can't satisfy goal 5:

Current usages of template literals should be easily migrate to the new syntax. Importantly, if a template literal does not need to present ${ characters, it should not be forced to change interpolation syntax.

3. Markdown fenced code block style

I think most programmers already know that format. There are challenges, though:

Surprisingly, three backticks are already valid syntax in JS, so this would be a breaking change to the language even if no one really uses three backticks in current code. I guess there is still a chance to use this syntax, but it would need a lot of extra work. An easier way is to add a prefix such as @ or #.
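A quick way to check the "already valid" claim is to hand such text to the parser. In this sketch the source is assembled with repeat() so the example itself contains no run of three backticks; the variable names are only illustrative. The text parses as an empty template literal used as a tag for the template x, whose result is then used as a tag for another empty template.

```javascript
// Build "return" followed by three backticks, x, three backticks.
const fence = "`".repeat(3);
const source = "return " + fence + "x" + fence + ";";

let parses = true;
let runtimeError = null;
try {
  // new Function throws SyntaxError if the text does not parse.
  const fn = new Function(source);
  try {
    fn();
  } catch (err) {
    // The tag position holds an empty string, which is not callable.
    runtimeError = err;
  }
} catch (err) {
  parses = false;
}

console.log(parses, runtimeError instanceof TypeError);
```

So the text is syntactically valid and only fails at run time, which is why giving it a new meaning would formally be a breaking change.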

zaygraveyard commented 9 months ago

How about a variant of the Swift style where interpolations (and escapes) require the same number of $ (and \) as # to work?

Example:

let x = #` this string can contain interpolations ${a} and \n escapes as usual`#
let y = ##` but this string requires $${a} for interpolations and \\n for escapes`##
mon-jai commented 9 months ago

@zaygraveyard It seems much cleaner than the current one.

hax commented 9 months ago

@zaygraveyard @mon-jai Of course it's an option.

But there are a few points to consider:

  1. The correspondence between the number of # symbols and the number of $ \ symbols requires additional cognitive effort.

  2. In a long text, there are three different special symbols to identify (for example, $$$, \\\, and ###). In contrast, Swift's approach uses only one special symbol.

  3. When the number of # symbols needs to be changed in a long text, the cost of identification transforms into the cost of searching and replacing. Without dedicated IDE support, this cost can also be three times higher compared to the Swift approach.

  4. There is no simple way to disable all escaping and interpolation so that you can write any number of \ or $ without worrying about accidentally triggering escaping or interpolation. In Swift, you only need to worry about runs of #. Moreover, a plain run of one repeated character is much more likely to occur in text than a special character combined with a run of repeated characters.

mon-jai commented 8 months ago

@hax That makes sense. Can I summarize it this way?

| | Original proposal (@``string``) | #2 proposal (#`string`#) | @zaygraveyard's proposal (#`string`#) |
|---|---|---|---|
| Special character sequences | $, ` | $#, # | $, `, # |
| Insert escaped characters | Unsupported | \#n | \n |
mon-jai commented 8 months ago

The original proposal seems better, but the use of @ and `` feels odd/ugly to me.

Could we use # instead and lower the minimum backtick requirement to 1? It would be easier to understand too.

Base case: #`My name is ${name}`, no $ / ` allowed in the string

If you need $ in the string: ##`$100`, use interpolation with $${foo}

If you need ` in the string: #``select * from `users` where `name` = ?;``

zaygraveyard commented 8 months ago

@hax Fair points and I do agree, but I would like to point out that they also apply to the original proposal (except for \ since escaping is not supported AFAICT).

@mon-jai Your "base case" is equivalent to `My name is ${name}`, correct?

mon-jai commented 8 months ago

@zaygraveyard The base case is similar to `My name is ${name}`, but with indentation removed if the string has multiple lines.

zaygraveyard commented 8 months ago

After more reflection on this subject, I'm more in favor of the Swift style:

  1. It would allow start and end delimiters to be on the same line (useful for the regex example)
  2. It supports escaping
  3. I feel @ should be reserved for decorators
  4. I find it reasonable that a "raw string literal" would require use of $#{...} for interpolations
rauschma commented 8 months ago

I love the initial Swift-inspired idea:

`Can't use backticks \n ${someVar}`
#`Can use `backticks` \#n $#{someVar}`#
##`Can use #`backticks with 1 hash`# \##n $##{someVar}`##
###`Can use ##`backticks with 2 hashes`## \###n $###{someVar}`###
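To make the examples above concrete, here is a minimal sketch of how the body of such a literal might be "cooked", assuming the rule that both escapes and interpolations must carry the same number of # as the delimiter. The function cookHashBody, its parameters, and the small escape table are hypothetical; only bare identifiers looked up in a scope object are supported, whereas a real design would allow arbitrary expressions.

```javascript
// Hypothetical cooking step for a Swift-style literal body.
// `hashes` is the number of # in the delimiter; `scope` supplies the
// values for interpolated identifiers.
function cookHashBody(body, hashes, scope) {
  const marker = "#".repeat(hashes);
  const escapes = new Map([["n", "\n"], ["t", "\t"], ["\\", "\\"], ["`", "`"]]);
  let out = "";
  let i = 0;
  while (i < body.length) {
    // Interpolation: $, then the marker, then {identifier}.
    if (body.startsWith("$" + marker + "{", i)) {
      const close = body.indexOf("}", i);
      if (close === -1) throw new SyntaxError("unterminated interpolation");
      const name = body.slice(i + marker.length + 2, close).trim();
      out += String(scope[name]);
      i = close + 1;
      continue;
    }
    // Escape: backslash, then the marker, then one known escape character.
    if (body.startsWith("\\" + marker, i)) {
      const ch = body[i + 1 + marker.length];
      if (escapes.has(ch)) {
        out += escapes.get(ch);
        i += marker.length + 2;
        continue;
      }
    }
    // Anything else is copied through unchanged.
    out += body[i];
    i++;
  }
  return out;
}

// Body of the second example above (one # in the delimiter).
const cooked = cookHashBody("Can use `backticks` \\#n $#{someVar}", 1, { someVar: "value" });
console.log(JSON.stringify(cooked));
```

With one #, a plain \n or ${someVar} in the body is left untouched, which is the "raw by default" behaviour the Swift-style design is after.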
anonghuser commented 3 months ago

Another possible syntax and prior art example: Lua has [[ multiline string ]] and [===[ multiline string ]===], with any number of =, as long as it is the same on both ends. It has no escapes or interpolation in this syntax.
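For reference, the matching rule can be sketched in a few lines of JavaScript. This is only an illustration of the delimiter pairing, written against strings; scanLongBracket is a made-up name, and the sketch ignores Lua's rule that a newline immediately after the opener is dropped.

```javascript
// Sketch of Lua-style long-bracket matching: "[", N "=" signs, "[" opens;
// "]", the same N "=" signs, "]" closes. No escapes, no interpolation.
function scanLongBracket(source) {
  const open = /^\[(=*)\[/.exec(source);
  if (!open) throw new SyntaxError("expected an opener such as [[ or [=[");
  const level = open[1].length;
  const closer = "]" + "=".repeat(level) + "]";
  const bodyStart = open[0].length;
  const bodyEnd = source.indexOf(closer, bodyStart);
  if (bodyEnd === -1) throw new SyntaxError("unterminated long bracket");
  return { level, body: source.slice(bodyStart, bodyEnd) };
}

console.log(scanLongBracket("[===[ multiline string ]===]"));
console.log(scanLongBracket("[=[ a ]] b ]=]"));
```

In the second call the inner ]] does not end the string, because only a closer with the same number of = signs counts.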

More thoughts

To adapt this to JS, the opening would have to be prefixed with `@` or something similar to distinguish it from an array literal: `@[===[ multiline string ]===]`. This has the advantage that the opening and closing patterns are easier to distinguish from each other. Another advantage is that the outer pattern need not always be longer than the nested one, just different.

Lua does not, but JS could also allow an arbitrary pattern between the brackets instead of `=` characters, similar to a heredoc's name or a MIME boundary. But then the beauty of the visually recognizable, symmetric opening and closing markers is perhaps diminished, and we may as well use syntax closer to heredoc directly.

We could also go for any number of brackets instead of `=`, as in `@[[[[ multiline string ]]]]`, but that adds issues when you want a literal `]` right at the end of your string. The same goes for the current proposal, though: what if you want the string to end with `` ` ``? That is solved by allowing or requiring whitespace before the closing pattern (such as requiring it to be on its own line), but that in turn makes it necessary to "eat" the preceding whitespace, which complicates things further when you want the string to end with whitespace or a line break, and also makes the literal take more lines, which IMO may look bad. (Tangential to this issue, but I feel the default syntax shouldn't "dedent", strip, or otherwise alter the whitespace you typed; leave that to custom tags for those who need it.)

P.S. Lua also does this for block comments:

`--[=[ see how this --[[ nested comment ]] did not close the outer comment ]=]`

`@//[=[ js could too @//[[ but probably as a separate proposal ]]]=]`

Edit: I saw some advantages in it over the @``````initially proposed`````` syntax, but had not considered the "Swift-inspired" one above, which seems better to me now. So I'm leaving this only as another prior-art example rather than a serious suggestion for inclusion in JS.

hax commented 2 months ago

@anonghuser

Theoretically speaking, JS does not need to add @ at the beginning, because the combination [= is currently not a valid syntax. As I understand it, the characteristic of the Lua style is the use of paired opening and closing symbols [ and ]. However, I am not sure if this has unique advantages. For instance, theoretically speaking, even with the Swift style, it is still possible to allow the outer pattern to be shorter than the nested pattern—because the continued # symbol is almost impossible to form meaningful syntax, so the parser can choose to look one more character to decide if it is the closing delimiter.

Regarding the issue of trailing whitespace, it is indeed a matter worth considering. However, my current experience is that the most important scenario for this proposal is large chunks of text, and for large chunks of text, trailing whitespace at the end of each line is an issue in itself (for example, most editors may automatically remove trailing whitespace). Even if we restrict the syntax to preserve trailing whitespace at the end of the string, that cannot solve the problem of trailing whitespace at the end of each line; at most it solves the problem for the last line. So in the real world, people still need some extra effort (e.g. a tag function) to solve such trailing whitespace issues.

anonghuser commented 2 months ago

even with the Swift style, it is still possible to allow the outer pattern to be shorter than the nested pattern—because the continued # symbol is almost impossible to form meaningful syntax, so the parser can choose to look one more character to decide if it is the closing delimiter.

@hax I like this, but it would probably need to be spelled out explicitly if you do pick this syntax for the proposal, i.e. tag#`a`##`b`# is a single tagged literal, and not equivalent to (tag#`a`#)#`b`# or tag #`a`# #`b`# (which are equivalent to each other, and may have valid use cases with tag functions that return tag functions).

Edit: The whole stunt is probably pointless if it doesn't also apply to escapes inside the string, i.e. #`\##`# is treated as "\\##" instead of the "#" some might expect. In other words, you can't have an escaped #, which doesn't need to be escaped anyway but is still allowed to be escaped in all current types of string literals. You have to consider whether you want this overcomplication and inconsistency with other types of string literals as the price for being able to embed longer-delimiter literals inside shorter-delimiter literals. My vote is that it's not worth it after all.
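The two readings of #`a`##`b`# can be compared with a small sketch. It assumes the lookahead rule means "a backtick followed by more # characters than the opener has is not a closing delimiter"; findBodyEnd and its lookahead flag are invented for illustration, and the tag is left out to keep the example short.

```javascript
// Hypothetical: find where the body of a Swift-style literal ends.
// `hashes` is the number of # in the opener. With `lookahead` enabled,
// a candidate closer that is followed by one more # is skipped.
function findBodyEnd(source, hashes, lookahead) {
  const closer = "`" + "#".repeat(hashes);
  let from = hashes + 1; // skip the opening hashes and backtick
  while (true) {
    const at = source.indexOf(closer, from);
    if (at === -1) return -1;
    const next = source[at + closer.length];
    if (!lookahead || next !== "#") return at;
    from = at + 1;
  }
}

const literal = "#`a`##`b`#";
const strictEnd = findBodyEnd(literal, 1, false);
const lookaheadEnd = findBodyEnd(literal, 1, true);

console.log(literal.slice(2, strictEnd));    // body without lookahead
console.log(literal.slice(2, lookaheadEnd)); // body with lookahead
```

Without lookahead the first literal ends after a and the rest is a second literal; with lookahead the whole text is one literal whose body contains the two-hash sequence.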

jlandrum commented 2 weeks ago

This is a problem as old as time - it's amazing how many solutions exist out there for this.

Just some observations:

I love the idea of this being solved @hax, so many languages have tried and many have failed. Just my perspective - we need some way to extensively demonstrate it and test against it.

I propose a (modestly) large file with example uses using all of the currently available string building methods - doesn't have to be valid code, just a double newline delimited file of various examples of strings in numerous use cases. Something to whiteboard ideas against since otherwise anyone involved will be most likely considering their own realm of expertise and not others. In short, we need a common ground to work off of before deciding syntax. This way we avoid ending up in a "solution looking for a problem" scenario.