tmedwards / sugarcube-2

SugarCube is a free (gratis and libre) story format for Twine/Twee.
https://www.motoslave.net/sugarcube/2/
BSD 2-Clause "Simplified" License

National letters in name of variable #37

Closed redirectme closed 4 years ago

redirectme commented 4 years ago

As far as I understand, $variable and _variable are translated to State.variables.variable/ State.temporary.variable (by markup/scripting.js). It may be better to translate into State.variables['variable']/ State.temporary['variable']. This will allow us to use the name with national letters. Of course, there is need to change RegExp Patterns.variable etc.

greyelf commented 4 years ago

I'm curious, what are JavaScript's 'rules' in regards to the naming of an Object property/key?

HiEv commented 4 years ago

The rules are that, when using "dot notation", the property/key name must be made up of Unicode letters, numbers 0 to 9, $, and/or _, though a number cannot be the first character. (Source)

When using "bracket notation", the content within the brackets can be any string or a JavaScript Symbol. (Source)
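To make the difference concrete, here is a minimal sketch in plain JavaScript. The `store` object is a hypothetical stand-in for a variable store, not SugarCube's actual State object.

```javascript
// Hypothetical stand-in for a variable store; NOT SugarCube's State object.
const store = {};

// Dot notation: the property name must be a valid identifier name.
store.tést = 3;

// Bracket notation: any string works as a key, including strings
// that could never be written with dot notation.
store["tést"] += 1;           // same property as store.tést
store["два слова"] = "ok";    // contains a space
store["1st"] = "first";       // starts with a digit

console.log(store.tést);         // 4
console.log(Object.keys(store)); // lists all three keys
```

Both notations reach the same property whenever the name is a valid identifier; bracket notation simply accepts a wider set of names.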

As an FYI, if you do any of these:

<<set $tést = 3>>
<<set State.variables["tést"] = 3>>
<<set State.variables.tést = 3>>

then any of these will print the correct value:

<<= $tést>>
<<= State.variables["tést"]>>
<<= State.variables.tést>>

(<<=>> is shorthand for <<print>>)

However, this will not:

$tést

That simply displays the text "$tést".

So the problem appears to only be with recognizing/printing so-called "naked variables" when they contain some non-English Unicode letters.

As a workaround, the <<=>> macro can be used to make them display their values, as shown above.

redirectme commented 4 years ago

Unfortunately, this does not work with the Cyrillic alphabet.

tmedwards commented 4 years ago

TL;DR version

(Please read the full version for more information.)

I'm sorry, but I18n support for variable names is not happening in SugarCube v2 for the reasons outlined below.

I18n support for variable names is something planned for SugarCube v3—it's already in and being tested. No, I don't know when v3 will be ready.

Full version

As far as I understand, $variable and _variable are translated to State.variables.variable/ State.temporary.variable (by markup/scripting.js). It may be better to translate into State.variables['variable']/ State.temporary['variable']. This will allow us to use the name with national letters.

The desugaring code only translates the sigils—i.e., $ and _. It does, however, attempt to verify that what it's attempting to desugar actually matches the variable name pattern, which is what is catching you out.

Of course, there is need to change RegExp Patterns.variable etc.

That's what would need to change, yes. Unfortunately, see below.

Unfortunately, this does not work with the Cyrillic alphabet.

As you saw, using variables normally should not work with non-US-ASCII characters in their names, unless you use the API. The only reason HiEv's example worked is that it slips past the variable name check by starting with a valid character; if the example had been <<set $ést = 3>>, it would also have failed.
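The effect can be illustrated with a rough sketch. The pattern and the `desugar` function below are assumptions made up for illustration, not SugarCube's actual code: they only model the idea that a sigil is translated when it is followed by a valid US-ASCII identifier start character.

```javascript
// Hypothetical ASCII-only check: a sigil that does not sit inside a longer
// name and is followed by an ASCII identifier-start character.
// This is an illustrative assumption, NOT SugarCube's real pattern.
const asciiVariable = /(?<![A-Za-z0-9_$])([$_])(?=[A-Za-z_$])/g;

// Hypothetical desugaring step: only the sigil itself is replaced.
function desugar(code) {
  return code.replace(asciiVariable, (match, sigil) =>
    sigil === "$" ? "State.variables." : "State.temporary.");
}

console.log(desugar("$tést = 3"));       // State.variables.tést = 3
console.log(desugar("$ést = 3"));        // $ést = 3 (left untouched)
console.log(desugar("_count + $total")); // State.temporary.count + State.variables.total
```

In this model, `$tést` passes only because its first letter is ASCII; the remaining characters are never inspected by the sigil replacement.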

I'm curious, what are JavaScript's 'rules' in regards to the naming of an Object property/key?

A string or symbol, though symbols aren't safe if you want full browser coverage.

That said, and as noted below, SugarCube requires variables meet the standard of an identifier name. Thus, the fact that they're stored as properties is largely immaterial for normal usage. I mean, you can work around it, but it won't be pleasant.

The rules are that, when using "dot notation", the property/key name must be made up of Unicode letters, numbers 0 to 9, $, and/or _, though a number cannot be the first character. (Source)

Unfortunately, this is one of those rare cases where MDN is incorrect.

The ECMAScript standard defines an identifier, as seen in ES10/ES2019: 11.6 Names and Keywords, in terms of ID_Start and ID_Continue from UAX #31: Unicode Identifier and Pattern Syntax, which are:

ID_Start
[\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]

ID_Continue
[\p{ID_Start}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Other_ID_Continue}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]

Those translate into something like the following using the new ECMAScript Unicode character property escapes:

// Start and Part:
const IdentifierStart = /[$_\p{ID_Start}]/u;
const IdentifierPart  = /[$_\u200C\u200D\p{ID_Continue}]/u;
// Combined into a full identifier:
const IdentifierName = /^[$_\p{ID_Start}][$_\u200C\u200D\p{ID_Continue}]*$/u;
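As a quick check of what that combined pattern accepts, the sketch below runs it against a few sample names. It assumes a runtime that supports Unicode property escapes (for example a recent Node.js); the sample names are made up for illustration.

```javascript
// Repeated from above so the snippet runs on its own.
const IdentifierName = /^[$_\p{ID_Start}][$_\u200C\u200D\p{ID_Continue}]*$/u;

// Illustrative sample names (assumptions, not taken from any real project).
const samples = ["variable", "tést", "ést", "переменная", "1abc", "two words"];

for (const name of samples) {
  console.log(name, IdentifierName.test(name));
}
// variable true
// tést true
// ést true
// переменная true
// 1abc false
// two words false
```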

Here's where the problems start.

  1. ES Unicode character property escapes are still somewhat new—Firefox doesn't even support them yet—and are totally unsupported in older browsers.
  2. SugarCube v2 supports older browsers—back to IE9.
  3. SugarCube, all versions, requires variables meet the IdentifierName standard.

Thus, we need to translate the ES Unicode character property escape versions into old school \uhhhh character classes—we can't even use \u{hhhhh} as it requires the u flag. Additionally, we need them as patterns, rather than expressions, so that's going to bloat them a bit.
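The translation step could be sketched roughly as follows. This is a hedged illustration, not the script actually used for SugarCube: the `toLegacyClass` helper is hypothetical, it has to run in a modern engine that supports property escapes, and it only covers the Basic Multilingual Plane (code points above U+FFFF would need surrogate pair handling, which is omitted here).

```javascript
// Hypothetical helper: expand a Unicode binary property into the body of an
// old-style character class made of \uhhhh escapes and ranges (BMP only).
function toLegacyClass(property) {
  const tester = new RegExp(`\\p{${property}}`, "u");
  const esc = cp => "\\u" + cp.toString(16).toUpperCase().padStart(4, "0");
  const parts = [];
  let start = -1;
  // Loop one step past U+FFFF so that a trailing open range gets flushed.
  for (let cp = 0; cp <= 0x10000; cp++) {
    const hit = cp <= 0xFFFF && tester.test(String.fromCharCode(cp));
    if (hit && start < 0) {
      start = cp;                       // open a new range
    } else if (!hit && start >= 0) {
      const end = cp - 1;               // close the current range
      parts.push(end === start ? esc(start) : esc(start) + "-" + esc(end));
      start = -1;
    }
  }
  return parts.join("");
}

// Pattern strings (not RegExp objects), so they can be embedded elsewhere.
const startBody = "$_" + toLegacyClass("ID_Start");
const partBody = "$_\\u200C\\u200D" + toLegacyClass("ID_Continue");

// Built WITHOUT the u flag, as an older browser would require.
const legacyName = new RegExp(`^[${startBody}][${partBody}]*$`);

console.log(startBody.length, partBody.length); // pattern sizes in characters
console.log(legacyName.test("tést"), legacyName.test("переменная"), legacyName.test("1abc"));
// true true false
```

The printed lengths give a feel for how large the generated patterns are; the exact numbers depend on the Unicode version of the engine that generates them.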

After doing so, and taking advantage of ranges to reduce the size of the patterns, we end up with a total of 18.73 KiB (not including IdentifierName).

And that's not even all of it as there's another pattern or two that would need the same treatment, so all told we're looking at likely over 40 KiB. Further, without knowing exactly how this might impact performance in large games, I'd be hard pressed to even consider it.

I'm sorry, but I just don't see I18n support for variable names happening in SugarCube v2. That said, it is something planned for SugarCube v3—it's already in and being tested.

tmedwards commented 4 years ago

Closing this, since it's been resolved in the v3 repository: https://github.com/tmedwards/sugarcube-3-prealpha

NOTE: The v3 repository is not yet public, so the above link probably will not work for you until it is.