Escape Mechanisms in Identifiers

mehmetoguzderin commented 2 years ago

Today, the only way to spell code points in WGSL, whether in identifiers or anywhere else, is direct input. (One can use the escape mechanisms of the language that holds the WGSL text, such as JavaScript string escapes, but the WGSL spec itself has no notion of escaping.) Character escaping is a valuable way to spell code points in source files: it reduces reliance on direct input and on visually telling characters apart, which goes beyond being an internationalization concern. Escape mechanisms are usually supported at several levels (identifiers, string literals, whole source text, regular expressions, etc.), but the level of immediate relevance to WGSL is identifiers, where we accept XID code points. Such escapes would apply to both the declaration and the use of identifier names.

Three ways of escaping in JavaScript:

\x??: The hex number must consist of exactly two hex digits, with valid values in [00, FF] (inclusive range). This range covers ASCII and the Latin-1 Supplement.
\u????: The hex number must consist of exactly four hex digits, with valid values in [0000, FFFF] (inclusive range). This range covers the Basic Multilingual Plane, where the most common code points live.
\u{?...?}: The hex number must consist of one or more hex digits, with valid values in [0, 10FFFF] (inclusive range). This range covers all Unicode code points.
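For reference, a small JavaScript sketch of the three forms. All three work in string literals, but per the ECMAScript grammar only the two `\u` forms are also accepted inside identifiers; `\x??` is limited to strings. The variable names are illustrative only.

```javascript
// The same code point, U+00E9 (é), spelled with each JavaScript escape form.
const viaX = "caf\xE9";        // \x??      : exactly two hex digits
const viaU4 = "caf\u00E9";     // \u????    : exactly four hex digits
const viaBraced = "caf\u{E9}"; // \u{?...?} : one or more hex digits, up to 10FFFF

// Outside the BMP, \u???? needs a UTF-16 surrogate pair,
// while \u{...} names the code point directly.
const grinningPair = "\uD83D\uDE00";
const grinningBraced = "\u{1F600}";

// Only the \u forms are accepted in identifiers; this declares the binding `café`.
const caf\u{E9} = viaBraced;
```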

Some languages, such as Rust, opt not to support \u???? and support only \x?? and \u{?...?}. However, I would argue that, for copy-paste compatibility between JavaScript string literals and WGSL source code, it is valuable to support all three JavaScript escape forms. I have not been able to spot support for a \x{?...?} form.

An illustrative regex follows. One unfortunate limitation is that it cannot enforce the XID range restrictions on the code points that the escapes represent; doing so would probably take an almost infinitely long regex:

/(([_\p{XID_Start}]|(\\x[0-9a-fA-F][0-9a-fA-F])|(\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])|(\\u\{[0-9a-fA-F]+\}))([\p{XID_Continue}]|(\\x[0-9a-fA-F][0-9a-fA-F])|(\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])|(\\u\{[0-9a-fA-F]+\}))+)|([\p{XID_Start}]|(\\x[0-9a-fA-F][0-9a-fA-F])|(\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])|(\\u\{[0-9a-fA-F]+\}))/uy
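Since the regex cannot check which code points the escapes denote, one way to recover the XID restrictions is a two-step approach: decode the escapes first, then apply WGSL's identifier grammar to the decoded code points. A minimal JavaScript sketch, assuming the three escape forms listed above; `decodeIdentifierEscapes` and `isValidEscapedIdentifier` are hypothetical names, not part of the WGSL spec or any implementation.

```javascript
// Hypothetical helpers: a sketch of "decode, then validate", not a spec'd algorithm.
// Matches \x?? (2 hex digits), \u???? (4 hex digits), or \u{?...?} (1+ hex digits).
const ESCAPE = /\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})|\\u\{([0-9a-fA-F]+)\}/g;

// Replace every escape with the code point it denotes.
function decodeIdentifierEscapes(spelling) {
  return spelling.replace(ESCAPE, (match, x2, u4, uN) => {
    const cp = parseInt(x2 ?? u4 ?? uN, 16);
    if (cp > 0x10ffff) throw new RangeError(`escape out of range: ${match}`);
    return String.fromCodePoint(cp);
  });
}

// WGSL's identifier grammar, applied to the decoded code points:
// either '_' or XID_Start followed by 1+ XID_Continue, or a single XID_Start.
const WGSL_IDENT = /^(?:[_\p{XID_Start}]\p{XID_Continue}+|\p{XID_Start})$/u;

function isValidEscapedIdentifier(spelling) {
  return WGSL_IDENT.test(decodeIdentifierEscapes(spelling));
}
```

Escapes that decode to surrogate code points (for example `\u{D800}`) produce lone surrogates, which are neither XID_Start nor XID_Continue, so the grammar check rejects them.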

In anticipation of feedback on https://github.com/w3c/i18n-activity/issues/1511, we think this feature could be a great post-V1 enhancement: nothing in the WGSL spec conflicts with it, and non-ASCII characters can already be entered through direct input.

Kangz commented 2 years ago

I'm not sure I understand what the issue is here. WGSL doesn't have strings, so escaping for identifiers is only needed in the language that produces the WGSL source, not in WGSL itself. AFAIK, the Rust escapes you referenced are for use inside string literals, not in the middle of an identifier.

JavaScript supports a way to escape, so it seems there is nothing to do?

mehmetoguzderin commented 2 years ago

@Kangz I agree that this issue does not prompt any immediate action, but I do not think escaping is only needed in languages that produce WGSL source. WGSL sources will certainly have human authors who store them in files, and when those files are loaded there is no one-step way to interpret escapes: eval is the only method that should support all JavaScript escapes, JSON.parse handles only some of them, and unescape is deprecated. So source-level escapes specified in WGSL, as in JavaScript, would be an easy-to-implement feature that unifies this aspect across all parsers. I think the i18n team's feedback is valuable here, but this is certainly not V1. As for Rust, only a few languages, such as JavaScript, C# (with restrictions), and CSS (for example, `b\006F dy { background-color: #000000; }`), support escapes in identifiers, and Rust is indeed not one of them. Eventually, this support would allow escapes to work in situations where the source is something like a raw string from String.raw.
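To illustrate the loading problem: a minimal sketch, assuming WGSL text read from a file is wrapped in quotes and handed to `JSON.parse` (`unescapeViaJson` and `jsonCanUnescape` are hypothetical helpers). JSON strings only understand the `\u????` form, so the other JavaScript spellings are rejected, and so are raw line breaks in the source.

```javascript
// Hypothetical helper: interpret backslash escapes in loaded WGSL text via JSON.parse.
// JSON strings accept \uXXXX (plus \" \\ \/ \b \f \n \r \t), but not \xHH or \u{...},
// and they reject unescaped control characters such as line breaks.
function unescapeViaJson(text) {
  return JSON.parse(`"${text}"`);
}

// Returns true when JSON.parse can handle the given spelling.
function jsonCanUnescape(text) {
  try {
    unescapeViaJson(text);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}
```

`eval` would accept all three forms, but only by executing the text as JavaScript.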

Kangz commented 2 years ago

The W3C recommendation linked in the i18n tracker doesn't say that specs SHOULD provide escapes. It is reasonable to say that whatever produces WGSL can do escapes however it wants. JavaScript identifiers are strings because you can define a dictionary as {a: 1}, so it kind of makes sense that identifiers can contain Unicode escapes (removing the quotes when they are not needed is a reasonable code transform). WGSL is statically typed and has no notion of strings, so I don't think it will ever make sense to add this feature.
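A small JavaScript illustration of the quote-removal transform mentioned above: a quoted key and an unquoted key spelled with the same `\u` escape produce the same property name. The variable names are illustrative.

```javascript
// A quoted property name: the escape is decoded as part of a string literal.
const quoted = { "caf\u00e9": 1 };
// The same key with the quotes removed: the escape is now part of an identifier name.
const unquoted = { caf\u00e9: 1 };

// Both objects end up with a single key, the string "café".
const sameKeys = Object.keys(quoted)[0] === Object.keys(unquoted)[0];
```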

mehmetoguzderin commented 2 years ago

@Kangz The V1 response to the i18n issue's query is certainly that these code points are input directly. Otherwise, if it is an enormous burden to keep this as an issue for post-V1, it can for sure be closed then, I guess?

mehmetoguzderin commented 2 years ago

BTW, I do not think the spec's parsing clauses need any change to account for direct input, or for a host language applying escapes before handing over the text: the parsing section already makes clear that the WGSL parser receives code points, and how those code points are spelled is outside WGSL's scope. (Also, JavaScript identifiers are not exactly strings; they are restricted in many ways, and the same applies to C#. Still, I think there is no need to spend time discussing this, as the V1 answer is clear as day: input happens through direct input.)
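As a hedged illustration of the point that JavaScript identifiers are not just strings: an escape inside an identifier must still denote a code point that is legal at that position, and an escaped keyword is not accepted as a binding name. The sketch uses the `Function` constructor only to check whether a snippet parses; `parses` and `examples` are hypothetical names.

```javascript
// Hypothetical helper: does this JavaScript source text parse?
// The Function constructor compiles the body without running it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Each snippet is paired with whether it is expected to parse.
const examples = [
  ["var caf\\u00e9;", true],  // é is ID_Continue, so the escape is accepted
  ["var a\\u0020b;", false],  // U+0020 SPACE is not ID_Continue
  ["var \\u0030abc;", false], // a digit cannot start an identifier, even when escaped
  ["var \\u0069f;", false],   // an escaped keyword (`if`) is not a valid binding name
];

for (const [source, expected] of examples) {
  console.log(source, parses(source) === expected ? "as expected" : "unexpected");
}
```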

Kangz commented 2 years ago

> @Kangz The V1 response to the i18n issue's query is certainly that these code points are input directly. Otherwise, if it is an enormous burden to keep this as an issue for post-V1, it can for sure be closed then, I guess?

IDK. I think it's unlikely we'll ever do it, so this just adds noise to the issue tracker. As you wish.

mehmetoguzderin commented 2 years ago

@Kangz I'll close it once people have seen that we did not overlook this feedback, and that we discussed how both we and the spec are already clear that direct input is the way!

dneto0 commented 2 years ago

I agree with @Kangz : I think this is out of scope for WGSL itself.

mehmetoguzderin commented 2 years ago

Although this suggestion started as feedback from i18n, the immediate response is direct input, so it is marked as post-v1 (which counts as "closed" for i18n, with direct input as the immediate response). I suggest this issue not consume any meeting cycles until then. Months earlier, I personally disagreed with calling this "noise" (it was factually not noise in any tracker, since it was marked post-v1 from its inception), but I did not comment, to avoid sending anybody here through email notifications. For action further along the timeline, I think keeping the status would not hurt; the number of open issues is likely never to hit 0. Thanks for understanding!

mehmetoguzderin commented 2 years ago

Closing, with the note that this improvement is not out of scope (there is a large possibility I will be absent from the meeting). I tried my best to label this correctly to avoid an unconstructive reflex.

mehmetoguzderin commented 2 years ago

A small clarification of interpretation, which I had been saving for much later (there could be a label:whynot, but this is not that case): this suggestion is clearly about source-level escapes, not strings. The original text just lists how languages approach escaping. The feature's benefit might not be obvious, which is why the keyword "escaping" was raised by experts here: it would let authors have the same program string interpreted the same way whether it is loaded as text or embedded in a JavaScript string. "Escaping in JavaScript makes sense because of dictionaries" is a straw-man argument (unless it can be shown that the ergonomic benefit was never a consideration), and it is contradicted by CSS, which supports escapes in identifiers without any such motivation. Again, I don't understand why this was singled out for closing and aired when there are V1 things to do. (And, FWIW, this change does not introduce any points of ambiguity.)
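A minimal JavaScript sketch of the divergence described above, assuming WGSL text can reach the parser either through an ordinary string literal or verbatim (as `String.raw` or loading a file would deliver it). Today the two paths deliver different code points for the same spelling.

```javascript
// Ordinary literal: JavaScript decodes the escape, so the WGSL parser receives "é".
const embedded = "fn caf\u00e9() {}";
// String.raw keeps the backslash sequence, much like text read from a file would.
const verbatim = String.raw`fn caf\u00e9() {}`;

// Without source-level escapes in WGSL, the second spelling is not a valid identifier.
console.log(embedded);              // fn café() {}
console.log(verbatim);              // fn caf\u00e9() {}
console.log(embedded === verbatim); // false
```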

kdashg commented 2 years ago

WGSL meeting minutes 2022-07-05

* Offline: DN: Recommend closing this. Don't keep it in post-V1. (But only discuss if Oguz is available.)
* Offline: MOD: The label was post-v1 (the lowest priority, likely never to make it to the front of the queue); don't understand why this gets aired. It was extracted from a good expert inquiry.
* _CLOSED_
* KG: Procedurally, would prefer to keep post-v1 things open, but just in the post-v1 category, which would not cause them to show up in our v1 issue burndowns/triage. Ok?
* MOD: Sounds good.
* **KG: I will reopen and leave this as post-v1**

gpuweb/gpuweb issue #2810: Escape Mechanisms in Identifiers