tc39 / ecma262

Status, process, and documents for ECMA-262
https://tc39.es/ecma262/

Editorial: "sequence of code units" != "String" ? #828

Closed: jmdyck closed this issue 3 years ago.

jmdyck commented 7 years ago

The spec defines SV, TV, and TRV to return a sequence of code units. But a String value is a sequence of code units. So why not define SV, TV, and TRV to return a String value?

Note that TemplateStrings appears to assume that TV and TRV do indeed return Strings. E.g., it says

Let _string_ be the TV of |TemplateMiddle|.

and not

Let _string_ be the String value whose code units are the TV of |TemplateMiddle|.

allenwb commented 7 years ago

The intent was to distinguish between abstract sequences of code units and actual string values which are ECMAScript language runtime values. You should have to have the full semantics of ECMAScript strings available in order to lex/parse ES code or to validate static semantic rules.

From that perspective, the direct use of TV sequences as string values like shown above is a spec. bug. The fix would be language like that shown as the alternative.
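Here is a minimal TypeScript sketch of the distinction allenwb describes. It models an abstract code-unit sequence as a plain array of numbers. The type `CodeUnitSequence` and the functions `stringValueFromCodeUnits` and `codeUnitsOf` are hypothetical names used only for illustration. They are not spec operations or any engine's API. The conversion function stands in for the explicit wording "the String value whose code units are ...".

```typescript
// Hypothetical model: an abstract sequence of 16-bit code units, such as the
// result of SV/TV/TRV, kept separate from a runtime String value.
type CodeUnitSequence = readonly number[];

// Mirrors the spec phrase "the String value whose code units are <seq>".
function stringValueFromCodeUnits(units: CodeUnitSequence): string {
  let result = "";
  for (const unit of units) {
    // Reject anything that is not an unsigned 16-bit integer.
    if (!Number.isInteger(unit) || unit < 0 || unit > 0xffff) {
      throw new RangeError(`not a code unit: ${unit}`);
    }
    result += String.fromCharCode(unit);
  }
  return result;
}

// The inverse direction: read back the code units of a runtime String value.
function codeUnitsOf(s: string): CodeUnitSequence {
  const units: number[] = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i));
  }
  return units;
}

// Code units for the text "a\u{1F600}"; U+1F600 is encoded as a surrogate pair.
const tv: CodeUnitSequence = [0x0061, 0xd83d, 0xde00];
const str = stringValueFromCodeUnits(tv);
console.log(str.length); // 3
console.log(codeUnitsOf(str).join(", ")); // 97, 55357, 56832
```

The explicit conversion is the step that the shorter wording "Let _string_ be the TV of |TemplateMiddle|" leaves implicit.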

jmdyck commented 7 years ago

You should have to have the full semantics of ECMAScript strings available in order to lex/parse ES code or to validate static semantic rules.

(Presumably you mean "You shouldn't".) That seems like a plausible reason, except:

bterlson commented 7 years ago

I am also curious; it seems fine to me to just always use String values (as long as it's clear that implementations don't need to use actual JS strings, which I think it is, given the existing looseness @jmdyck points out).

bakkot commented 4 years ago

The current editor group talked about it and is weakly split. @michaelficarra and @syg mildly prefer the status quo, @ljharb mildly prefers getting rid of the distinction. Does anyone have a strong reason to prefer one or the other?

TimothyGu commented 4 years ago

As a data point, WHATWG's Infra defines a string as "a sequence of unsigned 16-bit integers, also known as code units. A string is also known as a JavaScript string." I think that, while aligning with Infra wouldn't be strictly necessary, getting rid of the distinction would simplify the spec somewhat.
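Infra's definition can be illustrated with a short TypeScript sketch. It assumes a `Uint16Array` as the concrete stand-in for "a sequence of unsigned 16-bit integers". The helpers `toCodeUnits` and `fromCodeUnits` are hypothetical. The lone-surrogate case shows that a JavaScript string can be any such sequence, not only well-formed UTF-16.

```typescript
// Hypothetical helper: view a JavaScript string as its raw 16-bit code units.
function toCodeUnits(s: string): Uint16Array {
  const units = new Uint16Array(s.length);
  for (let i = 0; i < s.length; i++) {
    units[i] = s.charCodeAt(i); // the 16-bit code unit at index i
  }
  return units;
}

// Hypothetical helper: build a JavaScript string from raw 16-bit code units.
function fromCodeUnits(units: Uint16Array): string {
  let s = "";
  for (let i = 0; i < units.length; i++) {
    s += String.fromCharCode(units[i]);
  }
  return s;
}

// A lone high surrogate is still a valid String value: one code unit, 0xD800.
const lone = "x\uD800";
const units = toCodeUnits(lone);
console.log(units.length); // 2
console.log(Array.from(units).join(", ")); // 120, 55296
console.log(fromCodeUnits(units) === lone); // true
```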

jmdyck commented 3 years ago

This issue was resolved with the merge of PR #2018.