whatwg / infra

Infra Standard
https://infra.spec.whatwg.org/

ECMA262 completion records are not interpreted appropriately #518

Open jugglinmike opened 1 year ago

jugglinmike commented 1 year ago

ECMA262 uses a control flow convention built on completion records--"wrapper" values that the standard's macro-like shorthands (e.g. ReturnIfAbrupt and the question mark symbol) use to conditionally halt algorithms.

The effect is similar to the notion of "exception throwing" in web specifications, but the mechanism is fundamentally different. This means that fallible ECMA262 algorithms cannot be interpreted identically in web specs.

For instance, consider "serialize a JavaScript value to a JSON string" in Infra:

  1. Let result be ? Call(%JSON.stringify%, undefined, « value »).
  2. If result is undefined, then throw a TypeError.
  3. Assert: result is a string.
  4. Return result.

According to the definition of the ? shorthand, the first step either stores a value in "result" or causes this algorithm to return a "throw" completion record. Combining those possibilities with the following steps means that the algorithm can have one of three results: returning a "throw" completion record, throwing an error, or returning a string.

In ECMA262, the first two are different ways of saying the same thing, but that isn't the case in web specs. We can see this in the call site of the algorithm described above:

  1. Let string be the result of serializing a JavaScript value to a JSON string given value.

Were this written in ECMA262, that invocation would likely be preceded by the ? symbol. The Infra standard probably omits it because web specifications rely on exception throwing semantics. However, the effect is that the value of the variable named "string" may be a "throw" completion record.
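To make the three outcomes concrete, here is a minimal TypeScript sketch of a literal reading of those steps. The `Completion` type and the names `callJsonStringify` and `serializeLiteral` are hypothetical modeling choices, not anything defined by Infra or ECMA262; a JavaScript exception stands in for web-spec "throwing".

```typescript
// Hypothetical model of an ECMA262 Completion Record (illustrative only).
type Completion<T> =
  | { type: "normal"; value: T }
  | { type: "throw"; value: unknown };

// Stand-in for Call(%JSON.stringify%, undefined, « value »): a JavaScript
// exception is captured as a "throw" completion instead of propagating.
function callJsonStringify(value: unknown): Completion<string | undefined> {
  try {
    return { type: "normal", value: JSON.stringify(value) };
  } catch (error) {
    return { type: "throw", value: error };
  }
}

// Literal reading of the Infra steps: "?" hands the throw completion back to
// the caller, while step 2 throws in the web-spec sense.
function serializeLiteral(value: unknown): Completion<string> | string {
  const completion = callJsonStringify(value);
  if (completion.type === "throw") {
    return completion; // outcome 1: a "throw" completion record is returned
  }
  const result: string | undefined = completion.value;
  if (result === undefined) {
    throw new TypeError("value cannot be serialized"); // outcome 2: exception
  }
  return result; // outcome 3: a string
}

// A call site without "?" simply stores whatever comes back.
const cyclic: Record<string, unknown> = {};
cyclic["self"] = cyclic; // JSON.stringify throws on cyclic structures

const fine = serializeLiteral({ a: 1 });
const leaked = serializeLiteral(cyclic); // a completion record, not a string
```

In this model, `leaked` plays the role of the variable named "string" at the call site: nothing forces the caller to notice that it holds a completion record rather than a string.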

Although there are only three instances of the ? operator in Infra, I'm told that this pattern also shows up in HTML and in WebIDL, so I'm reporting this issue to start a discussion about the best way to address the problem.

jugglinmike commented 1 year ago

The first solution that comes to mind is for Infra to define a macro-like shorthand named "throw if abrupt". It could look something like this:

Algorithm steps that say or are otherwise equivalent to:

  1. Throw if abrupt given argument.

mean the same thing as:

  1. Assert: argument is a Completion Record.
  2. If argument is an abrupt completion, throw argument.[[Value]].
  3. Else, set argument to argument.[[Value]].

And here's how it could be used to correct "serialize a JavaScript value to a JSON string":

-1. Let result be ? Call(%JSON.stringify%, undefined, « value »).
+1. Let result be Call(%JSON.stringify%, undefined, « value »).
+1. Throw if abrupt given result.
 1. If result is undefined, then throw a TypeError.
 1. Assert: result is a string.
 1. Return result.
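
A rough TypeScript sketch of how the corrected steps might behave, under the same kind of modeling assumptions: `Completion`, `callJsonStringify`, `throwIfAbrupt`, and `serializeToJsonString` are hypothetical names, and a JavaScript exception stands in for web-spec "throwing". The spec-style macro rebinds its argument in place; the sketch returns the unwrapped value instead.

```typescript
// Hypothetical model of an ECMA262 Completion Record (illustrative only).
type Completion<T> =
  | { type: "normal"; value: T }
  | { type: "throw"; value: unknown };

// Stand-in for Call(%JSON.stringify%, undefined, « value »).
function callJsonStringify(value: unknown): Completion<string | undefined> {
  try {
    return { type: "normal", value: JSON.stringify(value) };
  } catch (error) {
    return { type: "throw", value: error };
  }
}

// "Throw if abrupt": unwrap a normal completion, or convert an abrupt one
// into a web-spec style exception carrying the completion's [[Value]].
function throwIfAbrupt<T>(completion: Completion<T>): T {
  if (completion.type !== "normal") {
    throw completion.value;
  }
  return completion.value;
}

// The corrected algorithm: every failure now surfaces as an exception.
function serializeToJsonString(value: unknown): string {
  const result = throwIfAbrupt(callJsonStringify(value));
  if (result === undefined) {
    throw new TypeError("value cannot be serialized");
  }
  return result;
}
```

With this shape there are only two outcomes, a thrown exception or a string, so a call site that omits "?" can no longer end up holding a completion record.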

jugglinmike commented 1 year ago

The "?" operator appears about 60 times in HTML (the simple search query includes some false positives):

$ grep '\s?' html/source | wc -l
60

In addition to the semantics problem detailed in this issue's description, neither Infra nor HTML explains what "?" means.

WebIDL is more explicit in its application of "?", annotating each use with a reference to the definition in ECMA262.

$ grep '\[=?=]' webidl/index.bs | wc -l
74

WebIDL also indirectly explains the operator by explicitly adopting the algorithm conventions of ECMA262. Unfortunately, it does this by making some inaccurate statements about the relevant semantics:

Algorithms in this section use the conventions described in [=ECMA-262 Algorithm conventions=], such as the use of steps and substeps, the use of mathematical operations, and so on. This section may also reference abstract operations and notations defined in other parts of ECMA-262.

When an algorithm says to throw a |Something|Error then this means to construct a new ECMAScript |Something|Error object in the [=current realm=] and to throw it, just as the algorithms in ECMA-262 do.

Note that algorithm steps can call in to other algorithms and abstract operations and not explicitly handle exceptions that are thrown from them. When an exception is thrown by an algorithm or abstract operation and it is not explicitly handled by the caller, then it is taken to end the algorithm and propagate out to its caller, and so on.

With all this in mind, here's a variation on the solution I suggested above:

  1. Define the "?" operator in Infra using the above description of "throw if abrupt" and a note highlighting the distinction with ECMA262's definition
  2. Annotate Infra's uses of the operator with a reference to Infra's new definition
  3. Annotate HTML's uses of the operator with a reference to Infra's new definition
  4. Update WebIDL's uses of the operator to instead reference Infra's new definition
  5. Clarify the description of "throw" semantics in WebIDL

Generally, I'd prefer to use a distinct algorithm name in order to avoid confusion, especially for an esoteric shorthand like "?". One issue with that is the amount of churn it would involve (using "throw if abrupt" or similar would involve splitting algorithm steps apart).

I think overriding the operator might work in this case, though. The proliferation of this problem suggests that folks are tacitly performing the translation on their own. In that respect, overriding the operator for web specs just closes a loophole for folks who are new to this work and for those who are scrutinizing the boundaries with ECMA262.

annevk commented 1 year ago

I like the idea of linking ? and explaining its slightly adjusted meaning. I wonder if that covers all potentially confusing cases or if you can still stumble upon some resulting from this mixing of styles.

jmdyck commented 1 year ago

I agree with the problem statement, and the 5-step approach sounds good.

However, the problem of completion records is a bit bigger:.

jugglinmike commented 1 year ago

Thanks, @jmdyck! As a fourth possibility about asserting normal completions, could we consider just referencing ECMA262's definition of "!" directly? As I understand the semantics, they already seem coherent from the perspective of WHATWG conventions.
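
For comparison, ECMA262's "!" prefix asserts that the operation produced a normal completion and then unwraps its value. A minimal TypeScript sketch of that reading follows; the `Completion` type and `assertNormal` are hypothetical names, and the thrown `Error` merely stands in for a failed spec assertion, which indicates a specification bug rather than a catchable exception.

```typescript
// Hypothetical model of an ECMA262 Completion Record (illustrative only).
type Completion<T> =
  | { type: "normal"; value: T }
  | { type: "throw"; value: unknown };

// "! Operation()": assert the completion is normal, then unwrap its value.
function assertNormal<T>(completion: Completion<T>): T {
  if (completion.type !== "normal") {
    // Stand-in for a failed "Assert:" step in a specification algorithm.
    throw new Error("Assertion failed: expected a normal completion");
  }
  return completion.value;
}

const unwrapped = assertNormal<number>({ type: "normal", value: 42 });
```

Because an abrupt completion is never propagated here, this reading does not depend on whether the surrounding specification uses completion records or exceptions.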

jmdyck commented 1 year ago

Yup, that too.