tc39 / proposal-type-annotations

ECMAScript proposal for type syntax that is erased - Stage 1
https://tc39.es/proposal-type-annotations/
4.27k stars 47 forks source link

[grammar] Ambiguities in the current draft grammar #190

Open takikawa opened 1 year ago

takikawa commented 1 year ago

There are a few cases in the current draft grammar where ambiguous parses are possible or the grammar doesn't fit in LR(1), which would likely need some syntactic restriction or refactoring of the grammar to fix.

Arrow type ambiguity

In JS this is valid syntax:

() => () => []

Given this, it's possible to construct a valid example expression like the following that erases to expressions with different parse trees. This doesn’t occur in actual TS because () is not a valid TS type.

(x : t) : () => integer => () => []
//      ^^^^^^^^^^^^^^^ Type Annotation
//        ^^^^^^^^^^^^^ FunctionType
//                         ^^^^^^^^ ConciseBody

or

(x : t) : () => integer => () => []
//      ^^^^^^^^^^^^^^^^^^^^^ Type Annotation
//        ^^^^^^^^^^^^^^^^^^^ FunctionType
//              ^^^^^^^^^^^^^ FunctionType
//                               ^^ ConciseBody

Arrow and ternary operator

The ternary operator introduces a different ambiguity:

  3 ? (x) : S => (x) : () => 3;
//    ^^^ AsssignmentExpression
//          ^^^^^^^^^^^^^^^^^^ AssignmentExpression (ArrowFunction)
//               ^^^^^^^^^^^^ ArrowFunction
//                   ^^^^ TypeAnnotation
//
//    ^^^^^^^^^^^^^^ AssignmentExpression (ArrowFunction)
//                     ^^^^^^^ AssignmentExpression (ArrowFunction)

This erases to either 3 ? (x) : S => (x) => 3; or 3 ? (x) => (x) : () => 3; depending on the parse tree.

FunctionType and ParenthesizedType

Since the FunctionType production starts with a parenthesized parameter list, the parser can't determine whether to parse a FunctionType or ParenthesizedType until an arbitrary amount of input is consumed. I believe this would put the grammar out of LR(1), so possibly a cover grammar is needed here. (this isn't technically an ambiguity)

Bitwise or/and

The following program using bitwise or (or a union type) may be ambiguous:

let x = 3;
console.log(x | 9); // 11
console.log(x as number | 9); // 3 or 11?

In the case of TS, the parser considers 9 to be a type and will print 3. I believe the draft grammar is ambiguous and either parse would be accepted. For example:

x as number | 9;
^^^^^^^^^^^^^^^^ RelationalExpression
     ^^^^^^^^^^ Type
     ^^^^^^^^^^ UnionType, etc

^^^^^^^^^^^^^^^ BinaryORExpression
^^^^^^^^^^^ RelationalExpression
     ^^^^^^ Type

This is unlike 3 * 5 + 7, where the ES grammar only allows the top-level production to be AdditiveExpression, not MultiplicativeExpression. The problem is that | occurs as an operator at two production levels.

takikawa commented 1 year ago

I'll also add some comments discussing potential solutions to these ambiguities.

Bitwise and/or

Starting with the bitwise or/and issue, I think there is a simple solution here to add a lookahead restriction. So in this rule:

RelationalExpression :
   RelationalExpression [no LineTerminator here] as Type
   RelationalExpression [no LineTerminator here] as const

we would change the first rule to:

RelationalExpression :
   RelationalExpression [no LineTerminator here] as Type [lookahead ∉ { |, & }]

This should rule out a parse of x as number | 9 in which the type doesn't include the union, because the lookahead restriction doesn't let a | follow. This might rule out a small number of valid expressions, such as an as with a function type followed by a bitwise or, but it's not very likely that anyone will write that.

takikawa commented 1 year ago

Arrow and ternary operator

For the ternary operator, the ambiguity comes from having a concise arrow function body with a result type annotation, because arrow functions are AssignmentExpressions (which can appear in a ternary operator's result expressions).

The arrow function concise body itself can have AssignmentExpressions as well.

We could restrict these concise bodies to not allow typed arrow functions, by using a new production like AssignmentExpressionWithoutTypedArrow for the concise body that only has regular arrows.

In that case, 3 ? (x) : S => (x) : () => 3; would parse as 3 ? ((x) : S => (X)) : (() => 3) because (x) : () => 3 is not a valid concise body expression. If the other parse is desired, extra parens can be added: 3 ? (x) : (S => (x) : () => 3);

If there are still ambiguities with ternary operators, a further restriction would be to switch ternary operator bodies to also use AssignmentExpressionWithoutTypedArrow, requiring typed arrow functions to be parenthesized when used in a ternary operator.

takikawa commented 1 year ago

Arrow type ambiguity

The problem in this example is that it's not clear when to stop parsing a function type (which looks like TypeParameters_opt ParameterList => Type) and parse the body of an arrow function (also demarcated by =>).

A simple solution would be to restrict typed arrow function expressions such that their result type annotation doesn't follow the Type production and instead follows a new production TypeWithoutFunctionType that doesn't allow a function type to appear unparenthesized. A typed arrow function that returns a function could still annotate its result type, but it would be parenthesized like this:

(x : number) : ((y: number) => number) => (x) => x; // works in TS too