tc39 / proposal-type-annotations

ECMAScript proposal for type syntax that is erased - Stage 1
https://tc39.es/proposal-type-annotations/

Will parsing types require extended lookahead / backtracking? #49

Open samwgoldman opened 2 years ago

samwgoldman commented 2 years ago

In order to skip over types, engines will need to parse them. Compared to existing comments, the proposed types-as-comments will require more work to find out where they end. Both TypeScript and Flow parsers use backtracking to handle certain parses.

If so, would there be challenges in updating the ECMAScript grammar to support these cases? Would there be challenges for engines that might prefer simpler parsing rules?
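To make the contrast with existing comments concrete, here is a minimal sketch. `skipBlockComment` only searches for a terminator, while `skipParamType` is a hypothetical rule (not taken from the proposal) that has to track bracket nesting to find where a parameter's type ends. Both function names and the stopping rule are assumptions for illustration only.

```typescript
// Skipping a block comment needs no grammar: scan for the terminator.
function skipBlockComment(src: string, start: number): number {
  const close = src.indexOf("*/", start + 2);
  if (close === -1) throw new SyntaxError("unterminated comment");
  return close + 2; // index just past "*/"
}

// Hypothetical rule for skipping a parameter's type: track bracket
// nesting and stop at the first "," ")" or "=" outside all brackets.
// Mismatched closers are simply ignored in this toy version.
function skipParamType(src: string, start: number): number {
  const closers: Record<string, string> = { "(": ")", "[": "]", "{": "}", "<": ">" };
  const stack: string[] = [];
  for (let i = start; i < src.length; i++) {
    const ch = src[i];
    if (ch === "=" && src[i + 1] === ">") {
      i++; // treat "=>" as one token so function types do not end the scan
      continue;
    }
    if (ch in closers) stack.push(closers[ch]);
    else if (stack.length > 0 && ch === stack[stack.length - 1]) stack.pop();
    else if (stack.length === 0 && (ch === "," || ch === ")" || ch === "=")) return i;
  }
  throw new SyntaxError("unterminated type");
}

const src = "f(a: Map<string, number[]>, b)";
const typeStart = src.indexOf(":") + 1;
const typeEnd = skipParamType(src, typeStart);
console.log(src.slice(typeStart, typeEnd).trim());
```

Even this simplified scan needs a stack, and it still accepts many token sequences that a real type grammar would reject.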

bakkot commented 2 years ago

TC39 is extremely unlikely to accept any syntax which requires unbounded backtracking.

benjamingr commented 2 years ago

Early discussion with browser engine engineers seemed optimistic about being able to parse and ignore these comments efficiently. You'll also notice that not all of TypeScript/Flow syntax is supported in this proposal.

matthewp commented 2 years ago

I too have found it confusing that this proposal is called "comments". In every language I'm aware of, including JavaScript, a comment consists of a start character sequence, an end character sequence, with anything in the middle being allowed.

I'm assuming that is not the case here, and that only valid identifier chars are allowed within the "comment" section. For example, is this allowed?

function(thing: a[2]32]*@@!) {

}

?

Assuming this is not allowed, I think calling these comments is both inaccurate and confusing. Maybe "unchecked type annotations"?

joshgoebel commented 2 years ago

@matthewp You may want to weigh in on #78.

Thom1729 commented 2 years ago

I maintain Sublime Text's JavaScript, TypeScript, JSX, and TSX highlighting, and I can confirm that highlighting TypeScript requires backtracking (or the moral equivalent). ECMAScript syntax is mostly deterministic context-free, except that arrow functions are nondeterministic context-free. TypeScript's syntax is much more complicated and much more nondeterministic, and it has some gnarly (and undocumented) corner cases.

That said, the thing about highlighting is that you need to know what the token is when you get to it. The ECMAScript grammar avoids explicit nondeterminism via covering productions. This doesn't help a real-time highlighter very much, but it can suffice for other applications. I don't know to what extent this would work for TypeScript. TypeScript currently has no spec; its syntax is implementation-defined. I assume that the implementation is efficient, and that by extension the syntax can be parsed efficiently. But that efficiency might not necessarily survive the grammar changes necessary to make TypeScript a superset of ECMAScript.
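As a rough illustration of the covering-production idea, the sketch below parses a parenthesized list once without committing to a meaning, then uses the single token after the closing paren to decide whether it was an arrow-function parameter list. This is a toy grammar with hypothetical names (`tokenize`, `parseCover`, `classify`), not the ECMAScript specification's actual cover grammar.

```typescript
// Toy illustration of a cover production; not the real ECMAScript grammar.
type Cover = { names: string[]; end: number };
type Result =
  | { kind: "arrowParams"; params: string[] }
  | { kind: "parenExpr"; items: string[] };

function tokenize(src: string): string[] {
  return src.match(/=>|[A-Za-z_$][\w$]*|[(),]/g) ?? [];
}

// Parse "( a, b, ... )" once, without deciding yet what it means.
function parseCover(tokens: string[], pos: number): Cover {
  if (tokens[pos] !== "(") throw new SyntaxError("expected (");
  const names: string[] = [];
  pos++;
  while (tokens[pos] !== ")") {
    if (pos >= tokens.length) throw new SyntaxError("unterminated (");
    const tok = tokens[pos];
    if (/^[A-Za-z_$]/.test(tok)) names.push(tok);
    else if (tok !== ",") throw new SyntaxError(`unexpected ${tok}`);
    pos++;
  }
  return { names, end: pos + 1 };
}

function classify(src: string): Result {
  const tokens = tokenize(src);
  const cover = parseCover(tokens, 0);
  // Only after the closing paren does one token of lookahead decide
  // how the already-parsed cover node is reinterpreted.
  return tokens[cover.end] === "=>"
    ? { kind: "arrowParams", params: cover.names }
    : { kind: "parenExpr", items: cover.names };
}

console.log(classify("(a, b) => a").kind);
console.log(classify("(a, b)").kind);
```

The parser never rewinds, but it also does not know what `a` is when it first reads it, which is why this approach is of limited use to a highlighter that must classify each token as it arrives.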

The addition of :: in the proposal grammar is encouraging. That would be much easier to parse, and it shows that the proposal authors are willing to deviate from TypeScript where appropriate.

acutmore commented 2 years ago

> For example, is this allowed?
>
> function(thing: a[2]32]*@@!) {
>
> }

Hi @matthewp! With the current proposed grammar (very much subject to change) that would not be syntactically valid, as you assumed it wouldn't be.

function(thing: a[2]32]*@@!) {
//              a[2]   = arrayType
//                  32 = literalType

A literalType can't directly follow an arrayType, so the parser would stop attempting to parse a type at the 3.
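A minimal sketch of that behaviour, assuming a deliberately tiny grammar that is only loosely modeled on the description above (the real proposed grammar is larger and subject to change). `parseType` and `checkParamAnnotation` are hypothetical names, and whitespace is not handled.

```typescript
// Hypothetical toy grammar:
//   type    := primary ("[" type? "]")*
//   primary := identifier | numeric literal
// Returns the index just past the longest type starting at `pos`.
function parseType(src: string, pos: number): number {
  const primary = /^(?:[A-Za-z_$][\w$]*|\d+)/.exec(src.slice(pos));
  if (!primary) throw new SyntaxError(`expected a type at index ${pos}`);
  pos += primary[0].length;
  while (src[pos] === "[") {
    pos++;
    if (src[pos] !== "]") pos = parseType(src, pos); // optional inner type
    if (src[pos] !== "]") throw new SyntaxError(`expected ] at index ${pos}`);
    pos++;
  }
  return pos;
}

// After the type, a parameter list may only continue with "," or ")".
function checkParamAnnotation(annotation: string): void {
  const end = parseType(annotation, 0);
  const next = annotation[end];
  if (next !== undefined && next !== "," && next !== ")") {
    throw new SyntaxError(`unexpected "${next}" at index ${end}`);
  }
}

const input = "a[2]32]*@@!";
const stop = parseType(input, 0);
console.log(stop, input[stop]);
```

Under these assumptions the type ends after `a[2]`, and `checkParamAnnotation(input)` throws because the next character is `3` rather than `,` or `)`.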