Closed Jamesernator closed 2 years ago
May I throw a couple of edge cases in here?
Would an `as` operator be included? If so, how do you deal with order of operations, considering you're allowing any operator to work within the type space.
return x as y | z // Is this a bitwise-or, or a union type?
When annotating a function's return type, how do you know when the type annotation stops and the function body starts? A type annotation is allowed to contain curly brackets, but that's also how we delimit the start of a function body.
function fn(): { x: number } | { y: number } { /* function body */ }
How should ASI be dealt with? Consider this example:
let x: { a: whatever, b: whatever, c: whatever }
| { a: whatever, d: whatever } // This continues the type definition from the previous line
property = 2 // This line is normal JavaScript
How do we know when the type space ends and normal JavaScript starts again?
When annotating a function's return type, how do you know when the type annotation stops and the function body starts? A type annotation is allowed to contain curly brackets, but that's also how we delimit the start of a function body.
function fn(): { x: number } | { y: number } { /* function body */ }
Hey @theScottyJam!
In the current tentative grammar there is a specific rule for `UnionType` (and `IntersectionType`) that is essentially `type <operator> type` or `type`. Because there isn't a valid type operator (currently `|` or `&`) after the `{ y: number }`, the parser can no longer continue consuming tokens as a type and will then have a parse goal of a function body.
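The stopping rule described above can be sketched roughly as follows. This is a minimal illustration over a pre-split token array, not the proposal's actual grammar or any real parser's API; the helper names `skipPrimaryType` and `parseType` are hypothetical.

```typescript
// Hypothetical sketch: tokens are assumed to be pre-split strings.
const CLOSERS = new Map<string, string>([["{", "}"], ["(", ")"], ["[", "]"]]);
const TYPE_OPERATORS = new Set<string>(["|", "&"]);

// Consume one primary type (a name or a bracketed group) starting at `pos`
// and return the index just past it.
function skipPrimaryType(tokens: string[], pos: number): number {
  const first = tokens[pos];
  if (first === undefined) throw new SyntaxError("expected a type");
  if (!CLOSERS.has(first)) return pos + 1; // a plain name such as `number`
  const stack: string[] = [];
  for (let i = pos; i < tokens.length; i++) {
    const tok = tokens[i] as string;
    const close = CLOSERS.get(tok);
    if (close !== undefined) {
      stack.push(close); // remember which closer we are waiting for
    } else if (tok === stack[stack.length - 1]) {
      stack.pop();
      if (stack.length === 0) return i + 1; // outermost bracket closed
    }
  }
  throw new SyntaxError("unbalanced bracket in type");
}

// type := primary (("|" | "&") primary)*
// Stops as soon as the next token is not a type operator.
function parseType(tokens: string[], pos: number): number {
  let end = skipPrimaryType(tokens, pos);
  while (TYPE_OPERATORS.has(tokens[end] ?? "")) {
    end = skipPrimaryType(tokens, end + 1);
  }
  return end;
}

// Tokens of: { x: number } | { y: number } { /* body */ }
const returnTypeTokens = [
  "{", "x", ":", "number", "}", "|",
  "{", "y", ":", "number", "}",
  "{", "/* body */", "}",
];
const bodyStart = parseType(returnTypeTokens, 0);
```

Here `bodyStart` points at the second-to-last `{`, which is where a parser following this rule would switch to parsing the function body.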
@acutmore
I did notice that from the current grammar. What I'm mostly referring to is the grammar being proposed in this thread. Specifically, the fact that, according to the grammar from this thread, a "type" is just a bunch of different things that can sit next to each other with no separator (where a "thing" is, e.g., a numeric literal, a keyword, type syntax for an object literal, etc). And, I'm not really sure it's possible to define a type that broadly because of the issues I mentioned above.
Ah, I didn't read your post with the proper context loaded into my brain. Thanks for clearing things up for me 😀 I've hidden my post as 'off topic' for this thread.
@Jamesernator It would be great if you could contrast what you're proposing to what's already in the current grammar proposal here. The current proposal already makes provisions for "well-bracketed" constructs and specifically avoids specifying further grammatical rules for their contents in order to preserve the ability for type syntax to evolve independently (i.e. any new type construct can simply be put in parentheses in order to ensure it is ignored).
Would an `as` operator be included? If so, how do you deal with order of operations, considering you're allowing any operator to work within the type space.
Yes, I just didn't add all of the special places that the proposal has for types to appear to the grammar. The main idea was that these trees would be decently specified but also flexible enough for multiple type systems to use.
How should ASI be dealt with? Consider this example:
ASI is a difficult beast that I'm not sure about; presumably it would be handled similarly to however existing constructs do it.
When annotating a function's return type, how do you know when the type annotation stops and the function body starts? A type annotation is allowed to contain curly brackets, but that's also how we delimit the start of a function body.
This is problematic, although it does seem to contradict the general goals in the README that type systems should be able to experiment and evolve syntax without changes to the JS language. See also below my response to @ahejlsberg's question.
@Jamesernator It would be great if you could contrast what you're proposing to what's already in the current grammar proposal here. The current proposal already makes provisions for "well-bracketed" constructs and specifically avoids specifying further grammatical rules for their contents in order to preserve the ability for type syntax to evolve independently (i.e. any new type construct can simply be put in parentheses in order to ensure it is ignored).
I would be fully supportive of a strongly specified grammar. If parenthesized token lists are the limit that type systems are happy with for their extension point, then yes, the above proposal doesn't really add anything over what is specified. It just seems surprising that the grammar is so specified, given that one of the goals of the README is to do this "without prohibiting existing type systems from innovating in this space".
Certainly such an AST would be considerably more useful for use cases like exposing types at runtime, or even just for tooling. (The "typeNodeSequence" things I described above would become even simpler to detect, and a simple recursive descent parser could handle things like binary operators and such.)
I'm going to close this as there is a similar issue about the grammar.
This is a follow-up to my comment, as was requested.
Just to summarize that discussion: if we were to expose types to runtime as some form of metadata, having only strings would be a bit painful, as essentially one has to ship a tokenizer and parser along with any such usage.
Now, while we do want a good amount of flexibility for type systems to be able to specify syntax, parsing these constructs has to work in order to support the feature at all, so we should already have a significant amount of the necessary AST, which we could expose to runtime.
Note that even if we don't choose to expose this information directly to runtime, having a canonical grammar for types would be significantly helpful for tooling and things that need to be able to perform tasks on the actual AST (e.g. Babel, code editors, etc).
So basically the rest of this is just giving a rough overview of what grammar flexibility is necessary, what is impossible, and what would be a good set to expose as a canonical AST format.
Non-negotiable points
These aspects of syntax cannot be negotiated because they fundamentally contradict the existing JS grammar. In particular we require the following:
- The `=` symbol must be well-contained to distinguish it from the beginning of a variable assignment; this will most likely be accomplished by requiring a "well-formed bracketing" that also applies to generics
  - As generics already use `=`, carrying this syntax over would be preferable
- `>`/`<` are already perfectly valid operators in JS, which means grammar conflicts with `foo < Bar > (value)` would exist if this was changed
- `"stringTokens"` must be treated as single tokens, especially as they may contain characters that would otherwise cause ambiguity
  - As with `=` earlier, if `=` were not well-bracketed it could indicate a variable assignment; the same rules need to apply to string tokens and other such similar tokens (e.g. template string tokens, if that is supported)
- `interface`, `import type`, `type Name =` and so on.
Prior art
NOTE: Popular type systems here refers to "TypeScript, Flow and Hegel"
Now a lot of syntax in existing type solutions is based on both existing languages and literature. A non-comprehensive list of features that are supported by all popular type systems:
- `{}` denote some form of object type; the actual contents of these object types are quite flexible, however
- `<>` to denote generics
- `type Name =` and `import type`
- `{`/`<`/`(`/`[` are always paired with a corresponding closing token whenever present
- `|` for union types
- `(ARGS_TYPES) => RETURN_TYPE` for function types
- `,` delimits a list of items (some also allow `;` for delimiting items in certain situations)
Basis for the following proposal
With the above information, I would like to propose a specific grammar for types that captures a large amount of the above while also giving a large amount of flexibility to type systems to extend with new constructs.
In the design of this, I looked at where type systems tend to experiment most with syntax, and the primary area they develop new syntax is in unary, binary, and special type operators.
Take for example TypeScript's mapped types as a concrete example:
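A mapped type of roughly the following shape exercises the features discussed next. The type name `Rename`, the `"bar"` target key and the sample object are illustrative assumptions, not taken from the thread.

```typescript
// Hypothetical mapped type: renames the key "foo" to "bar" and keeps
// every other key (with its value type) unchanged.
type Rename<O extends object = {}> = {
  [Key in keyof O as Key extends "foo" ? "bar" : Key]: O[Key];
};

// With this input, the resulting type is { bar: number; baz: string }.
const renamed: Rename<{ foo: number; baz: string }> = { bar: 1, baz: "x" };
```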
In this example many of the above points mentioned still hold:
- Simple tokens (`"foo"`, `O`, `Key`, `as`)
- `{ ... }` still denotes an object type
- `<...>` still denotes generics
- `[ ]` are still well matched
However, where the real flexibility comes in is in the bunch of unary/binary/special operators that are present:
- `extends` is a special binary "operator"-like thing on generic parameters
- `keyof` is a unary operator on a value
- `as` is a special joiner
- `=` is used as an operator within a generic
- `O[key]` is used as a special syntax
Now the following grammar proposal tries to capture these points by, in particular, allowing sequences of tokens to be fairly unrestrictive while constraining "well-matched" constructs.
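To make the "unrestrictive token sequences, constrained brackets" idea concrete, here is a rough sketch of a parser that only enforces bracket matching and leaves everything else as a flat sequence. Only the name `"typeNodeSequence"` comes from this thread; the other node shapes and the function `parseSequence` are assumptions made for illustration, and the input is assumed to be pre-tokenized.

```typescript
// Hypothetical node shapes for illustration only.
type TypeNode =
  | { kind: "token"; text: string }
  | { kind: "bracketed"; open: string; close: string; body: TypeNodeSequence };
type TypeNodeSequence = { kind: "typeNodeSequence"; nodes: TypeNode[] };

const CLOSER_FOR = new Map<string, string>([["{", "}"], ["(", ")"], ["[", "]"], ["<", ">"]]);
const CLOSE_TOKENS = new Set<string>(["}", ")", "]", ">"]);

// Parse tokens from `start` until the closing token `until` (or the end of
// input at the top level). Returns the sequence and the next unread index.
function parseSequence(
  tokens: string[],
  start = 0,
  until?: string,
): [TypeNodeSequence, number] {
  const nodes: TypeNode[] = [];
  let i = start;
  while (i < tokens.length) {
    const text = tokens[i] as string;
    if (text === until) return [{ kind: "typeNodeSequence", nodes }, i + 1];
    const close = CLOSER_FOR.get(text);
    if (close !== undefined) {
      // Well-bracketed construct: recurse and keep it as a nested node.
      const [body, next] = parseSequence(tokens, i + 1, close);
      nodes.push({ kind: "bracketed", open: text, close, body });
      i = next;
    } else if (CLOSE_TOKENS.has(text)) {
      throw new SyntaxError("unexpected " + text);
    } else {
      // Anything else is an uninterpreted token in the sequence.
      nodes.push({ kind: "token", text });
      i += 1;
    }
  }
  if (until !== undefined) throw new SyntaxError("missing " + until);
  return [{ kind: "typeNodeSequence", nodes }, i];
}

// Tokens of: { [Key in keyof O]: O[Key] }
const [ast] = parseSequence(
  ["{", "[", "Key", "in", "keyof", "O", "]", ":", "O", "[", "Key", "]", "}"],
);
```

The parser has no idea what `in` or `keyof` mean; it only guarantees that the brackets nest correctly, which is the property the grammar needs from JS's point of view.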
Proposed grammar
I would like to propose something similar to the following grammar that would produce canonical ASTs for type constructs. This captures the well-formedness of many bracketed and generic structures, but keeps large freedom in the simple idea of "token sequences", which is essentially just a sequence of parsed AST nodes.
The grammar is annotated with comments explaining each part of the grammar, why it is chosen, why it is restricted in such ways, and so on.
So without further ado, a grammar proposal:
We would also have the special additions to the JS grammar parse in these ways:
AST Format
Now you might look at the above and question whether or not it really adds anything over just a "well-matched" covering grammar. I would say yes, in particular consider the AST node for an example similar to the mapped type from above:
We would get out the following AST:
One of the key takeaways here is that while the AST format doesn't understand what a `"typeNodeSequence"` might actually mean, it is sufficiently tokenized and even has some other recursive nodes (corresponding to the well-bracketed constructs) that allow you to reconstruct most of what you would need for any type system just given this AST.
This would be far, far easier to use at runtime than a plain uninterpreted string, which requires a full parse. Even beyond runtime, tooling that needs to inspect JS ASTs can agree on common tokens, so processing such type grammars is simpler (for example token colorizing, custom Babel transforms, etc.).
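As a rough sketch of what a consumer could do with such an AST, the following splits a sequence into union members without any tokenizing. The node shapes and the helper `splitOn` are hypothetical (only the `"typeNodeSequence"` name comes from this thread), and treating `|` as a union separator is the consuming type system's interpretation, not something the AST itself encodes.

```typescript
// Hypothetical node shapes for illustration only.
type TypeNode =
  | { kind: "token"; text: string }
  | { kind: "bracketed"; open: string; close: string; body: TypeNodeSequence };
type TypeNodeSequence = { kind: "typeNodeSequence"; nodes: TypeNode[] };

// Split a sequence on top-level occurrences of an operator token.
// Operators nested inside bracketed nodes are left untouched.
function splitOn(seq: TypeNodeSequence, operator: string): TypeNode[][] {
  const parts: TypeNode[][] = [[]];
  for (const node of seq.nodes) {
    if (node.kind === "token" && node.text === operator) {
      parts.push([]);
    } else {
      parts[parts.length - 1]!.push(node);
    }
  }
  return parts;
}

// Hand-written AST for: { x: number | string } | string | null
const example: TypeNodeSequence = {
  kind: "typeNodeSequence",
  nodes: [
    {
      kind: "bracketed",
      open: "{",
      close: "}",
      body: {
        kind: "typeNodeSequence",
        nodes: [
          { kind: "token", text: "x" },
          { kind: "token", text: ":" },
          { kind: "token", text: "number" },
          { kind: "token", text: "|" },
          { kind: "token", text: "string" },
        ],
      },
    },
    { kind: "token", text: "|" },
    { kind: "token", text: "string" },
    { kind: "token", text: "|" },
    { kind: "token", text: "null" },
  ],
};

const unionMembers = splitOn(example, "|");
```

The `|` inside the object type does not affect the split, because it lives inside the nested bracketed node rather than in the top-level sequence.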