tc39 / proposal-type-annotations

ECMAScript proposal for type syntax that is erased - Stage 1
https://tc39.es/proposal-type-annotations/

A concrete, smaller proposal: make more use of `:` #127

Open bakkot opened 2 years ago

bakkot commented 2 years ago

This proposal has a lot of syntax. But in my own code, almost all of the types are following a :. So just allowing that would get most of the ergonomic benefits.

Specifically, I'm imagining that in a few places (after a variable declaration LHS or parameter, at the start of a statement, after a function parameter list, and maybe some others) we would say that a colon followed by an identifier, possibly with a following no-lineterminator-separated balanced-braces block, would be treated as a comment.

(Or the rules could be more complicated; e.g. when at the start of a statement it could run until the end of the line, skipping balanced blocks; this would allow :type x = y;.)

This would allow the following:

:type { SpecialBool = boolean }

:interface {
  whatever
}

let file: SourceFile;

function sum(x: number, y: number): number {
  return x + y;
}

This would, of course, require you to use different syntax than TypeScript for many programs. But that's going to be true with any proposal, particularly because of generic invocation, but also for anything with runtime semantics like enums.

tabatkins commented 2 years ago

Need to do something a bit fancy to have arguments annotable like this, so you can combine an annotation with default values. A very simple grammar applied to the stuff after the colon would likely solve this - maybe just idents and matched braces (()[]{}) at the top level?

Then function sum(x: number = 1, y: number = 2) would be unambiguously parsed.
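To make the "idents and matched braces at the top level" idea concrete, here is a minimal sketch of an eraser for a parameter list. Everything in it is an assumption made for illustration: the names `closerFor`, `endOfGroup`, `endOfType` and `eraseParamTypes` are hypothetical, the type grammar is just "identifier characters and balanced `()`, `[]`, `{}` groups with no whitespace in between", and strings, comments and top-level ternaries in default values are not handled.

```typescript
// Hypothetical helper: the closing bracket for an opener, or "" if `ch` is not an opener.
function closerFor(ch: string): string {
  switch (ch) {
    case "(": return ")";
    case "[": return "]";
    case "{": return "}";
    default: return "";
  }
}

// Index just past the bracket group that opens at `open`, or -1 if it is unbalanced.
function endOfGroup(src: string, open: number): number {
  const stack: string[] = [];
  for (let i = open; i < src.length; i++) {
    const ch = src.charAt(i);
    const closer = closerFor(ch);
    if (closer !== "") {
      stack.push(closer);
    } else if (ch === ")" || ch === "]" || ch === "}") {
      if (stack.pop() !== ch) return -1; // mismatched bracket
      if (stack.length === 0) return i + 1;
    }
  }
  return -1; // ran out of input
}

// Index just past a "type": a run of identifier characters and balanced groups.
function endOfType(src: string, start: number): number {
  let i = start;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (/[A-Za-z0-9_$]/.test(ch)) {
      i++;
    } else if (closerFor(ch) !== "") {
      const end = endOfGroup(src, i);
      if (end === -1) break;
      i = end;
    } else {
      break; // whitespace, `=`, `,` and so on end the type
    }
  }
  return i;
}

// Drops `: type` after each top-level parameter name, keeping default values.
function eraseParamTypes(params: string): string {
  let out = "";
  let depth = 0; // bracket depth, so colons inside default values are left alone
  let i = 0;
  while (i < params.length) {
    const ch = params.charAt(i);
    if (ch === ":" && depth === 0) {
      let j = i + 1;
      while (params.charAt(j) === " ") j++;
      const end = endOfType(params, j);
      if (end > j) {
        i = end; // skip the colon and the type
        continue;
      }
    }
    if (closerFor(ch) !== "") depth++;
    else if (ch === ")" || ch === "]" || ch === "}") depth--;
    out += ch;
    i++;
  }
  return out;
}

console.log(eraseParamTypes("x: number = 1, y: number = 2")); // prints "x = 1, y = 2"
```

Because the type ends at the first character that is neither an identifier character nor an opening bracket, the `=` of a default value is never swallowed, which is the property being asked for here.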

bakkot commented 2 years ago

A very simple grammar applied to the stuff after the colon would likely solve this - maybe just idents and matched braces (()[]{}) at the top level?

That's what I was proposing, yeah.

rricard commented 2 years ago

from the matrix chat, @bakkot defined it as

a colon followed by an identifier, possibly with a following no-lineterminator-separated balanced-braces block

I guess that would bring the = back into interpretation

bakkot commented 2 years ago

(in the main issue, not just the matrix chat)

rricard commented 2 years ago

oops yes indeed

tabatkins commented 2 years ago

Apologies, I skipped right past that. Agreed, then!

tabatkins commented 2 years ago

Well, my proposal differs slightly - I want to make sure that x: foo[bar] would parse as a type, for instance, or x: foo(bar), or even x: foo[bar]{...}.

bakkot commented 2 years ago

I did also intend to capture some of those, though I guess "balanced-braces block" did not make that clear.

I'm less convinced that x: foo[bar]{...} should parse as a type, though. But, these rules could be bikeshedded.

bakkot commented 2 years ago

I guess you might want to allow colons in parenthesized expressions, as in (expression: foo), for type casts.

And maybe also have a form which allows a ! after the :, so that you can do non-null assertions as foo():!.

Is there anything else this wouldn't reasonably cover?

simonbuchan commented 2 years ago

Out of curiosity, why are people concerned about how much syntax this proposal adds? This is like, the fourth issue proposing some variant of a simpler syntax.

I ask because the current proposal is really not that big at all by spec standards, and will have very little impact on the current and any future additions to the spec due to only being syntax.

Why the sudden concern?

bakkot commented 2 years ago

I ask because the current proposal is really not that big at all by spec standards

This proposal adds more syntax than any proposal has ever added to the language, by a wide margin.

and will have very little impact on the current and any future additions to the spec due to only being syntax

Proposals which add syntax have much more effect on the evolution of the language than do proposals which only add new library functions.

simonbuchan commented 2 years ago

But that wasn't the question I asked. I said, it's not that big as a proposal as a whole, and that as just syntax it didn't have much impact, unlike many, many other additions which were not controversial and weren't "just library functions".

Heck, even "just library functions" can have a huge impact, for example adding iterable methods can make future additions to the spec want to return iterable results rather than arrays, which requires a pretty different approach to specifying the semantics. Or, speculatively, adding a native Abort controller would impact pretty much every existing and future async method.

So, basically, where's the concern? You mention the direction of the language, what about the current proposal is concerning to you there?

(Also, I'm dubious that it's actually adding more syntax by a wide margin when looking at actual spec grammar diffs. Async and string templates for example both required a whole bunch of technical reworking in the spec in order to make them simple to use.)

bakkot commented 2 years ago

But that wasn't the question I asked. I said, it's not that big as a proposal as a whole, and that as just syntax it didn't have much impact, unlike many, many other additions which were not controversial and weren't "just library functions".

Syntax is expensive. A proposal to add a whole bunch of syntax is therefore expensive, even if it can be specified briefly. Sorry, I thought that was implicit in my answer.

many, many other additions which were not controversial

Sidebar: what non-controversial things are you thinking of here? "non-controversial" is not the experience I associate with TC39.

You mention the direction of the language, what about the current proposal is concerning to you there?

To be clear, my concern is mainly the amount of new syntax in itself.

But, separately, this proposal as currently written would prevent a lot of potential future additions, including, among others, actual (runtime) abstract classes, non-null assertions with runtime effects (or any other use of ! in suffix position), any use of this in parameter lists, the use of as in expression position, runtime interfaces, runtime readonly modifiers, etc.

Also, I'm dubious that it's actually adding more syntax by a wide margin when looking at actual spec grammar diffs. Async and string templates for example both required a whole bunch of technical reworking in the spec in order to make them simple to use.

From the perspective of a user of the language, adding async functions required all of two bits of grammar: async as a modifier on various function declaration forms, and await as a unary operator. (Async generators and for await later added a couple more.) The technical refactoring of the spec which was entailed - mostly to deal with the possibility of someone invoking a function named async, and that looking like an async arrow - was not particularly visible to users.

simonbuchan commented 2 years ago

Syntax is expensive. A proposal to add a whole bunch of syntax is therefore expensive, even if it can be specified briefly. Sorry, I thought that was implicit in my answer.

In what sense? I would think expensive in this context could mean for the spec or implementors, I wouldn't expect it to have anything to do with users, as I think you imply you're mainly concerned with elsewhere?

Sidebar: what non-controversial things are you thinking of here? "non-controversial" is not the experience I associate with TC39.

Ha! Fair, I should have said "as controversial". Generally, I see people bikeshedding about what temporal should call local/civil/plain etc times, or raising questions about how this would interact with (obscure situation), etc, it's not all that common that the entire approach or even existence is being questioned, but perhaps I've just not followed enough stage 0 proposals.

actual (runtime) abstract classes, non-null assertions with runtime effects (or any other use of ! in suffix position), ...

Now these concerns are absolutely deserved, and why I steered away from touching any more than minimal introducers in my own proposal #122 (though I say assume I'm importing whatever introducers are needed). I feel this is essentially a question for what TC39 feels comfortable with "giving up" to types, balanced against what's been found to be generally useful in typing. I could easily see some or all of abstract, !, and this parameters getting bumped or replaced, for example. as I think has a good case for itself, you need some introducer for expression annotation and : is often ambiguous in that context, but perhaps they're more comfortable with Flow's (expr: Type) (this would make me sad though), or maybe ::.

But even then, from memory those are pretty minimal cuts to spec size, the bulk is the type syntax itself.

bakkot commented 2 years ago

In what sense? I would think expensive in this context could mean for the spec or implementors, I wouldn't expect it to have anything to do with users, as I think you imply you're mainly concerned with elsewhere?

Expensive for users is what I meant, yes. Syntax is the stuff you have to learn to even be able to read a program. So adding a bunch of new syntactic forms means there's a bunch more stuff you need to learn to be able to read JS code. The fact that none of this stuff has meaning at runtime helps, but does not obviate this concern.

it's not all that common that the entire approach or even existence is being questioned

Yeah, that's mainly a difference between stages. Stage 0 and 1 is precisely the time when the entire approach or existence is questioned, and stage 2 where the details are worked out.

But even then, from memory those are pretty minimal cuts to spec size, the bulk is the type syntax itself.

Well, again, spec size in itself I think is not the important part. My main concern is the number of new distinct bits of syntax a reader has to learn.

In any case, those specific cases weren't meant to be a comprehensive list. Just to pick a couple more from the core Type grammar, it would also prevent the use of |, &, and :: as prefix operators. Moreover, the grammar as proposed is just a bunch of stuff that TypeScript happens to have today. It's not even trying to reserve much in the way of space for future syntax, except by saying that there's an escape hatch for new syntax by putting it in balanced-braces blocks which would be made legal in certain contexts.

And my point in this issue is, we could instead just put basically all of the syntax you could ever want for types in that escape hatch, with only very minor cost (so you can't write x: keyof y, you instead have to write x: (keyof y) or x: keyof(y)). And then the rules get radically simpler.

simonbuchan commented 2 years ago

So I think we're not too much in disagreement really - mainly just priorities.

Also it's not trivial to just throw everything in balanced braces, see my other issue #116, but that's a problem all the suggested approaches will have.

acutmore commented 2 years ago

While the rule may be simpler, that alone does not guarantee that it’s less complex for people to learn and teach. As an extreme example, lambda calculus only has 3 parts: identifiers, lambdas and application. But that does not make it easy to read. It starts to become readable as more syntax is added.

:type { SpecialBool = boolean }

type SpecialBool = boolean;

I would imagine that the latter would be a less intimidating construct to encounter due to its visual similarity to other parts of JS and other languages.

ljharb commented 2 years ago

but also potentially more confusing, because of its visual similarity to code that does something, while it instead does nothing.

bakkot commented 2 years ago

I would imagine that the latter would be a less intimidating construct to encounter due to its visual similarity to other parts of JS and other languages.

Once you learn the rule that : indicates a comment, then :type { SpecialBool = boolean } is no more intimidating than

/**
 * @param {string}  p1 - A string param.
 */

hax commented 2 years ago

The :type and :interface parts are very unlikely to work.

foo
:type
{}

is already valid today thanks to labels. 😂
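This can be checked without any new syntax. The sketch below assumes a runtime where the `Function` constructor is available (for example Node.js, or a browser page without a restrictive CSP); `parses` is a hypothetical helper name. It relies on the fact that `new Function(body)` parses its body when the function is created and throws a `SyntaxError` if the body is not valid, so nothing is ever executed.

```typescript
// Hypothetical helper: true when `body` parses as a (sloppy-mode) function body.
function parses(body: string): boolean {
  try {
    new Function(body); // parsed at construction time, never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else (e.g. a CSP-related error) is not a parse result
  }
}

// Parses today: label `foo`, then the expression statement `type`
// (terminated by automatic semicolon insertion), then an empty block.
const afterExpression = parses("foo\n:type\n{}");

// Does not parse today: a statement cannot begin with a colon.
const atStartOfStatement = parses(":type {}");

console.log(afterExpression, atStartOfStatement); // prints "true false"
```

So a leading `:` is only free syntax space when the parser already knows it is at the start of a statement; after a line that ends in a bare identifier, the same characters already have a meaning.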

bakkot commented 2 years ago

Uuuuuugh.

:: for the start-of-statement context, then!

(Though that would prevent the :: prefix operator...)

hax commented 2 years ago

Using :: would also take a precious token. Currently the extensions proposal inherits it from the old bind operator proposal. Even if the extensions or call-this proposals could choose some other token, I still feel :: should be reserved for a much more common feature (e.g. x :: RuntimeType is mentioned in this proposal), because it's a very good token (two of the same character, not hard to type on a standard keyboard). Using it for comments is too wasteful.

bakkot commented 2 years ago

Yeah, I could see a case for having a different start-of-line comment token.

Though, also, for multi-line comments the existing syntax is ok. The overhead required for

//: type x = y

or

/*
interface {
  whatever
}
*/

or whatever, is relatively less significant than the overhead required for

function f(a /*: number */) {}

hax commented 2 years ago

Personally I think type x = ... is OK because it just adds a type alias and doesn't seem to have much overhead. But interface is problematic.

tabatkins commented 2 years ago

Out of curiosity, why are people concerned about how much syntax this proposal adds? This is like, the fourth issue proposing some variant of a simpler syntax.

The more syntax we add to these "comments", the more opinionated we are being about how these comments (and thus any type system, or anything else that might want to use this syntax space) can be written, now and forevermore. Changing this in the future then comes with compat risks, which seems weird and bad for something that is comments and shouldn't have any runtime effect!

It's also not great that the syntax being bandied about is very tailored to TS; new type-checkers that might arise in the future will end up bound to these TS-centric spelling conventions. A very generic light-touch syntax allows for expansion in this space without us having to worry about things breaking in the future. (Python gets away with an even more limited grammar than what's expressed here and in my experience that feels fine; I wouldn't mind a little more structure being allowed in Python but I rarely run into issues.)

simonbuchan commented 2 years ago

@tabatkins I agree that the current proposal is too typescript specific, that's why I made my own (incomplete) proposal that's a lot more general, but which actually tries to define a grammar.

Turns out there's a lot of difficulty in getting something actually usable and also flexible, I only define Type and it's at least as complex as the current proposal's Type, probably wrong, and definitely more confusing.

Paradoxically, there's a conflict between "a lot of defined syntax" and "a lot of syntax space covered", and neither of them are "easy to learn" in the sense that most of these alternative grammars seem to mean, including for their own proposals when you actually start pulling at threads.

Like, and I'm not trying to be mean here, but this proposal says "a colon followed by one identifier then optional braced" roughly, then immediately shows an example that didn't work with "function sum(...): number { ... }", obviously fixable, but in general see the above paragraph. There's a lot of landmines here.
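The landmine in that example can be reproduced in a few lines. The sketch below is one literal, hypothetical reading of the rule as worded at the top of the issue (a colon, one identifier, then an optional balanced `{...}` block that starts on the same line); `annotationEnd` is an illustrative name, and a real grammar would presumably special-case this position.

```typescript
// Literal reading of the rule: ':' + identifier + optional same-line `{...}` block.
// Returns the index just past the annotation, or `colon` if nothing matches.
function annotationEnd(src: string, colon: number): number {
  const head = /^:[ \t]*[A-Za-z_$][\w$]*/.exec(src.slice(colon));
  if (head === null) return colon;
  const afterIdent = colon + head[0].length;

  // Only spaces/tabs may separate the identifier from the block (no line terminator).
  const gap = /^[ \t]*\{/.exec(src.slice(afterIdent));
  if (gap === null) return afterIdent;

  let depth = 0;
  for (let j = afterIdent + gap[0].length - 1; j < src.length; j++) {
    const ch = src.charAt(j);
    if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return j + 1;
    }
  }
  return afterIdent; // unbalanced: treat only the identifier as the annotation
}

const sumSource = "function sum(x: number, y: number): number {\n  return x + y;\n}";
const returnColon = sumSource.indexOf("):") + 1;
const erasedSum =
  sumSource.slice(0, returnColon) + sumSource.slice(annotationEnd(sumSource, returnColon));

console.log(erasedSum); // prints "function sum(x: number, y: number)"
```

Under this reading the function body is consumed as the annotation's block, so the rule needs at least one extra condition for return types, which is the kind of edge case being pointed at.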

It also has the common suggestion of a line starting with a colon going to the end of a line. I'm pretty sure this isn't actually ambiguous (labels really should have a no line terminator here before the colon), but it's a completely new parsing behavior for the spec. Except for automatic semicolon insertion nothing else in the syntactic grammar (eg, after the lexer) cares about the position within a line, eg you can say "you can't have a line terminator here" but not "this must be at the start of a line". If you're concerned about the direction of the language grammar, this by itself is more impactful than anything else suggested by far. Don't get me wrong, I really like layout based languages, but JS isn't one.

Again, the point isn't "ha ha, dumb" it's "looks nice in examples doesn't mean looks nice in spec or easy to learn all the edge cases", which is why I was asking about what people were actually concerned about, because there's a lot of it and handling it depends on what they're actually specifically concerned about.

bakkot commented 2 years ago

Except for automatic semicolon insertion nothing else in the syntactic grammar (eg, after the lexer) cares about the position within a line, eg you can say "you can't have a line terminator here" but not "this must be at the start of a line".

I said "at the start of a statement", not "at the start of a line", to be pedantic about it. But also, I don't think that matters nearly as much as you think. Very few people actually need to learn rules about line terminator handling because it doesn't actually affect how you read code, in practice. Like, while it's true that the spec grammar allows you to write

if (x) {
  foo();
} bar();

this fact is not actually important because no one writes code like that. That sort of detail mostly only affects parser authors. By contrast, the details of the Type grammar in this proposal are very much something you're going to be exposed to, constantly.

Anyway, the exact details of how the grammar in my proposal works aren't actually the important part; it wasn't intended to be a fully worked-out grammar. The fundamental suggestion is to carve out a broad and consistent syntax space, rather than trying to nail down a very precise grammar for TypeScript types.

Again, the point isn't "ha ha, dumb" it's "looks nice in examples doesn't mean looks nice in spec or easy to learn all the edge cases", which is why I was asking about what people were actually concerned about, because there's a lot of it and handling it depends on what they're actually specifically concerned about.

Right, so, the thing I am primarily concerned about is how much stuff you're going to have to learn to read the language, not how much spec text there is. (Speaking as editor of the specification, "how pretty is this to specify" is the absolute last concern.) You don't need to learn the NLTH rules to read most programs, generally speaking, but with the proposal in its current state you are in fact going to need to learn a ton of rules in order to read basic code.

acutmore commented 2 years ago

but with the proposal in its current state you are in fact going to need to learn a ton of rules in order to read basic code.

One approach I have used when teaching TypeScript, in particular to people who don’t have a JS background, is to copy/paste code into the TS playground so they can see the code with the types removed. To help get a concrete sense of what is and isn’t a type.

One hope I have is that if that boundary is made concrete then editors/IDEs could build that straight in. With a button that toggles the visibility of the type comments, or just dims them.

simonbuchan commented 2 years ago

Ah, at the start of a statement makes sense, my mistake! (There's another issue that suggests the start of the line, I guess assuming they mean statement is only fair)

I still think it doesn't buy you anything compared to reserving, say, type for your statements, but at that point it's pretty much just taste (assuming you're not trying to directly include the majority of existing Typescript)

I'm still not happy about the phrasing of "to the end of the line" even, do you mean something like to the next (possibly automatic) semicolon or braces? That would seem to cover any existing type declaration I can think of with only straightforward changes (including type import/export!), and I'd be quite happy with it.

In any case, declarations are relatively easy, it's the inline annotations where the spiders are hiding and you need a self-contained grammar. My main issue with "colon, some simple thing" suggestions is that as a heavy typescript user, it feels like I would be getting a much worse experience when it seems I shouldn't need to be: that the brackets, whether they are braces or parentheses, are "giving up" in some sense. Extremely common type annotations like readonly string[], Stream<View> or "success" | "failure" suddenly need to be pulled out into a declaration or get bracketed: not the worst outcome, but possibly enough to push people (me included!) into sticking to "real typescript", when, if this is done right, .ts files could basically go away (for new code at least). If there's an out of line annotation that Typescript would be happy to pick up, that would solve a lot of the problem, but I'm not sure that could exist in a form that keeps the various stakeholders happy.

I already don't use Typescript enums or namespaces when I have the generally superior discriminated unions and modules, so I'm not missing them being excluded (and they could make sense to add separately as a semantic JS feature if people want them), so really I do think it's useful to try to include Typescript syntax as much as possible. Sometimes it's not, but that's fine, when it's clear why it's not. Generic arguments are an easy case, abstract classes are another case where "we might want to do that semantically later", but the case gets a lot murkier with type syntax: "it's available and already what I'm using, what's the problem?" needs a stronger counter argument than "it seemed like too much to learn for you".

My goal with my proposal was "anything that if you just eyeballed it looked like it was part of a type even if it was in some language you didn't know" - which is quite tricky to do without being ambiguous, and of course means you're stomping on more potential future syntax than just adding exactly what typescript uses, but as I said the idea is to find something in the middle.

(Speaking as editor of the specification, "how pretty is this to specify" is the absolute last concern.)

They need to put a bell on you committee folk! 😄 I mostly raise that as it seems to be often used as a proxy for how complex it is to use or understand, while, as you surely know, it's often the opposite.

simonbuchan commented 2 years ago

One approach I have used when teaching TypeScript, in particular to people who don’t have a JS background, is to copy/paste code into the TS playground so they can see the code with the types removed. To help get a concrete sense of what is and isn’t a type.

I'd be really interested to see what things trip them up, especially if they're consistent! Seems like if nobody expects bare type annotations to have binary operators, for example, that's a good argument against allowing them (and requiring parentheses)