tc39 / proposal-type-annotations

ECMAScript proposal for type syntax that is erased - Stage 1
https://tc39.es/proposal-type-annotations/
4.27k stars 47 forks source link

A standard, extendable type system #183

Open theScottyJam opened 1 year ago

theScottyJam commented 1 year ago

Preface

As stated in a recent meeting a "main goal" of this proposal is "unifying JavaScript with typed scripts". I strongly believe that the only way to really unify developers is to bring us under the same standard type system. The problem is, as others have noted, that creating a rigorous specification for an entire type system is going to be a massive project that may not see the light of day for a very long time (if ever). I personally would much rather have a solid, standard type system for JavaScript five to ten years down the road vs a "let's just get rid of a build step" proposal today, but I wonder if there's something that can be done to help us break up this huge "type system" task so we can start enjoying some of the benefits of a standard type system in the near future, without having to wait for the whole thing to be designed.

The plan

Imagine if, as an initial proposal, only a few foundational elements of a type system got standardized - things like "this is a number" or "this function returns a boolean". We don't even have nice things like generic syntax or type inference. Some very basic type-checking can be done with it but that's about it. No one is expected to use it by itself, at least to start with, instead you'll need to use an "extension" to provide additional behaviors. These extensions could be new tooling, or it could be an existing typed superset (e.g. TypeScript) supporting both its own syntax and the new native syntax, basically making all of TypeScript's features an extension to the native type syntax. For the rest of this explanation, I'm going to mostly focus on the latter scenario, because that seems like the ideal path forwards, but it does put a heavy burden on the maintainer of these superset languages as it'll basically require them to eventually support two complete but different type systems.

As time goes on, the initial small, standard type system can be expanded to support more and more features. As this happens, developers can start to write code that uses the standard type system more and more while leaning away from their existing tooling, until they would be able to drop existing tooling altogether.

A potential initial type-system

Just to pick some syntax, let's say the initial type-system proposal looks something like this:

To label a variable with a type, use :: <the type> - I'm picking :: just to make a clear distinction that this is an official JavaScript type, and it may have different semantics from a TypeScript/Flow/etc type.

One important type to add early on would be some kind of :: unspecified type. This can be used in places where the built-in type system isn't powerful enough to represent this type (e.g. maybe this should really use generic syntax, or maybe you want it to be a never type, but these don't exist yet in the standard type system). Conceptually, unspecified is different from TypeScript's unknown with how it gets used - unknown is when you know that the type is unknown while unspecified just means the type system isn't powerful enough to describe the desired type, so you're purposely omitting it. Semantically, unspecified could behave the same as unknown (and so perhaps we only include one or the other), or we might find that unspecified would work better with slightly different semantics. What's important is the backwards compatibility it enables - we can say a type is unspecified to start with, requiring end-users to coerce the values to more specific types, then later we can come back and provide a more specific type without breaking any end-user code.

Next is the topic of type inference. TypeScript/Flow is automatically able to infer types using lots of different tricks, e.g. TypeScript understands instanceof, typeof, throw, assert functions, etc. I feel like it would be unwise to try and make JavaScript do all sorts of smart inference initially, instead, JavaScript will need to provide an explicit do-it-yourself type-narrowing syntax - a syntax that states "for the rest of this block, treat this variable as this specific type instead of the more general type it was before". Something like this:

// Normally we would say the parameter is of type `string | unknown[]`,
// but because we might not support union types to start with,
// I'm instead just saying it's unspecified.
function getLength(value::unspecified)::number {
  if (typeof value === 'string') {
    // Explicitly stating that we believe this value is a string
    value is string;
    return value.length;
  } else {
    assert(Array.isArray(value));
    // Explicitly stating that we believe this value is an array
    value is Array;
    return value.length;
  }
}

Note that, for this initial design, the return type is required, and by required I just mean that if you use a standard-compliant type-checker without any extended functionality on this function, it will give you an error if it does not have an explicit return type.

At runtime, type information like ::number will just be ignored. I'm still hopeful that we could eventually provide a way to add runtime semantics to the types via opt-in syntax, allowing you to, for example, do an assert to check that a value really does conform to a specific type, but decisions to do or not to do that can be down the road.

Extending the core

The initial type system described above provides almost nothing - it gives little protection and would be a pain to work with. But, it's a good start, and it gives type extensions something to work with. I wouldn't expect anyone to directly use the first pass of the native type syntax as-is, but it can still be used in other indirect ways. Now let's explore how a typed superset can start integrating with the standard type system.

Say we write a function using Flow like this:

function addNumbers(a: number, b: number) {
  return a + b;
}

If the semantics of Flow's number type is the same as JavaScript's standard number type, it could choose to compile it as follows:

function addNumbers(a::number, b::number)::number {
  return a + b;
}

(While not shown in this specific example, Flow might auto-insert things like someVariable is someType as well during compilation)

This function is shipped as "the-number-adding-package" on NPM, and I go and install it. I use TypeScript which is capable of understanding standard JavaScript types, and so I automatically get proper type safety for this function, despite it coming from a different typed superset.

It won't always work out so nicely. If we modify our example to take an object as input, and let's say the standard type system is structurally typed, then Flow, which uses nominally-typed object, would be forced to compile down to something like this:

// Original function
function addNumbers(numbers: { a: number, b: number }) { ... }

// Compiled version
function addNumbers(numbers::unspecified /*: { a: number, b: number }*/)::number { ... }

As a TypeScript user using this package, I would need to provide my own type definitions for this function, or I would get little type-safety out of it. Note that when Flow compiled, it did still leave additional type information in comments, so a Flow user would be able to use this package and get additional type-safety. Similarly, if TypeScript were to compile, it could leave behind some type definition files with extra information.

Over time, those using Flow (and TypeScript) could start choosing to write some of their types using standard type syntax. This means, the above Flow function may get rewritten as follows:

// Original function
function addNumbers(numbers::{ a::number, b::number })::number { ... }

// Compiled Version (the same)
function addNumbers(numbers::{ a::number, b::number })::number { ... }

Now it's written in a way that any standard-types-understanding type system can understand, making it more portable, but at the cost of making the object type structurally typed instead of nominally typed.

Hopefully, this long-winded explanation shows how we would be able to

  1. construct a standard type system, bit by bit, instead of all-at-once
  2. Provide a migration path to bring people using TypeScript/Flow/etc into the standard types, thus unifying the community back to JavaScript

Additional notes

We may want to make a designated space for typed extensions to live (similar to what this types-as-comments proposal is currently trying to do). It could be new syntax, like @@`...whatever type information...`, or we could just let them use comments like Flow does (TypeScript could adopt a more terse comment syntax - JSDoc is unnecessarily verbose for this use-case). Having an ignored space for type-extensions to live could be important as time goes on, when the native type system becomes powerful enough to handle most use-cases, but you still want to use TypeScript/Flow/etc here and there for some odd-ball things, and you don't want to have an entire compile-step just to handle those use-cases.

I want to make sure I properly address this concern: Is there a point to having the beginnings a native type system in the language if, for it to have any value, it requires using an extension for it? The answer is yes, and it's because it's laying the foundation of interoperability.