microsoft / TypeScript

Strawperson TypeScript integration for Record & Tuple #49243

Open rricard opened 2 years ago

rricard commented 2 years ago

Strawperson TypeScript integration for Record & Tuple

🔍 Search Terms

✅ Viability Checklist

My suggestion meets these guidelines:

📃 Motivating Example

We can type records as coordinates and use them as Map keys:

type Coordinate = #{ lat: number, lon: number };
const boaty = { name: "Boaty McBoatface" };
const coords: Coordinate = #{ lat: 13.18653, lon: -24.78303 };
const boats = new Map<Coordinate, any>([[coords, boaty], /* ... */]);

assert(boats.get(coords) === boaty); // OK ✅
const coordsCopy: Coordinate = #{ ...coords };
assert(boats.get(coordsCopy) === boaty); // OK ✅

💻 Use Cases

This change will not only make TS compliant with ECMA262 once Record & Tuple lands in the spec but will also add type representations that make Record & Tuple even more ergonomic to use.

High quality TS support for Record & Tuple is essential for it to be widely adopted. The feature is more useful if it is backed by a good type system.

Strongly-typed immutable data structures aid writing correct and reliable programs, so Record & Tuple in TypeScript is an excellent fit!

⭐ Suggestion

The champion group for Record & Tuple is going to ask for Stage 3 soon and we want to have a solid TS integration plan before going forward with Stage 3.

The goal of this issue is to propose a strawperson mechanism in TS to handle the proposed Record & Tuple.

This is based on what @rbuckton suggested in the proposal's repo.

Step 0: Disambiguate existing TS Records & TS Tuples

Before any work is done on integrating Record & Tuple into TS, it would be useful to disambiguate existing TS Records & TS Tuples from ES Records & ES Tuples. This is unfortunately going to be a source of confusion, and it would be useful to get ahead of the problem now.

The Record<K,V> utility type

Since TS, like JS, stays backwards compatible, Record<K,V> is not going anywhere.

We would, however, like to make it an alias of another, more precisely named type and then softly deprecate Record:

type Dictionary<K extends string | number | symbol, T> = { [P in K]: T; };

/**
 * @deprecated Use `Dictionary` instead of `Record`
 */
type Record<K extends string | number | symbol, T> = Dictionary<K, T>;

Dictionary was suggested to us by @rauschma in a R&T issue: https://github.com/tc39/proposal-record-tuple/issues/82#issuecomment-1135747127

Naming is fully up for debate.

The TS Tuples

Since TS Tuples are a syntactic construct and not a named type, they will be easier to work around in the language itself, but harder to rename across the board, as both the TS documentation and hundreds of blog articles have adopted that terminology.

The disambiguating terminology could be "Fixed Array", as suggested to us by @rauschma in a R&T issue: https://github.com/tc39/proposal-record-tuple/issues/82#issuecomment-1135747127.

Naming is fully up for debate.

Step 1: record & tuple primitive types

Now comes the question of integrating ES Records & ES Tuples into TypeScript without introducing completely new semantics to TS.

Set of all Records or Tuples

We would introduce new record and tuple primitive types in TS. Those match the set of all records and all tuples respectively:

const rec: record = #{ bar: "baz" }; // ok
const tup: tuple = #["bar", 123]; // ok

This would match the JS runtime behavior here since those are the typeof returns for Record & Tuple:

assert(typeof #{} === "record"); // ok
assert(typeof #[] === "tuple"); // ok

Marker record primitive type

Those primitive types would act as markers on types like any other primitive type:

// the extra `readonly length: number` is useless but works today
type AString = string & { readonly length: number };

const foo: AString = "foo"; // ok

Using that principle we can get a type for a specific given record:

type Foo = record & { readonly bar: string };

const fooRec: Foo = #{ bar: "baz" }; // ok
const fooObj: Foo = { bar: "baz" }; // not ok

This will also let Records match normal interfaces:

interface Foo {
    bar: string;
}

const fooRec: Foo = #{ bar: "baz" }; // ok
const fooObj: Foo = { bar: "baz" }; // ok

fooRec.bar = "qux"; // ok for ts, will throw in engine -> don't forget `readonly`!

Marker tuple primitive type

We've seen records but tuples are similar:

type Foo = tuple & (readonly [string, number]);

const fooTup: Foo = #["bar", 123]; // ok
const fooArr: Foo = ["bar", 123]; // not ok

Finally we can also represent arbitrarily sized tuples:

type Foo = tuple & (readonly string[]);

const fooTup: Foo = #["bar", "baz", "qux"]; // ok
const fooArr: Foo = ["bar", "baz", "qux"]; // not ok

Record & Tuple Object constructor & wrapper types

We also need to define the following; we disambiguate with the Wrapper suffix:

// Not really necessary as `{}` is already the super of everything apart from `null | undefined`.
interface RecordWrapper {}

interface RecordConstructor {
    readonly prototype: null;
    new(value?: any): never;

    isRecord(arg: any): arg is record;
    // ...
}

declare var Record: RecordConstructor;

interface TupleWrapper<T> {
    readonly length: number;

    slice(start?: number, end?: number): tuple & (readonly T[]);
    // ...
}

interface TupleConstructor {
    new(value?: any): never;

    isTuple(arg: any): arg is tuple;
    // ...
}

declare var Tuple: TupleConstructor;
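
As an illustration of how these declarations would be used (a sketch assuming the interfaces above, not text from the proposal), isRecord and isTuple act as type guards:

declare const value: unknown;

if (Record.isRecord(value)) {
    // value is narrowed to `record` here
}

if (Tuple.isTuple(value)) {
    // value is narrowed to `tuple` here
}

// `new Record(...)` / `new Tuple(...)` throw at runtime per the proposal;
// the construct signatures return `never` to reflect that.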

Step 2: Type syntax for Record & Tuple

To keep the feature modular to implement, we can land separately the following syntax to represent Record & Tuple:

type Foo = #{ bar: string };
// would be equivalent to:
type Foo = record & { readonly bar: string };
type Foo = #[string, number];
// would be equivalent to:
type Foo = tuple & (readonly [string, number]);
type Foo = string#[];
// would be equivalent to:
type Foo = tuple & (readonly string[]);

This is not required to properly support R&T but it should be widely used if implemented.
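
For example, the motivating Coordinate type from above could then be written with the sugared forms (a sketch of the proposed syntax using the equivalences listed; Segment and Route are illustrative names):

type Coordinate = #{ lat: number, lon: number };   // record & { readonly lat: number; readonly lon: number }
type Segment = #[Coordinate, Coordinate];          // tuple & (readonly [Coordinate, Coordinate])
type Route = Coordinate#[];                        // tuple & (readonly Coordinate[])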

Complementary work: strictReadonly

The problem

Because we designed Record & Tuple to integrate well with existing JS (and therefore TS) code, we cannot make R&Ts incompatible with existing interfaces even if they are lacking readonly markers on their properties.

This has the negative consequence of TS not erroring on the following code:

interface Foo {
    bar: string;
}
function anythingCanHappenToFoo(aFoo: Foo) {
    // ...
    aFoo.bar = "I can change anything!";
    // ...
}

anythingCanHappenToFoo({ bar: "hello" }); // Typechecks OK, Runtime OK
anythingCanHappenToFoo(#{ bar: "hello" }); // Typechecks OK, Runtime TypeError

Note that the same problem already happens today with frozen objects.
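
For example, reusing the interface above (a sketch; the assignment throws in strict mode and silently fails otherwise):

anythingCanHappenToFoo(Object.freeze({ bar: "hello" })); // Typechecks OK, Runtime TypeError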

We also think that not letting people plug Records or Tuples into types that should match but lack readonly markers would be a greater problem for Record & Tuple adoption; it would basically make them an "island feature" in TS that couldn't work with existing code.

Immutability by default, even for objects: strictReadonly

The good news is that mutating object arguments is already quite a bad practice and a source of silent logic errors, so it is not recommended anyway. Most reasonably modern code avoids the practice already.

Others have thought about this, notably in https://github.com/microsoft/TypeScript/issues/13347#issuecomment-908789146, which proposes making all fields readonly unless opted out via a modifier, behind a strictReadonly option:

interface Foo {
    bar: string;
}
function anythingCanHappenToFoo(aFoo: Foo) {
    // ...
    aFoo.bar = "I can change anything!"; // Typecheck FAIL
    // ...
}

anythingCanHappenToFoo({ bar: "hello" }); // Typechecks OK
anythingCanHappenToFoo(#{ bar: "hello" }); // Typechecks OK

And at this point, mutable or writable modifiers would make interfaces reject R&Ts:

interface Foo {
    mutable bar: string;
}
function anythingCanHappenToFoo(aFoo: Foo) {
    // ...
    aFoo.bar = "I can change anything!"; // OK
    // ...
}

anythingCanHappenToFoo({ bar: "hello" }); // Typechecks OK
anythingCanHappenToFoo(#{ bar: "hello" }); // Typecheck FAIL

Many questions need to be answered as part of this (should this mutable property be "viral" to other interfaces? ...), which is why it should be considered a parallel proposal, untied to R&T. It also might be quite a sizable feature, so keeping it separate from adding a new ECMA262 feature is beneficial.


This issue was kindly reviewed by @acutmore, @dragomirtitian and @robpalme. It also contains suggestions from @acutmore, @dragomirtitian and @rauschma.

acutmore commented 2 years ago

In addition to the above, there is also the downlevel transform to think of. I think this can be split into 3 potential levels of support:

Downlevel support L0

aka - no support. Any record/tuple syntax is directly preserved in the output. This is the same as the current support for bigint.

// input
let rt = #{ a: #[1, 2, 3] };

// output
let rt = #{ a: #[1, 2, 3] };

Downlevel support L1

aka - bring your own. The literal syntax is removed and replaced with the equivalent constructor calls. The equivalent level of support for bigint, as an example, would be turning 1n into BigInt('1'). This could be seen as similar to the 'react' mode for JSX support, where the syntax is transformed into a function call.

Users of this mode would either need to be sure that their environment has these globals available, or install a polyfill themselves. This mode may also be useful where, even though the target environment has R/T support, other parts of their downstream toolchain may not yet have support for parsing the syntax (e.g. the bundler or minifier).

// input
let rt = #{ a: #[1, 2, 3] };

// output
let rt = Record({ a: Tuple(1, 2, 3) });

Downlevel support L2

aka - built-in polyfill. This mode would not only apply the L1 transform, but also include the necessary supporting library.

Any R/T polyfill will come with caveats.

// input
let rt = #{ a: #[1, 2, 3] };

// output
import { __Record, __Tuple } from 'tslib';
let rt = __Record({ a: __Tuple(1, 2, 3) });

DanielRosenwasser commented 2 years ago

Here's some top-of-mind feedback right now

Intersection Tagging

The record & and tuple & tagging mechanisms seem like logical ways to support the two in today's type system; but it's unfortunate that you can't use interface to declare a record. You would always have to use a type alias.

It does also feel like it's a bit of a step backwards in what we're expecting from users. Needing object & has become extremely rare these days - probably because the combination of overlap checks on weak types and excess property checks has removed some issues from the lax expectations of a structural type system. Needing record & and tuple & to differentiate the two new types is one aspect, but users now also need to more intentionally write object & as well.
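
A small sketch of the asymmetry described here (names are illustrative, not from the proposal):

// Expressible under the strawperson, but only as a type alias:
type Point = record & { readonly x: number; readonly y: number };

// An interface can describe the shape, but cannot require that values are records:
interface PointLike {
    readonly x: number;
    readonly y: number;
}
// `PointLike` is satisfied by both `{ x: 0, y: 0 }` and `#{ x: 0, y: 0 }`.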

Narrowing Behavior

Given that more kinds of types can be described by all-readonly object types, typeof foo === "object" cannot entirely remove these types from a union.

interface Opts {
    readonly volume?: number;
    readonly hue?: number;
}

function setHair(opts: Opts | number | string) {
    let volume;
    let hue;

    if (typeof opts === "object") {
        volume = opts.volume ?? 47;
        hue = opts.hue ?? 0;
    }
    else {
        // In this branch, 'opts' can still have the type 'Opts'
        // because it might be a 'record'.
        volume = typeof opts === "string" ? +opts : opts;
        hue = 0;
    }

    if (typeof volume !== "number" || !Number.isFinite(volume)) {
        throw new Error();
    }

    if (typeof hue !== "number" || !Number.isFinite(hue)) {
        throw new Error();
    }

    // ...
}

// No call site errors, but won't work at run time.
setHair(#{ })

As an aside, technically the same is true thanks to "function", but we take some liberties and assume that an object type that might have a call signature won't get passed to functions that need to narrow down to something with properties.

function wat() { }
wat.volume = 47;

// Allowed, but this is a hole in TypeScript.
// This function doesn't correctly handle cases with functions instead of objects.
setHair(wat);

You could argue that the right way to write this narrowing code would be to check for the primitive types first and then assume an object-ish type in the final branch; however, someone has written code out there like this. Anyone who hasn't might, and often won't know better until they've run into this issue.

I guess that that's going to be the case with any addition of a new primitive - existing code may stop working correctly, and new code may just forget to account for the new primitive. But when records are intended to be substitutable in enough places where objects are, there's some risk there.

I also think a record is more likely to get used by programmers than a bigint, just for the sake of using a record.

ljharb commented 2 years ago

That narrowing behavior seems like a dealbreaker - it is exceedingly common and idiomatic for x && typeof x === 'object' to be used to runtime-narrow to Object types only.
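
For reference, a typical form of that idiom (illustrative code, not from the thread):

function isObjectLike(x: unknown): boolean {
    return Boolean(x) && typeof x === "object";
}

// Under the proposal, `typeof #{ ... }` is "record", so records would fail this
// check and fall into whatever non-object code path follows.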

DanielRosenwasser commented 2 years ago

Ugh, the way I wrote it up was probably a mental lapse and I should clarify - the positive case will continue to narrow. In the second branch, we may still have to assume that opts has the type Opts | string | number or possibly (Opts & record) | string | number.

Original text was

Under this scheme, it's no longer enough to check typeof foo === "object" && foo to see if something is an object type.

New text is now

Given that more kinds of types can be described by all-readonly object types, typeof foo === "object" cannot entirely remove these types from a union.

ljharb commented 2 years ago

ah gotcha, makes sense

rricard commented 2 years ago

Thank you @DanielRosenwasser for the feedback on this.

If I put my TS-integrator-at-Bloomberg hat on: we do prefer interface (see section 9 of this article from Rob) and we are not necessarily keen on breaking changes! These are good points you are making.


Before I go into more details on how we could potentially solve those issues, I want to separate which choices in this strawperson TS proposal are incidental, and why they were made, from which choices are core to the proposal.

At the proposal level: at the moment we define R&T as primitives, which indeed means that typeof will not return "object" for them. This does not force TS to handle records as if they were objects, or vice-versa.

Now, on the TS level, we are imagining the same experience as we intend to provide with WebIDL, meaning that a Record can be used where an Object is accepted and a Tuple where an Array is accepted. This choice has consequences: for one, we already found out that it could lead the TS compiler to permit incorrect runtime code that tries to mutate a record (Issue description > Complementary work: strictReadonly > The Problem).

Previous mutability example:

interface Foo {
    bar: string;
}
function anythingCanHappenToFoo(aFoo: Foo) {
    // ...
    aFoo.bar = "I can change anything!";
    // ...
}

anythingCanHappenToFoo({ bar: "hello" }); // Typechecks OK, Runtime OK
anythingCanHappenToFoo(#{ bar: "hello" }); // Typechecks OK, Runtime TypeError
anythingCanHappenToFoo(Object.freeze({ bar: "hello" })); // Typechecks OK, Runtime TypeError

In this scenario, TS could let you write code that would hit a runtime error with R&T. This is a trade-off and we understand if this could be problematic.

With this angle in mind, let me go back to the issues you raised and see if we can mitigate them or even solve them...


Intersection Tagging

The TS strawperson proposal is gradual: the tagging, record & { readonly ... }, could be the underlying mechanism for how TS models R&T, but there could also be more sugared syntax for writing these types.

For interfaces, maybe it is OK that they describe both records and objects, without explicitly specifying that an interface only describes Records. This would align with our stated intent: generally, APIs that receive information should accept both objects and records.

Interface extension

This was suggested to me by @nicolo-ribaudo. Today it errors, but we could imagine tagging via the extends mechanism:

interface Foo extends object { ... }
interface Bar extends record { ... }

This is only a suggestion here; it might go against how the typechecker is designed today.

More syntax

Now if we want to go another way, it would be possible to add more syntax out there to mark a record interface:

interface Bar #{ ... }

A downside to this is that the inverse wouldn't have dedicated syntax - there wouldn't be a way to say that an interface describes Objects and not Records.

Narrowing Behaviour

This second part is more complex and may require more research to find the most appropriate solution. That said, I do have some initial thoughts.

To me this problem is very similar to the mutability problem I described earlier: when is it ok to let people write unsafe code and have the compiler ignore it? Once we introduce R&T in TypeScript, it is true that code relying on {...} types being objects becomes less of an edge case. As you pointed out this hazard already exists with functions but is probably not widespread enough to be an issue. Records may make this issue more common.

The 'unsound' option: continue to model {} as an object type that primitives can assign to

Continue to treat '{...}' types as objects when narrowing using typeof v === "object". Also continue to allow primitives to be passed to {}, due to their object wrappers.

Pro: No breaking change. Existing code would continue to pass type-checking.
Con: Unsafe runtime: a record type could be passed to a function expecting {} | number and get into the number branch.

This essentially follows what TS currently does. Making no breaking changes to account for the introduction of the record type.

The danger of this mode already exists, as mentioned above: functions can be passed to {} but will go into the typeof v !== "object" branch of the code. Records would increase the probability of a user experiencing this danger.

There is some high-level similarity to how readonly was introduced and is handled. Frozen readonly objects can be accepted as mutable object types, with the benefit that existing code keeps working, but runtime errors from mutation won't be caught.

Another related aspect is that {...} types are currently assignable to object, which would naturally also continue in this option. A risk here is that Records cannot actually be used in object-specific APIs such as WeakSet, and that would not be caught at compile time.
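
A sketch of the runtime hazard from the Con above (hypothetical function, assuming records keep typeof "record" and remain assignable to matching object types):

function formatVolume(v: { volume: number } | number): string {
    if (typeof v === "object") {
        return v.volume.toFixed(2);
    }
    // Under this option TS narrows `v` to `number` here, but a record
    // would also reach this branch at runtime.
    return v.toFixed(2);
}

formatVolume(#{ volume: 47 }); // Typechecks OK, but TypeError at runtime: v.toFixed is not a function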

The 'sound' option: lean away from {} being an object type

Narrowing using typeof v !== "object" would not remove {...} types from a union.

Pro: Less chance of a runtime error when primitives and functions are assigned to {}.
Con: Breaking change, causing compile-time errors in existing code.

Code would need to prefer checking whether something is a primitive (typeof v === 'number') first and let the 'object/dictionary-like' code path be the else branch, instead of the other way around.

if (typeof opts === "string" || typeof opts === "number") {
    volume = typeof opts === "string" ? +opts : opts;
    hue = 0;
}
else {
    // In this branch, 'opts' can be a function, an object or a record.
    volume = opts.volume ?? 47;
    hue = opts.hue ?? 0;
}

This choice may be reasonable if typeof <expr> === "object" is comparatively less prevalent than the opposite checks, such as typeof <expr> === "string".

It would be interesting if we could measure how big of a breaking change this would be. I can verify this in our codebases here at Bloomberg and I can also look at a greater OSS project corpus on GitHub to gather some data.

Optionally, taking this further, we could also make {} no longer assignable to object. This would catch cases where a non-object is used in object-specific APIs such as Reflect and WeakMap.
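
A sketch of what that stricter assignability would catch (assuming records stay non-objects at runtime, so object-keyed APIs reject them):

declare const key: {};                          // could be an object, a primitive, or a record
const registry = new WeakMap<object, string>();

registry.set(key, "entry");
// Today: allowed, because `{}` is assignable to `object`.
// With the stricter rule: a compile error, flagging that `key` might not be a
// real object (e.g. a record), in which case WeakMap.prototype.set throws.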


Those were great issues to explore and we might see more issues like these pop up while exploring this approach. That is why I'm keen on establishing a mental framework about the tradeoffs we are willing to make: Do we want to accept the new primitives everywhere even if this could bring runtime errors? Do we prefer to develop the feature as "a new world" where producers and consumers of R&T need to opt-in before accepting/returning them?

We do have our preferences in the champion group but I would like to know what you and the TS team think about the tradeoff...

ljharb commented 2 years ago

{} is “object coercible”, which includes numbers, so i wouldn’t expect anything to hit the number branch.

acutmore commented 2 years ago

{} is “object coercible”, which includes numbers, so i wouldn’t expect anything to hit the number branch.

That isn't what happens today. 1 can be assigned to {} because, as you say, it is object coercible. But in other contexts {} is treated as object.

ljharb commented 2 years ago

ah, that's confusing, i assumed {} would include anything that can be assigned to {}.

shicks commented 2 years ago

@rricard Thanks for the thorough and thoughtful analysis.

I wanted to get a better understanding of the current behavior. I found that apparently it matters whether we're talking about {} or {...} (where "..." is some nonempty set of expected properties). In the case of the empty {}, the type checker is already very good at understanding that it could be any non-null value, so it does not remove the {} from the type even in the non-object branch: {} playground.

On the other hand, if you have anything inside the braces, it's a different story. I went with {toString: () => string} since that should be satisfied by basically any object or wrapper type. In this case, TypeScript does remove the interface type - so if that changed, then existing code would no longer narrow the same and we could see type errors: {...} playground.

In both cases, pay attention to the narrowed type in the else branches.
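
A condensed sketch of the two playground cases (the comments describe the narrowing behavior reported above; exact types may vary by compiler version):

declare const a: {} | number;
if (typeof a === "object") {
    // ...
} else {
    // `a` is still {} | number: the empty object type is not removed
}

declare const b: { toString: () => string } | number;
if (typeof b === "object") {
    // ...
} else {
    // `b` is number: the non-empty object type is removed today
}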

Overall, I prefer the sound option (as well as finding some way to make the readonly/mutability handling stricter). I'd be interested to see how this plays out in Google's codebase. Is there a patch we could run through our global presubmit to see how much breaks?

rricard commented 2 years ago

Having a way to test both the stricter readonly and the stricter object assignability would indeed be super interesting. I have been looking at collecting data on typeof sites and it is difficult to just do that without the compiler insights. Testing out the potential breakages would be way more accurate.

acutmore commented 2 years ago

We (Bloomberg) now have some preliminary results from attempting to measure the potential breaking change from the typeof x !== "object" narrowing.

Approach:

  1. create a patched version of TypeScript 4.7.4 with the narrowing behavior modified
  2. run this across a large number of internal TypeScript projects that currently pass type-checking on 4.7.4
  3. count the number of new errors

Patches:

Both patches additionally add "record" | "tuple" to the return type of typeof x.
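
A sketch of the widened typeof result under the patches (assuming the two new strings are simply appended to the existing result type):

declare const x: unknown;
const t = typeof x;
// t: "string" | "number" | "bigint" | "boolean" | "symbol" | "undefined" | "object" | "function"
//    | "record" | "tuple"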

patch-simpler: https://github.com/bloomberg/TypeScript/tree/ac/tsc-4-7-record-type-spike-simple

The simpler version of the patch does not filter out object types in a typeof v !== "object" branch. This can include false positives, e.g.:

function getLengthOrSize(v: string | Set<unknown>): number {
  if (typeof v === "object") {
     return v.size; // OK v is Set
  } else {
     return v.length; // Now an error even though a record wouldn't be assignable to Set
  }
}

patch-complex: https://github.com/bloomberg/TypeScript/tree/ac/tsc-4-7-record-type-spike

This more complex patch attempts to filter out false positives by also checking that the type is compatible with a new internal esRecord type, which is implemented as:

type Primitive = null | undefined | boolean | number | bigint | symbol | string | esRecord;
type esRecord = { readonly [p: string]: Primitive };
function getLengthOrSize1(v: string | Set<unknown>): number {
  if (typeof v === "object") {
     return v.size; // OK: v is Set
  } else {
     return v.length; // OK: v is string. Set is not assignable to esRecord
  }
}

function getLengthOrSize2(v: string | { size: number }): number {
  if (typeof v === "object") {
     return v.size; // OK: v is { size: number }
  } else {
     return v.length; // Error: `{ size: number }` is assignable to esRecord
  }
}

This patch does add non-trivial complexity compared to the simpler patch. It also introduces some false negatives for when the compatibility is contravariant:

function getLengthOrSize3(v: string | { size: (number | (() => number)) }): number {
  if (typeof v === "object") {
     return typeof v.size === "function" ? v.size() : v.size;
  } else {
     // @ts-expect-error: should error but does not
     return v.length; // OK. v is string. { size: (number | (() => number)) } is not assignable to esRecord
  }
}
getLengthOrSize3(#{ size: 1 }); // is assignable to `{ size: (number | (() => number)) }`

This patch did also have a stack overflow on two of our projects, which I haven't had a chance to investigate yet. So while its results may be interesting, they are possibly of less note than the simpler patch's.

Results:

We ran this across ~700 internal TypeScript projects, with a total `.ts` line count of ~3M.

Simple patch: 12 type errors across 7 projects
Complex patch: 4 type errors across 3 projects + 2 projects with tsc crashes

frigus02 commented 2 years ago

I tried both patches on Google's codebase. This ran ~400k projects. Here are the results:

Seems like a very small number of errors overall. Also seems in line with what you found, @acutmore.

acutmore commented 2 years ago

thanks for trying out the patches @frigus02, that is much appreciated and really useful additional data.

"But I can have a closer look if necessary"

That's OK. I think we can either take the results of the simple-patch, or assume the worst-case for the complex-patch and consider the projects that crashed as ones that would have type-errored anyway. Thanks again!