microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0
100.97k stars 12.48k forks source link

Multimatch Overloads and a New User Contract for Overload Declarations. #57004

Open craigphicks opened 10 months ago

craigphicks commented 10 months ago

πŸ” Search Terms

βœ… Viability Checklist

⭐ Suggestion

Note: the terms cover, convex, concave, and gap are defined in the Definitions section at the end of this proposal.

Existing overload matching algorithm - Single Match

The current overload matching algorithm selects a single overload, or fails, as follows:

pseudo-code:

match = undefined;
for each overload in overload sequence
    if exact match
        matches = overload;
        break
if (!match) compilerError();

Proposed overload matching algorithm (defective version) - Multiple Matches

First, a defective version of the proposed algorithm is described here. Further down, a better version is described.

The (defective) proposed overload matching algorithm selects (possibly multiple) overloads as follows:

pseudo-code:

matches = []
for each overload in overload sequence
    if partial match
        matches.push(overload)
        if exact match
            break
if (matches.length===0) compilerError();

What are the consequences of this proposed change?

This table shows relations between the input argument range and compile errors, for the current (Single Match) and proposed (Multiple Matches) overload matching algorithms, without any other changes. This is for the general case where last overload might not be the cover.

Table of Compiler Errors for Input Argument Range vs Algorithm (with Defective Proposed Algorithm)* Input Argument Range Single Match Algo Multi Match Algo
1. Exact Match Exists No Error No Error
2. No Partial Match, Within Cover Error Error
3. Partial Match only, Within Cover Error No Error (*)
4. Any Part exceeds Cover Error Sometimes No Error (*)

First and foremost, the current (Single Match) algorithm is designed specifically so that any input overlapping the (non-empty) Gap of the overload sequence will fail to match any overload, and will trigger a compile error. This is a designed feature, not a bug. The idea is that no unhandled input can slip through resulting in unpredictable behavior. That's the current contract between the user and TypeScript, and it is a good contract for the user.

In contrast, in the proposed matching algorithm described so far, any input overlapping (case 3), or exceeding (case 4), the Gap of the overload sequence might not trigger a compile error. Unless other changes are made, the proposed algorithm will be a bad contract for the user. The next section describes changes to make it a good contract again.

Contract with User to Handle the Gap Case (User Runtime Contract)

Interface and Usage

Suppose an overload function declaration

function overloadFunction(...): ...; // overload 1
function overloadFunction(...): ...; // overload 2
function overloadFunction(a,b,c) { // implementation
    // ...
}

An new intrinsic function is proposed:

SetupOverload(
    functionSymbol: TypeScript Symbol // e.g. overloadFunction,
    gapReturnType: throws | never | any = never,
    thrownType: any = undefined
): void;

where

Note: The presence of throws [optional type] doesn't imply that full throw flow support is required. It is sufficient to display it as a return type as a form of documentation.

Note that SetupOverload does not fit the TypeScript syntax of a generic type function, because

  1. functionSymbol is not a type. functionSymbol is necessary to associate the type with the function symbol.
  2. It does not return a type. Instead, it associates the type with the symbol as a side effect. The type itself can be obtained through typeof overloadFunction.

Creating an overload type independently of a function declaration is [described later]().

The function SetupOverload will set up an overload type assiciate with the symbol for overloadFunction.

The overload type will provide the following public interface for TypeScript internal use only:

SetupOverload needs to be executed by the compiler before the implementation of overloadFunction undergoes flow analysis / type checking, where it will used to check return types (as much as possible).

The valid values for CoverReturnType and their meanings are as follows:

The value gapReturnType should be displayed as part of the overload type, so that clients of the user can see the user's contract.

Note about throws

Overload matching algorithm - Multiple Matches (non-defective version)

The full, non-defective, proposed Multi Match algorithm becomes as follows: pseudo-code:

matches = []
hasExactMatch = false
for each explicit overload in overload sequence
    if partial match
        if (overload return type !== throws and overload return type !== never)
            matches.push(overload)
        if exact match
            hasExactMatch = true
            break
if (!hasExactMatch)
    if not exact match for implicit cover overload
        compilerError();
    else if (CoverReturnType !== throws and CoverReturnType !== never)
        matches.push(implicit cover overload)
if (matches.length===0) compilerError();

The updated Table of Compiler Errors becomes:

Table of Compiler Errors for Input Argument Range vs Algorithm Input Argument Range Single Match Algo Multi Match Algo
1. Exact Match Exists No Error No Error
2. No Partial Match, Within Cover Error Error
3. Partial Match only, Within Cover Error User Runtime Contract
4. Any Part exceeds Cover Error Error

The difference between the cases is now narrowed to case 3. The use of "User Runtime Contract" is a tradeoff, which will be examined in the Examples section below.

Implicit final overload

Conceptually, the overload type has an implicit final overload in addition to the explicit overloads. That implicit overload has implicit parameter type of the Gap of the overload sequence parameters, because the Gap is all that is left after matching the explicit overloads.

The return type of the implicit overload is gapReturnType.

The parameters of the implicit overload are the Cover of the parameters of the explicit overloads. However the return type is only the gapReturnType.

Creating an Overload Type without a Function Declaration

It is also necessary to be able to create an overload without reference to a function declaration. This could be done with the following intrinsic function:

type CreateOverload<
    TupleOfFuncs extends [... ((...args:any[])=>any)[]],
    GapReturnType extends throws | never | any = never,
    ThrownType extends any = undefined
>; // intrinsic
// usage
const overloadFunction: CreateOverload<....> = (a,b,c) => { /* implementation */ }

πŸ“ƒ Motivating Example

Example 1: Overloads in a nested loop

Based on case study borrowed and expanded from @RyanCavanaugh.

TypeScript 5.3.2

/**
 * The implementation has final throw to cover gap
*/
// function fn(a: string|number, b: number|boolean, c: string|boolean): 1|2|3|4 {
//     const at = typeof a;
//     const bt = typeof b;
//     const ct = typeof c;
//     if (at === "string" && bt === "number" && ct === "boolean")
//         return 1;
//     if (at === "number" && bt === "number" && ct === "string")
//         return 2;
//     if (at === "string" && bt === "boolean" && ct === "boolean")
//         return 3;
//     throw "rangeError"
// }

declare function fn(a: string, b: number, c: boolean): 1;
declare function fn(a: number, b: number, c: string): 2;
declare function fn(a: string, b: boolean, c: boolean): 3;

declare const arr11: (string)[];
declare const arr12: (number | boolean)[];
declare const arr13: (string | boolean)[];
arr11.forEach(a => {
    arr12.forEach(b => {
        arr13.forEach(c => {
            const r = fn(a,b,c); // error
//                    ~~~~~~~~~
// No overload matches this call.
// Overload 1 of 3, '(a: string, b: number, c: boolean): 1', gave the following error.
//     Argument of type 'number | boolean' is not assignable to parameter of type 'number'.
//     Type 'boolean' is not assignable to type 'number'.
// Overload 2 of 3, '(a: number, b: number, c: string): 2', gave the following error.
//     Argument of type 'string' is not assignable to parameter of type 'number'.
// Overload 3 of 3, '(a: string, b: boolean, c: boolean): 3', gave the following error.
//     Argument of type 'number | boolean' is not assignable to parameter of type 'boolean'.
//     Type 'number' is not assignable to type 'boolean'.ts(2769)
        });
    });
});

Under this proposal - Proposal

declare function fn(a: string, b: number, c: boolean): 1;
declare function fn(a: number, b: number, c: string): 2;
declare function fn(a: string, b: boolean, c: boolean): 3;
declare setupOverload<typeof fn, throws, "rangeError">();

declare const arr11: (string)[];
declare const arr12: (number | boolean)[];
declare const arr13: (string | boolean)[];

arr11.forEach(a => {
    arr12.forEach(b => {
        arr13.forEach(c => {
            const r = fn(a,b,c); // 1 | 3 | throws "rangeError"
        });
    });
});

To be fair, TypeScript 5.3.2 overloads could be written to capture all the possible valid inputs, but it requires considering all these cases:

declare function fn(a: string, b: boolean, c: string): never;
declare function fn(a: string, b: boolean, c: boolean): 3;
declare function fn(a: string, b: boolean, c: string|boolean): never;
declare function fn(a: string, b: number, c: string): never;
declare function fn(a: string, b: number, c: boolean): 1;
declare function fn(a: string, b: number, c: string|boolean): 1;
declare function fn(a: string, b: number|boolean, c: string): never;
declare function fn(a: string, b: number|boolean, c: boolean): 1|3;
declare function fn(a: string, b: number|boolean, c: string|boolean): 1|3;
...
...
...
...
...
...
...
... // 27 cases
...
...
...
...
...
...
...
...
declare function fn(a: string|number, b: number|boolean, c: boolean): 1|2|3;
declare function fn(a: string|number, b: number|boolean, c: string|boolean): 1|2|3;

So it's not a good solution. Not to mention, even after removing the returns with value never, the number of matches for the compiler to perform is prohibitive.

Another 5.3.2 solution is to recreate the implementation inside the loop, which is redundant and error prone:

declare function fn(a: string, b: number, c: boolean): 1;
declare function fn(a: number, b: number, c: string): 2;
declare function fn(a: string, b: boolean, c: boolean): 3;
declare function fn(a: string|number, b: number|boolean, c: boolean|string): 1|2|3; // | throws "rangeError"

declare const arr11: (string)[];
declare const arr12: (number | boolean)[];
declare const arr13: (string | boolean)[];

arr11.forEach(a => {
    arr12.forEach(b => {
        arr13.forEach(c => {
            const r = (() => {
                if (typeof a === "string" && typeof b === "number" && typeof c === "boolean")
                    return fn(a,b,c);
                if (typeof a === "number" && typeof b === "number" && typeof c === "string")
                    return fn(a,b,c);
                if (typeof a === "string" && typeof b === "boolean" && typeof c === "boolean")
                    return fn(a,b,c);
                return fn(a,b,c); // if reached would throw "rangeError" (a.k.a. never)
            })();
        });
    });
});

so that's not a good solution either.

Example 2: Concave Overload Sequence

Borrowed and extends from TypeScript documentation on Overloads:

With just the two overloads it is a Concave Overload Sequence. TypeScript 5.3.2

declare function makeDate(timestamp: number): Date;
declare function makeDate(m: number, d: number, y: number): Date;

const mOrTimestamp:number = getMOrTimestamp();
const let d:number|undefined = getD();
const let y:number|undefined = getY();
makeDate(mOrTimestamp,d,y); // error on parameter 'd'
//                    ~ error on d

so we add two more overloads to make it a Convex Overload Sequence.

TypeScript 5.3.2

declare function makeDate(timestamp: number): Date;
declare function makeDate(m: number, d: number, y: number): Date;
declare function makeDate(mOrTimestamp: number, d?: number, y?: number): undefined; // gap case
declare function makeDate(mOrTimestamp: number, d?: number, y?: number): Date | undefined; // cover used for typeof makeDate

const mOrTimestamp:number = getMOrTimestamp();
const let d:number|undefined = getD();
const let y:number|undefined = getY();
const date = makeDate(mOrTimestamp,d,y); // Date | undefined

The proposal allows a more simple declaration:

declare function makeDate(timestamp: number): Date;
declare function makeDate(m: number, d: number, y: number): Date;
// - declare function makeDate(mOrTimestamp: number, d?: number, y?: number): undefined; // gap case
// - declare function makeDate(mOrTimestamp: number, d?: number, y?: number): Date | undefined; // cover used for typeof makeDate
declare SetupOverload<typeof makeDate, undefined, undefined>();
Example 3: Capturing input correlation with return type

For some overload functions with simple return types, the reverse function mapping allows a flow-analysis functionality similar to that that const variables has with respect to flow analysis - they can capture the correlation between multiple variables.

declare const a:1|2|3;
declare const b:1|2|3;
declare const c:1|2|3;

const correlationCapture: CreateOverload<
    [
        (a: 1, b: 1, c: 1) => "a",
        (a: 2, b: 2, c: 1|2) => "b",
        (a: 3, b: 3, c: 1|2|3) => "c"
    ],
    throws
> = (a,b,c) => {
    if (a===1 && b===1 && c===1) return "a";
    if (a===2 && b===2 && (c===1 || c===2)) return "b";
    if (a===3 && b===3 && (c===1 || c===2 || c===3)) return "c";
    throw undefined;
}

const r = correlationCapture(a,b,c); // "a"|"b"|"c"|throws undefined
// demultiplexing
if (r==="a") {
    // a:1, b:1, c:1
} else if (r==="b") {
    // a:2, b:2, c:1|2
} else if (r==="c") {
    // a:3, b:3, c:1|2|3
}

πŸ’» Use Cases

  1. As in example 1, there are some cases where single-exact-match overloads cannot effectively capture the range of valid combinations of types. Partial matching solves that.

  2. In many cases providing the gap overload parameters and final cover are tedious because that require accurately writing a complicated type, which is why it is often skipped, as in example 2. The advantages of using setupOverload are

    • ease of use compared to writing the cover manually
    • the user is forced to consider the gap case, and to make a decision about how to handle it - that's the contract.
    • having the correct Cover including the gap type can make it easier for the compiler to do flow analysis and inference (to be discussed in another proposal).
  3. As shown in example 3, the reverse function mapping allows a flow-analysis functionality similar to the one that const variables has with respect to flow analysis - they can capture the correlation between multiple variables. This may be useful in some cases.

More details required

Implementing the new overload type so that is coexists with old overload type

So that it wouldn't be a breaking change.

Inference, unions, and intersections of the proposed overloads

There are implications for inference, unions, and intersections of the proposed overloads which need to be detailed.

Definitions

Definition of Cover of Overloads

Suppose a sequence of overloads for a function f is defined as follows:

declare function f(...args: P1):R1
declare function f(...args: P2):R2
...
declare function f(...args: PN):RN

The cover of the sequence of overloads is a unique type, and that type could be defined canonically as follows:

type MyCoverOfF = (...args: P1 | P2 | ... | PN): R1 | R2 | ... | RN;

Notice the cover is formed by taking the union of types over each parameter (and return type).

psuedo-code:

Parameters<MyCoverOfF>[n] === Union over all nth parameters of the overloads
ReturnType<MyCoverOfF> === Union over all return types of the overloads

Definition of Convex vs Concave Overloads, the Gap.

"Concave" and "convex" are terms borrowed from sets theory and geometry, where here the parameters and return type are the spaces axes. "Gap", on the other hand, is a shorthand defined in this explanation only.

Here is an example of a convex sequence of overloads (borrowed from the TypeScript documentation):

declare function makeDate(timestamp: number): Date;
declare function makeDate(m: number, d: number, y: number): Date;

The type of the cover of the above sequence of overloads is unique. It could be defined as follows:

(mOrTimestamp: number, d?: number, y?: number) => Date;

which is equivalent to the canonical definition above:

(...args: [number]|[number, number, number]) => Date

The input ...[1,1] is in the gap of the above sequence of overloads.

makeData(1,1) // error

A more general example

declare const a:number;
declare const b:number|undefined;
declare const c:number|undefined;
makeDate(a,b,c) // error on parameter 'b'
//         ~
// Argument of type 'number | undefined' is not assignable to parameter of type 'number'.
//   Type 'undefined' is not assignable to type 'number'.(2345)
// const b: number | undefined

In both case, the TypeScript compiler kindly informs us that the input overlaps the gap of the sequence of overloads.

jcalz commented 10 months ago

Cross-linking to #14107

RyanCavanaugh commented 9 months ago

"convex" is actually super-useful terminology πŸ‘

fatcerberus commented 9 months ago
  • A sequence of overloads is convex if and only if the every possible allowed input of its cover is also matched by the overloads.
  • A sequence of overloads is convex if and only if there exists some input that cannot be matched by any overload.

These seem contradictory. Did you mean to write concave in the third bullet point? Otherwise I can only interpret this combination of statements non-contradictorily as "accepts all inputs (i.e. ...args: any[]) = concave" which doesn't seem like a useful definition.

craigphicks commented 9 months ago

It was wrong, and also it didn't make clear that convex here is referring to convex relative to the cover.

Actually, in multi-matching the basis parameters are allowed to be combined, and the space those combinations span is bigger than the basis vectors alone - hence unions types give get the proper input-output mapping returns. That spanned space is itself a kind of "concave", but still it can be "convex" relative to the "cover". I have to add that to the documentation.