Suggestion: deferred type inference

malibuzios commented 8 years ago

[Note: the main approach I present here uses explicit notation that requires introducing one or more additional operators. An alternative approach that applies code analysis instead is also discussed in the comments. That was actually how the initial idea started, but I decided to go with the more conservative, explicit approach because I was uncertain whether the alternative would be sufficiently effective or simple to implement in practice.]

I have many places in my code where I can't really benefit from type inference, but it just seems like there aren't really good reasons why that wouldn't be possible to achieve through a slightly more explicit syntax.

One example is when a variable is conditionally assigned:

let someVar; // Defaults to 'any' :(

if (..some condition..)
    someVar = [{x: 1, y: -1}, {x: -2, y: -4}];
else
    someVar = [{x: -2, y: -3}, {x: 1, y: 3}];

Or when it is assigned within a try..catch..finally block:

let someVar; // Defaults to 'any' :(

try {
    someVar = [{x: 1, y: -1}, {x: -2, y: -4}];
    ...
}
catch (e) {
    someVar = [{x: -2, y: -3}, {x: 1, y: 3}];
    ..
}
finally {
    ..
}

The compiler can't infer the type of someVar because the assignment does not happen in the let statement itself.

This could be worked around by using var instead (though I personally try to avoid it as much as I can):

if (..some condition..)
    var someVar = [{x: 1, y: -1}, {x: -2, y: -4}]; // Inferred as { x: number, y: number }[]
else
    someVar = [{x: -2, y: -3}, {x: 1, y: 3}];

try {
    var someVar = [{x: 1, y: -1}, {x: -2, y: -4}]; // Inferred as { x: number, y: number }[]
    ...
}
catch (e) {
    someVar = [{x: -2, y: -3}, {x: 1, y: 3}];
    ..
}
finally {
    ..
}

However, there are cases where even var cannot help:

let someVar; // Defaults to 'any' :(

callAFunctionWithCallback(() => {
    someVar = ["HI", "THERE"];
});

let intermediateValue; // Defaults to 'any' :(

startAPromise()
  .then(() => { ... intermediateValue = [5, 4, 3];  ... return anotherPromise() })
  .then((result) => { ... use intermediateValue... })
...
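
For comparison, here is a minimal sketch of the explicit annotation the callback case needs without the proposal. The body of callAFunctionWithCallback is a hypothetical stand-in (the thread never defines it) and is assumed to invoke the callback synchronously.

```typescript
// Hypothetical stand-in: the thread never shows this function's body.
// Assumed here to invoke the callback synchronously.
function callAFunctionWithCallback(callback: () => void): void {
    callback();
}

// The type has to be spelled out by hand. The empty initializer only
// avoids a "used before being assigned" complaint under strict settings.
let someVar: string[] = [];

callAFunctionWithCallback(() => {
    someVar = ["HI", "THERE"];
});

console.log(someVar.join(" "));
```

For a short type like string[] the annotation is cheap; the cost the post complains about shows up when the assigned value is a large literal or the result of a call with a complex return type.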

Wouldn't it be great if there was a way to indicate that a variable should receive the inferred type for a particular assignment?

let someVar; // Inferred as '{ x: number, y: number }[]' :)

if (..some condition..)
    someVar <>= [{x: 1, y: -1}, {x: -2, y: -4}];
else
    someVar = [{x: -2, y: -3}, {x: 1, y: 3}];

let someVar; // Inferred as { x: number, y: number }[]

try {
    someVar <>= [{x: 1, y: -1}, {x: -2, y: -4}];  
    ...
}
catch (e) {
    someVar = [{x: -2, y: -3}, {x: 1, y: 3}];
    ..
}
finally {
    ..
}

let someVar; // Inferred as 'string[]' :)

callAFunctionWithCallback(() => {
    someVar <>= ["HI", "THERE"];
});

let intermediateValue; // Inferred as 'number[]' :)

startAPromise()
  .then(() => { ... intermediateValue <>= [5, 4, 3];  ... return anotherPromise() })
  .then((result) => { ... use intermediateValue... })
...

This may also prove useful in class constructors, where there is no general way to benefit from type inference when assigning to class member properties:

function processInput(input: number, options: OptionsType): {x: number, y: number}[] {
    ...
}
...

class SomeClass {
    memberProperty; // Inferred as {x: number, y: number}[]

    constructor(input: number, options: OptionsType) {
        this.memberProperty <>= processInput(input, options);
    }
}
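
Outside of the proposal, one way to avoid repeating the type by hand is the ReturnType utility type that shipped in later TypeScript releases. The sketch below is only an illustration of that alternative: the fields of OptionsType and the body of processInput are invented, since the thread elides them.

```typescript
// Hypothetical stand-ins: the thread does not define OptionsType
// or the body of processInput.
interface OptionsType {
    scale: number;
}

function processInput(input: number, options: OptionsType): { x: number; y: number }[] {
    return [{ x: input * options.scale, y: -input * options.scale }];
}

class SomeClass {
    // The property type follows processInput's declared return type,
    // so it does not have to be written out a second time.
    memberProperty: ReturnType<typeof processInput>;

    constructor(input: number, options: OptionsType) {
        this.memberProperty = processInput(input, options);
    }
}

const instance = new SomeClass(2, { scale: 3 });
console.log(instance.memberProperty);
```

This still requires naming the function in the declaration, so it is closer to an explicit annotation than to the deferred inference requested here.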

For cases where the inferred type should be a union, a related operator can be introduced to "partially" infer each component of the union on separate assignments:

let someVar; // The inferred type here is 'number | string'

if (..some condition..)
    someVar <|>= 42;
else
    someVar <|>= "HI THERE!";

This special operator may also be helpful with nullable and optional types, and may even somewhat reduce the need for a specialized shorthand notation for Nullable<T> in some scenarios:

let someVar; // The inferred type here is 'number | null'

if (..some condition..)
    someVar <|>= 42;
else
    someVar <|>= null;

Some properties and restrictions for the operators

The cast and assign operator (<>=) can only be used once for a particular variable:

let someVar;

someVar <>= 42;
someVar <>= "HI"; // Error..

Even if the inferred types are similar:

let someVar;
someVar <>= 42;
someVar <>= 43; // Error..

The operator is only applicable to untyped variables. Using it on a typed variable would be an error:

let x: number;
x <>= [34,45,12] // Error: deferred type inference is only available for untyped variables

// This variable is _explicitly_ typed as 'any' so this would not work:
let y: any; 
y <>= [34,45,12] // Error

The full assignment operator (<>=) and partial one for unions (<|>=) cannot be mixed together:

let someVar;

someVar <>= "HI";
someVar <|>= 42; // Error..
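
The restrictions above can be summarised as a small state machine. The following is a toy model only, not compiler code: types are represented by their names as strings, and every identifier (Slot, castAssign, unionAssign) is invented for illustration. Two points are assumptions rather than part of the proposal: that `<|>=` is also rejected on explicitly typed variables, and that repeating the same type with `<|>=` is harmless.

```typescript
// Toy model of the proposed rules; type names are plain strings here.
type Slot =
    | { kind: "untyped" }                    // `let x;`
    | { kind: "explicit"; type: string }     // `let x: number;` or `let x: any;`
    | { kind: "full"; type: string }         // after one `<>=`
    | { kind: "union"; members: string[] };  // after one or more `<|>=`

// Models `x <>= value`, where `type` is the inferred type of `value`.
function castAssign(slot: Slot, type: string): Slot {
    if (slot.kind === "explicit") {
        throw new Error("deferred type inference is only available for untyped variables");
    }
    if (slot.kind === "full") {
        throw new Error("<>= can only be used once per variable");
    }
    if (slot.kind === "union") {
        throw new Error("<>= and <|>= cannot be mixed");
    }
    return { kind: "full", type: type };
}

// Models `x <|>= value`; each use contributes one member of the union.
function unionAssign(slot: Slot, type: string): Slot {
    if (slot.kind === "explicit") {
        // Assumption: same restriction as for `<>=`.
        throw new Error("deferred type inference is only available for untyped variables");
    }
    if (slot.kind === "full") {
        throw new Error("<>= and <|>= cannot be mixed");
    }
    const members: string[] = slot.kind === "union" ? slot.members : [];
    // Assumption: a repeated type does not add a second union member.
    return {
        kind: "union",
        members: members.indexOf(type) >= 0 ? members : members.concat(type),
    };
}
```

In this model, `someVar <>= 42; someVar <>= 43;` corresponds to calling castAssign twice on the same slot, and the second call throws even though both values have type number.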

Scoping rules

The scoping rules for both of these operators are open to discussion:

Targets that seem reasonable to me:

Local variables
Variables referenced through closures
Class member properties referenced from constructors

Targets that don't seem very reasonable to me:

Function and method parameters referenced from anywhere, including the function body
Class member properties referenced from class instances
Class member properties referenced from derived classes
Static class member properties referenced from instance methods / getters / setters
Variables referenced from an anonymous class through a closure
Variables referenced through closures where the received type is, or is dependent on, an out-of-scope generic parameter, either through a generic function or anonymous class (the problem here is that in most cases it would either be a mistake or the type would simply default to any, which wouldn't be very useful)

Targets that are open to discussion:

Class instance member properties referenced from class instance methods/getters/setters
Static class member properties referenced from static class methods/getters/setters

Relationship with noImplicitAny

Having this alongside noImplicitAny would simply mean that every untyped variable or class member property must receive a type, and would never default to any if its type is not explicitly set by either its declaration or by using an implicit or explicit type inference:

let x; // Error: no implicit any
x = 2;

let x; // OK, type inferred as 'number'
x <>= 2;

Some open questions

Would you find something like this useful, and perhaps use it yourself?
Do you feel this encourages a generally "healthy" style of programming, or at least does not discourage it?
Should the scopes be even more limited than the ones I recommended, or alternatively be extended?
Can the <>= and <|>= operators (AKA the cast and assign operators) be made available for use by ECMAScript? Do they seem intuitive enough? Do you have any ideas for a better notation?
Would you prefer the Pascal assignment notation over the one I used in the proposal? e.g. x := 42 and perhaps x := <|> 42 for unions? Is it ever likely to receive clearance from ECMAScript?
Would you prefer a more automated approach based on code analysis rather than explicit notation? How difficult is that to implement in practice? Can that approach somehow help with inferring union types, or is an explicit notation still needed there?
Is there enough motivation from the TypeScript team to add the feature described or a similar one (modified or extended to their liking) in the reasonably near future, i.e. up to about a year or so from today (assuming I don't have the time or sufficient knowledge to implement it myself)?

tinganho commented 8 years ago

I have some cases where I want the type to be inferred instead of writing it ahead of time. Large data structures that cannot be assigned directly could really benefit from deferred inference.

Though I think <>= looks a bit weird to be honest.

How about just a deferred keyword?

let x: deferred;
callAFunctionWithCallback(() => {
    x = ["HI", "THERE"];
});

malibuzios commented 8 years ago

@tinganho

My initial idea was somewhat similar to the one you mentioned (I think the notation I first considered was x: <any> which in retrospect wasn't very good). The problem I perceived with the "just associate the type of the vertically first value assigned" approach is that there is no explicit control over where that happens.

In the case of a closure there isn't really a way to know the actual order of execution, and if the first assignment happens to be a mistake, in code that may not even execute in practice, then the mistaken type would erroneously be associated with the variable.

let x: <Deferred>;

let func1 = () => {
   x = ["HI", "THERE"]; // Was string[] intended here?
}

let func2 = () => {
   x = [4, 2]; // Or number[]?
}
..

Or alternatively, if the vertically second assignment was erroneous, but the first one was removed, the compiler would infer a type that was not intended by the programmer, and the resulting error messages may be a bit strange (the compiler would claim it expects a type that was never even intended by the programmer as valid for the variable).

If there is no assignment at all, it is also not clear what type the variable would receive (in the case of an untyped let x it is either any, or an error when noImplicitAny is enabled).

Another issue is that it cannot be naturally extended to the union notation, which has to be very explicit:

let x;

if (..someCondition..)
  x <|>= 34;
else
  x <|>= "HI";

The explicit operator approach also has the advantage that it naturally extends from untyped variables, which makes conversion from plain JavaScript easier, and more natural for people who are used to it.

It also requires less typing and is somewhat cleaner.

In terms of notation, a different one I considered was the empty cast:

let x;

x = <> 42;

Which I don't think is really that different aesthetically, and would personally prefer the one suggested here.

Ideas for a better notation would be highly appreciated!

And if most people believe the "vertically first" approach isn't that bad and would prefer it, I have no problem with that, but it appears harder to work into a good proposal and there seem to be more issues to tackle with it. That's why I believed the more "explicit" approach would be more acceptable (though I might have been wrong?)

malibuzios commented 8 years ago

@tinganho

The "cast and assign" notation presents the analogy of both casting to the type (the <> part) and assigning the value (the = part). I'm aware it looks a bit like something from some esoteric or academic programming language, so there definitely might be better ones..

For example, a less "strange" notation that came to mind was Pascal style assignment notation, e.g.:

let x;

x := [234,235];

and perhaps

let x;

if (...)
  x := <|> 64;
else
  x := <|> "HELLO";

for unions.. (or something like that.. there might be a "prettier" one for this..)

It looks simple and "non-threatening", and I would have probably preferred it instead. The problem was that I had very strong doubts it would receive clearance from ECMAScript..

jods4 commented 8 years ago

This doesn't feel very TypeScript-like. The philosophy of TS so far has been to consciously limit inference for two reasons:

compiler performance;
catch more errors by forcing the developer to declare their intent rather than inferring everything from usage.

Inferring as much as possible from code by assuming everything is correct feels more like the philosophy of the Flow checker.

Introducing new syntax in the language is a drawback that needs to pay for itself with benefits. In this case I'm really not sure it's worth it.

malibuzios commented 8 years ago

@jods4

Thanks for sharing your opinion. I don't think I would have spent the time to write this if I felt the same. I did not feel this is somehow inappropriate for TypeScript, at all. It is not my job to analyze compiler performance and I did not think there's very complicated analysis needed here, compared to say, highly nested generic expressions and constraints etc. It does require an additional operator and I did my best to find one that I felt was reasonable. I would have preferred it if it was possible to do without one but couldn't find an adequate solution (which does not mean one doesn't exist).

I don't think that the philosophy of TypeScript is to force the developer to write types in full. To the contrary, I think a tremendous effort has been put into inferring types whenever possible and reducing the number of places where programmers need to spell types out (which in many cases become very complex). There are hundreds of feature requests here for very small improvements in type inference. These include some extremely intricate patterns, many of them significantly more complicated than anything mentioned here and not very common in practice.

Anyway, perhaps you have an alternative suggestion on how to make better use of type inference for these particular scenarios? I ask because I keep finding myself spending needless time writing out types in full here, and maintaining them, especially when using promises and callbacks:

let someVar;

callAFunctionWithCallback(() => {
    someVar = ["HI", "THERE"];
});

let intermediateValue;

startAPromise()
  .then(() => { ... intermediateValue = [5, 4, 3] .. })
  .then(() => { ... use intermediateValue ... })
...

I also avoid entirely using var due to its scoping rules so this scenario is very common in my code:

let someVar;

if (..some condition..)
    someVar = [{x: 1, y: -1}, {x: -2, y: -4}];
else
    someVar = [{x: -2, y: -3}, {x: 1, y: 3}];

And there are class constructors:

function processInput(input: number, options: OptionsType): {x: number, y: number}[] {
    ...
}
...

class SomeClass {
    memberProperty;

    constructor(input: number, options: OptionsType) {
        this.memberProperty = processInput(input, options);
    }
}

RyanCavanaugh commented 8 years ago

I think this is something we'll want to look at once the flow control analysis work is in. It may be reasonable to infer the type of a variable from its first assignment (or set of equally-reached assignments) rather than only its initializer.

malibuzios commented 8 years ago

@RyanCavanaugh

As I mentioned in a previous comment, my original approach was a "first vertically seen assignment" one (note this may include closures, so it isn't really just "first assignment"), but I felt it was not adequate enough to propose and did not believe it would get a positive response. One of the main reasons was error messages. Here's an example:

Let's say that there are several assignments to a variable and one of them is for an incompatible type and then the value is returned from a function which is then passed to another function:

function funcThatWantsAString(str: string) {
  ..
}

function func() {
  let x;
  x = 42;
  x = "OK";
  return x;
}

funcThatWantsAString(func());

Now let's say the first assignment is there by mistake but the second one is correct. The compiler would assume that x should be a number and that func in turn returns a number. The fact is, the code is completely valid (despite the inconsistent types, I mean) and runs perfectly fine. One of the error messages here would be that funcThatWantsAString wants a string as its first parameter, not a number, and this may leave the programmer a bit puzzled..

(I could imagine ways to mitigate that but that would require some careful ordering of the compilation process, for example avoiding inferring the function return type if there is a type conflict of this sort inside of it - this may be possible but I'm not in a position to know).

It could be even worse than this: there could be cases where the inferred type is taken from a function that references the variable through a closure, but that function isn't even executed in practice:

let x;

function funcThatIsntExecuted() {
  x = 42;
}

function func() {
  x = "HI";
}

So now the compiler inferred a type based on something that's invalid and not even executed, so the error message may be a bit confusing as well..

It could be that my concerns were exaggerated, but these are things that could happen, especially when converting from plain JavaScript code (which may execute just fine in practice).

jods4 commented 8 years ago

@malibuzios

perhaps you have an alternative suggestion on how to make better use of type inference for these particular scenarios?

Let me try, not sure if I'll succeed ;)

Before that: I don't think declaring your types is that bad. It helps catch mistakes and once you have them you can put type annotations in cases where required + code completion will work for you + it will prevent some typos that might get "inferred" + it might help readers of your code understand immediately what a variable is about. Coders in other static languages (C#, Java, etc) have had to declare all their types for a long time and somehow it worked for them (not saying we shouldn't try to improve).

Let's look at your examples

// #1
// I can't help much on this one, because "as is" the pattern looks bad.
// Mutating global state like that is bad but I can't suggest much
// because you didn't provide context as to what this is supposed to do.
// If callAFunctionWithCallback is synchronous it should rather return a value, 
// if it's async you should really use Promise, or async rather than callback.
// In the async case this code is fragile because it's not re-entrant.
let someVar : string[];
callAFunctionWithCallback(() => {
    someVar = ["HI", "THERE"];
});

// #2
// You can return one promise result to the next one.
// Use await to make your async code awesome.
startAPromise()
  .then(() => { ... return [5, 4, 3]; })
  .then(result => { ... use result (inferred number[]) ... });

async function example() {
  await startAPromise();
  let intermediateValue = [5, 4, 3];
  await whatever();
  // use intermediateValue
}

// #3
// (Typical example where an interface will make typing easier thanks to completion)
// I would certainly have some Point interface if I had to manipulate points all over the code.
let someVar = some_short_condition ? 
 [{x: 1, y: -1}, {x: -2, y: -4}] :
 [{x: -2, y: -3}, {x: 1, y: 3}];

let someVar = hideComplexCodeInFunction();
function hideComplexCodeInFunction() {
  if (some_condition)
    return [{x: 1, y: -1}, {x: -2, y: -4}];
  else
    return [{x: -2, y: -3}, {x: 1, y: 3}];
}

// #4
// I think it's probably better if you declare the structure of your class explicitly...
// If you use some structure across functions and in public contracts (return values, fields)
// you should seriously consider naming it.
// Anyway, no other solution in my opinion for typing a class field which is computed from ctor parameters.

class SomeClass {
    memberProperty: Point[];
    constructor(input: number, options: OptionsType) {
        this.memberProperty = processInput(input, options);
    }
}

Inferring your last example is probably going too far. Consider that if someone changes the return type of processInput, it has consequences all over your code by changing the type of the field memberProperty. This could even be across different libraries!!

malibuzios commented 8 years ago

@jods4

Thanks for taking the time to suggest workarounds, but these scenarios, although simplified, are based on real-world code where using these variables in those particular positions was deemed to be the best choice.

1 - This is a very simplified callback or closure scenario. Yes, it is very basic. Sure, using async or a promise is better, but that is not always possible.

2 - The intermediate value here is not necessarily something that is possible to return from the promise. A lot of the time the promise would return another promise and that intermediate value is not related. e.g.:

let intermediateValue;

startAPromise()
  .then(() => { ... intermediateValue = [5, 4, 3]; return anotherPromise()})
  .then(result => { ... use intermediateValue ... });

Yes, I wish I could use await everywhere, and I guess I will use it more (more like "at all") when downlevel emit for ES5 is available. But there will always be cases where it's not possible or it wouldn't be performant enough.

3 - I don't use the ? : operator because of readability, and in many cases there are more than two options or it is nested. I simplified the example to consume less space.

The kind of code I write commonly requires good performance so putting it in a function may have some negative effect. It also looks a bit unnecessary and adds some noise complexity.

4 - It is already possible to call a function and infer that type into a class property, the limiting aspect about it is that it cannot accept parameters like a full constructor:

class SomeClass {
    memberProperty = func();
}

I don't think that is such a bad pattern, anyway it exists in the language so I guess they thought it was good enough. If the type returned by func changes, sure, it will have implications, but that's not that different from other places where inference is used.

Sure, there are patterns to cleverly avoid needing to state types, or to minimize them. The thing is, one of the major and central ones is type inference. I don't see anything fundamentally wrong with extending it to cover more scenarios.

jods4 commented 8 years ago

@malibuzios Nothing wrong with wanting powerful inference, just challenging a little bit ;)

I think there is a limit on inference, though. If you start inferring too much by always assuming the code is right, then you start to lose the benefits of static typing.

For instance consider this, which is still rather simple and supported in TS 1.8:

let array = [ { x: 1, y: 2 }, { x: 3, /* typo, should be y */ w: 4 }];
// array: ({x: number, y: number} | {x: number, w: number})[]

This might get caught by further usage; if you're unlucky it might not.

malibuzios commented 8 years ago

@RyanCavanaugh

I've reconsidered the examples I've given you, which I thought may lead to unhelpful or puzzling error messages. There's a very simple solution to this. First I'll try to construct a (rather contrived) scenario that better exemplifies the problem, which is rather like a 'hybrid' of the two previous examples I gave:

let x;

function thisFunctionIsNeverCalled() {
    x = 42;
}

function thisFunctionIsActuallyCalled() {
    x = "HI";
}

function thisFunctionWantsAString(arg: string) {
    ...
} 

thisFunctionIsActuallyCalled();
thisFunctionWantsAString(x);

(Edit: for simplicity I've ignored possible issues with the non-nullability of x. With stricter non-nullability constraints thisFunctionWantsAString may need to be notated as accepting arg?: string, but that's not really the main topic here)

I said that the problem may be that one of the error messages may look something like:

Error: In 'thisFunctionWantsAString', 'arg' wants a 'string' but got a 'number' instead

One simple thing that can be done to avoid this is to instead infer x to have type any (not because that would be useful, but only to avoid outputting further confusing errors) and only give the error:

Error: Cannot infer a type for 'x': incompatible types assigned

(or something more detailed than that perhaps..)

Where the 'squiggles' would highlight both assignments equally, without giving any precedence to any one of them.
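
A toy sketch of that rule, under stated assumptions: types are represented by their names as strings, the names InferenceResult and inferFromAssignments are invented for illustration, and a variable with no assignments is assumed to stay any (the thread leaves that case open). This is not how the compiler is implemented.

```typescript
// Toy model, not compiler code: types are represented by their names.
interface InferenceResult {
    type: string;
    errors: string[];
}

function inferFromAssignments(name: string, assignedTypes: string[]): InferenceResult {
    let inferred: string | undefined;

    for (const assigned of assignedTypes) {
        if (inferred === undefined) {
            inferred = assigned;
        } else if (inferred !== assigned) {
            // Conflict: fall back to 'any' and report a single error,
            // giving no precedence to any one of the assignments.
            return {
                type: "any",
                errors: ["Cannot infer a type for '" + name + "': incompatible types assigned"],
            };
        }
    }

    // Assumption: with no assignments at all the variable stays 'any'.
    return { type: inferred === undefined ? "any" : inferred, errors: [] };
}

const result = inferFromAssignments("x", ["number", "string"]);
console.log(result.type);
console.log(result.errors);
```

Because the conflicting variable ends up as any, a later call such as thisFunctionWantsAString(x) produces no follow-up error, which is the point of the suggestion.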

This may be seen as somewhat of a breaking change (or at least an increase of the 'strictness' of the compiler) because some implicit 'any' types, where the variable wasn't annotated but was actually intended to accept multiple types (this may include unions), would now error and would need to be explicitly set as any (or a more specific union type), though with noImplicitAny enabled this doesn't seem like it would be an issue.

malibuzios commented 8 years ago

@RyanCavanaugh

A somewhat similar approach to what I described seems to be already used for return type inference:

function unpredictableReturnType(a: number, b: number) {
    if (a > b)
        return 1234;
    else
        return "abcd";
}

Infers the return type as number | string but gives the compile-time error:

No best common type exists among return expressions.

(though I'm not sure the intention here was to infer 'lower' common types, so this isn't exactly the same)

Also, the kind of inference described here could be enabled by a compile-time switch that would lie somewhere on the spectrum between 'regular' mode and noImplicitAny mode. Something along the lines of noImplicitVariantTypes.

Edit: or alternatively enable it by default (breaking change) and introduce a switch that disables it (e.g. allowImplicitVariantTypes).

I would have personally enabled something like this immediately if it was available and I don't think the amount of superfluous 'errors' it would catch would be very large. noImplicitAny by contrast is very strict and at times 'nitpicky' and seems to be difficult to 'satisfy' in large code bases (up to hundreds of errors at times).

saschanaz commented 7 years ago

This seems (at least partially) covered in TS 2.1: https://blogs.msdn.microsoft.com/typescript/2016/11/08/typescript-2-1-rc-better-inference-async-functions-and-more/

RyanCavanaugh commented 4 years ago

We've implemented the let initialization to its most reasonable degree and return type inference is less strict now, so I'm calling this one done.
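
A minimal sketch of what this looks like in practice, assuming noImplicitAny is enabled (without it, an uninitialized let is still plain any). The function names are invented for illustration, and the behaviour shown covers local variables only, not the closure or class member cases from the original post.

```typescript
// Assumes "noImplicitAny": true in the compiler options.
function sumPoints(condition: boolean) {
    let value; // no annotation, no initializer

    if (condition) {
        value = [{ x: 1, y: -1 }, { x: -2, y: -4 }];
    } else {
        value = [{ x: -2, y: -3 }, { x: 1, y: 3 }];
    }

    // Control-flow analysis sees { x: number; y: number }[] here,
    // so `p` is typed without any annotation.
    return value.map(p => p.x + p.y);
}

function numberOrString(condition: boolean) {
    let value;

    if (condition) {
        value = 42;
    } else {
        value = "HI THERE!";
    }

    // Both the variable and the return type come out as a union
    // of string and number, with no error reported.
    return value;
}

console.log(sumPoints(true));
console.log(typeof numberOrString(false));
```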
