RyanCavanaugh commented 8 years ago

Some rough notes from a conversation @ahejlsberg and I had earlier about trade-offs in the control flow analysis work based on running the real-world code (RWC) tests. For obvious reasons I'll be comparing what Flow does with similar examples to compare and contrast possible outcomes.

The primary question is: When a function is invoked, what should we assume its side effects are?

One option is to be pessimistic and reset all narrowings, assuming that any function might mutate any object it could possibly get its hands on. Another option is to be optimistic and assume the function doesn't modify any state. Both of these seem to be bad.

This problem spans both locals (which might be subject to some "closed over or not" analysis) and object fields.

Optimistic: Bad behavior on locals

The TypeScript compiler has code like this:

enum Token { Alpha, Beta, Gamma }
let token = Token.Alpha;
function nextToken() {
    token = Token.Beta;
}
function maybeNextToken() {
    if (... something ...) {
        nextToken();
    }
}

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
    }
    // is this possible?
    if (token === Token.Alpha) {
        // something happens
    }
}

Optimistically assuming token isn't modified by maybeNextToken incorrectly flags token === Token.Alpha as an impossibility. However, in other cases, this is a good check to do! See later examples.

Optimistic: Bad behavior on fields

The RWC suite picked up a "bug" that looked like this:

// Function somewhere else
declare function tryDoSomething(x: string, result: { success: boolean; value: number; }): void;

function myFunc(x: string) {
    let result = { success: false, value: 0 };

    tryDoSomething(x, result);
    if (result.success === true) { // %%
        return result.value;
    }

    tryDoSomething(x.trim(), result);
    if (result.success === true) { // ??
        return result.value;
    }
    return -1;
}

The ?? line here is not a bug in the user code, but we thought it was, because after the %% block runs, the only remaining value in result.success's domain is false.

Pessimistic: Bad behavior on locals

We found actual bugs (several!) in partner code that looked like this:

enum Kind { Good, Bad, Ugly }
let kind: Kind = ...;
function f() {
    if (kind) {
        log('Doing some work');
        switch (kind) {
            case Kind.Good:
                // unreachable!
        }
    }
}

Here, we detected the bug that Kind.Good (which has the falsy value 0) is not in the domain of kind at the point of the case label. However, if we were fully pessimistic, we couldn't know that the global function log doesn't modify the global variable kind, thus incorrectly allowing this broken code.

Pessimistic: Bad behavior on fields, example 1

A question on flowtype SO is a good example of this

A smaller example that demonstrates the behavior:

function fn(arg: { x: string | null }) {
    if (arg.x !== null) {
        alert('All is OK!');
        // Flow: Not OK, arg.x could be null
        console.log(arg.x.substr(3));
    }
}

The problem here is that, pessimistically, something like this might be happening:

let a = { x: 'ok' };
function alert() {
    a.x = null;
}
fn(a);

Pessimistic: Bad behavior on fields, example 2

The TS compiler has code that looks like this (simplified):

function visitChildren(node: Node, visit: (node: Node) => void) {
    switch(node.kind) {
        case SyntaxKind.BinaryExpression:
            visit(node.left);
            visit(node.right); // Unsafe?
            break;
        case SyntaxKind.PropertyAccessExpression:
            visit(node.expr);
            visit(node.name); // Unsafe?
            break;
    }
}

Here, we discriminated the Node union type by its kind. A pessimistic behavior would say that the second invocations are unsafe, because the call to visit may have mutated node.kind through a secondary reference and invalidated the discrimination.

Mitigating with (shallow) inlining / analysis

Flow does some assignment analysis to improve the quality of these errors, but it's obviously short of a full inlining solution, which wouldn't be even remotely practical. Some examples of how to defeat the analysis:

// Non-null assignment can still trigger null warnings
function fn(x: string | null) {
    function check1() {
        x = 'still OK';
    }

    if (x !== null) {
        check1();
        // Flow: Error, x could be null
        console.log(x.substr(0));
    }
}

// Inlining is only one level deep
function fn(x: string | null) {
    function check1() {
        check2();
    }
    function check2() {
        x = null;
    }

    if (x !== null) {
        check1();
        // Flow: No error
        console.log(x.substr(0)); // crashes
    }
}

Mitigating with `const` parameters

A low-hanging piece of fruit is to allow a const modifier on parameters. This would allow a much faster fix for code that looks like this:

function fn(const x: string | number) {
  if (typeof x === 'string') {
    thisFunctionCannotMutateX();
    x.substr(0); // ok
  }
}

Mitigating with `readonly` fields

The visitChildren example above might be mitigated by saying that readonly fields retain their narrowing effects even in the presence of intervening function calls. This is technically unsound as you may have both a readonly and non-readonly alias to the same property, but in practice this is probably very rare.

Mitigating with other options

Random ideas that got thrown out (will add to this list) but are probably bad?

pure modifier on functions that says this function doesn't modify anything. This is a bit impractical as we'd realistically want this on the vast majority of all functions, and it doesn't really solve the problem since lots of functions only modify one thing so you'd really want to say "pure except for m"
volatile property modifier that says this "this property will change without notice". We're not C++ and it's perhaps unclear where you'd apply this and where you wouldn't.

zpdDG4gta8XKpMCd commented 8 years ago

pure is a bit impractical

with persistent data structures it makes perfect practical sense, amortized time/memory consumption for a typical application would be comparable, it take's some discipline and support from the language ( https://github.com/Microsoft/TypeScript/issues/6230, https://github.com/Microsoft/TypeScript/issues/2319, https://github.com/Microsoft/TypeScript/issues/1213), but it's worth the trouble

with readonly for immutables ans pure for functions it will be blazing fast compared to inlining, because modifiers only need to be checked once for each function/interface

please no more half measures based on assumptions for a couple of typical cases and endless exceptions that only worsen soundness

ahejlsberg commented 8 years ago

Looking through the RWC tests, I see only one case where the additional narrowing performed by #9407 causes unwanted errors. However, there were scores of real bugs and inconsistencies being caught, such as checking for the same value twice, assuming that enum members with the value 0 pass truthiness checks, dead branches in if and switch statements, etc. In aggregate, I think our optimistic assumption that type guards are unaffected by intervening function calls is the best compromise.

BTW, the compiler itself relies on side effecting changes to a token variable in the parser. For example:

if (token === SyntaxKind.ExportKeyword) {
    nextToken();
    if (token === SyntaxKind.DefaultKeyword) {
        // We have "export default"
    }
    ...
}

This becomes an error with #9407 because the compiler continues to think token has the value SyntaxKind.ExportKeyword following the call to nextToken (and thus reports an error when token is compared to SyntaxKind.DefaultKeyword).

We will instead be using a function to obtain the current token:

if (token() === SyntaxKind.ExportKeyword) {
    nextToken();
    if (token() === SyntaxKind.DefaultKeyword) {
        // We have "export default"
    }
    ...
}

where the function is simply:

function token(): SyntaxKind {
    return currentToken;
}

Since all modern JavaScript VMs inline such simple functions there is no performance penalty.

I think this pattern of suppressing type narrowing by accessing mutable state using a function is a reasonable one.

gvkhna commented 8 years ago

is there any possibility of a const on an argument?

function fn(const x: string | number)

In swift/objc they added the "in", "out", and "inout" meta attributes to arguments so you could see if they meant for a value to be modified. i feel like this could be useful for annotating whether a function will modify a variable it received or not. Not sure if I'm missing something.

kitsonk commented 8 years ago

While pure is all the "rage" these days, I think many JavaScript developers don't understand what it means and could easily lead to surprises and or large scale frustration. Because the inherit universal mutability of JavaScript I think pure would generally be shackles that would potentially go unused because of the over strictness of the concept... It just seems an "all or nothing" concept, when most of JavaScript is shades of grey.

I think a argument by argument modifier makes the most sense. Of course const and readonly only imply disallowing reassignment. Personally, I don't find it confusing of expanding those to inform the flow control that property mutations are "unsafe" as that would be a design time only construct.

Is there any benefit in considering a deeply immutable design time keyword to avoid any confusion between const and readonly? Potentially even immutable (though that is jargoney).

zpdDG4gta8XKpMCd commented 8 years ago

in my stubborn defense of pure:

it is easy to implement (at least in my naive view of it)
it is fast with almost no impact on current performance level (naive again)
it will do what it's been asked for
it will do it 100% correct no exceptions ever
it will cover 30% cases out of the box without having to learn anything: [1, 2, 3].map(x => x + 1 /* <-- hi, i am pure function */)
and it won't hurt to have it in addition to what will work best for the mainstream JavaScript audience

i really wish it was it was considered

other options may work out too and should be considered, just not a big fan of half-baked solutions

kitsonk commented 8 years ago

I understand your argument, although I think what you would cover with "obviously" pure functions are not ones that suffer from CFA challenges. You example of [1, 2, 3].map(x => x + 1) does not suffer from CFA function boundary issues, and while it is logically pure, there is not benefit in denoting this at design time.

I think your argument might have more merit if you could think of some examples where there are current CFA issues that would be solved by a pure notation that benefitted the developer. My argument isn't against the concept of writing pure functions, it is more of the argument that an "all or nothing" solution likely won't be practical.

One of the biggest current issues with CFA in my opinion is in Promise callbacks. There are many situations where those might not be pure. That is why I think an argument by argument mutability notation would solve significantly more use cases and do so in a more end-developer controllable way.

zpdDG4gta8XKpMCd commented 8 years ago

although you are right that the example doesn't target CFA issues, the answer is rather evident:

if a callback meets all constraints that make it pure, then it is safe to say that all assertions that were made outside of it are still valid inside of it (including all CFA reasonings deduced before) because it doesn't change anything

this is one merit of the pureness, among many others

let me say it again, if a promise callback is pure, all reasoning that was done outside of it's scope or time is still valid inside of it

you might wonder why i am so certain, let me explain, there is no notion of time in math that stands behind the idea of pureness, time doesn't exist there, it's absolutely static, everything existed forever and forever will and cannot ever change (because changes need time) to the point where a pure function can be thrown away and replaced with its result (what's called referential transparency) which for the any given arguments will always be the same (hence a question why not just use a hash-table instead to map arguments to results), so you don't really need a function in the first place, armed with such super static equivalence between results that were calculated vs hardcoded the only merit in using a function is to save some memory that otherwise would be taken by a huge static table mapping all possible arguments to all results, since math is absolute abstraction we only care when it comes to program it

this is what a pure function is, this is why it will always work

i agree that it's rather 'all or nothing', but is the price you have to pay for the peace of mind because in its strictness it solves the CFA problems and many others simply by giving them no place to exist

zpdDG4gta8XKpMCd commented 8 years ago

think about it, a "pure" (aka persistent) data structure, say, a linked list, can be deeply copied just by... being assigned:

const one = toList();
const absolutelyAnotherListThatWeCanDoWhateverWeWant = one;

how cool is that?

in my humble opinion "pure" functions and data structures can easily cover 40% of typical everyday needs without even feeling restricted

as for the rest 60% (since we don't appreciate any sort of extremes) we can use good old mutable impure JavaScript

but mind those 40% percent of no problem ever! that's a darn bargain, just let it happen and enjoy it

sledorze commented 8 years ago

totally agree with @aleksey-bykov arguments in favor of having "pure" introduced into the type checker.

gvkhna commented 8 years ago

Just my .02, I think "pure" would be too abstract and too confusing to understand.

I'm actually in favor of apple style semantics, "in", "out", "inout"

Also I think const could lead to confusion since they would be similar in usage but different concepts.

It might be easier for beginners to pick up and understand but the intent simply wouldn't be as clear.

maiermic commented 8 years ago

@RyanCavanaugh The compiler should already be able to answer the question in the example of Optimistic: Bad behavior on locals already with yes. If doSomething is called when token is Token.Alpha, the condition token !== Token.Alpha of the first if statement is false. Hence, the body is skipped and nextToken is never called. token is Token.Alpha still holds when the second if statement is reached. Ergo the answer is yes.

We can make this question insteresting again by changing doSomething like this:

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
        // is this possible?
        if (token === Token.Alpha) {
        // something happens
        }
    }
}

Now it is only possible if maybeNextToken changes token to Token.Alpha. If you are optimistic you would answer the question with no, since you assume that functions have no side effects like changing variables. However, if you change token in nextToken to Token.Alpha instead of Token.Beta

function nextToken() {
    token = Token.Alpha;
}

it would be possible if and only if ... something ... is true in maybeNextToken. If you are optimistic you would still answer the question with no even though the correct answer is yes, maybe.

maiermic commented 8 years ago

Infer constraints about functions

How about tracking variable changes of functions?

The basic idea is that we know a constraint before a function call and that we can reason about a constraint after a function call.

Note: I write a new introduced term bold in the following when it is mentioned the first time. Jump back to this position to see the definition of this term.

`if` statements

This idea is quite similar to control flow base type analysis. So let's take a look at the if statement in the example of control flow base type analysis first (before we discover constraints introduced by functions):

function foo(x: string | number | boolean) {
    if (typeof x === "string") {
        x; // type of x is string here
        x = 1;
        x; // type of x is number here
    }
    x; // type of x is number | boolean here
}

I rewrite the constraint annotations like this

function foo(x: string | number | boolean) {
    // x :: string | number | boolean
    if (typeof x === "string") {
        // x :: string
        x = 1;
        // x :: number
    }
    // x :: number | boolean
}

As usual, a single colon like in x: string | number | boolean means variable x has type string | number | boolean.

A double colon like in x :: string | number | boolean describes a value-typed-binding, which means that the current value of x is of type string | number | boolean. The type of the value may be any subtype of the type of the variable. You can for example conclude from the check typeof x === "string" is true that we have x :: string (the current value of x is a string).

The constraints of a simple if statement can be described in more general as

// cBefore
if (condition) {
    // c1
    ...
    // c2
}
// cAfter

Here constraint is abbreviated as c (cBefore stands for constraint before). I name our constraints in the example above like this:

function foo(x: string | number | boolean) {
  // cBefore := (x :: string | number | boolean)
  if (typeof x === "string") {
    // c1 := (x :: string)
    x = 1;
    // c2 := (x :: number)
  }
  // cAfter := (x :: number | boolean)
}

I write c := e to assign a constraint expression e to the constraint called c.

In the example, x = 1; is a side effect that causes a change on the constraint we have before. Before x = 1; we have c1 := (x :: string) and afterwards we have c2 := (x :: number).

I write c with c' for the constraint where all appearances of value-type-bindings x :: T of c are overridden with value-type-bindings x :: T' of c'. In this example c1 with c2 is (x :: string) with (x :: number) which results in (x :: number), since the old constraint value-type-binding (x :: string) in c1 is overridden by (x :: number) of t1.

I describe a side effect s as a constraint transition c -> c'. A constraint transition takes constraint c (that holds before side effect s) and gives you a new constraint c' (that holds after side effect s). Given a constraint transition t: c -> e where e is a constraint expression, I use function like notation t(cp) to express constraint c' that is equal to e where all appearances of c are replaced by cp.

Using this notation, the example from above looks like this:

// c1 := (x :: string)
x = 1; // t1: c -> c with (x :: number)
// c2 := t1(c1)

You can describe reasoning about constraints of a simple if statement in more general like this

// cBefore
if (condition) {
    // c1 := cBefore & condition
    ... // t1
    // c2 := t1(c1)
    //     = t1(cBefore & condition)
}
// cAfter := t1(cBefore & condition) | (cBefore & !condition)

I write c & condition to describe the case that condition is true. Otherwise, I write c & !condition to describe the case that condition is false. In both cases the constraint c had been given before the check. From the check of condition you can often reason about a new constraint c' that holds. If condition is true implies that c' holds, you can rewrite c & condition as c with c'. Similarly, if condition is false implies that c' holds, you can rewrite c & !condition as c with c'. If no constraint follows, you just write c instead.

In the example, condition is typeof x === "string". If condition is fulfilled the body is executed. Thereby, c1 is cBefore & condition. From fulfilled condition you know typeof x === "string" holds. From that, you can infer the constraint transition x :: string. As a result, cBefore & condition equals cBefore with (x :: string). Since cBefore is x :: string | number | boolean you get (x :: string | number | boolean) with (x :: string) that is reduced to x :: string.

After the if statement you know that

either condition had been fulfilled and it's body has been executed that causes side effects described by t1
- which would result in t1(cBefore & condition) and can be reduced to x :: string (like described above)
or that condition had been unfulfilled
- which would result in cBefore & !condition
- where !condition is !(typeof x === "string") which is typeof x !== "string"
- so you can infer (x :: string | number | boolean) with !(x :: string) which can be reduced to x :: number | boolean

From that you can conclude that cAfter is t1(cBefore & condition) | (cBefore & !condition) which is equal to (x :: string) | (x :: number | boolean) and can be reduced to x :: number | boolean.

Constraints introduced by functions

Now we have warmed up, we take a look at constraints that are introduced by functions.

(User-Defined) Type Guards

A type guard is a function whose return type is a type predicate:

function isFish(pet: Fish | Bird): pet is Fish {
    return (<Fish>pet).swim !== undefined;
}

Those type predicates are constraints that are introduced by a function (here isFish) and are already part of TypeScript:

let pet: Fish | Bird

if (isFish(pet)) {
    pet.swim();
}
else {
    pet.fly();
}

Let's add constraints as comments:

let pet: Fish | Bird
// cBefore := (pet :: Fish | Bird)
if (isFish(pet)) {
    // c1 := cBefore & isFish(pet)
    //     = (pet :: Fish | Bird) & (pet :: Fish)
    //     = (pet :: Fish)
    pet.swim();
}
else {
    // c2 := cBefore & !isFish(pet)
    //     = (pet :: Fish | Bird) & !(pet :: Fish)
    //     = (pet :: Bird)
    pet.fly();
}

Before the if statement you know that pet is Fish | Bird. If isFish(pet) is fulfilled, the type predicate pet is Fish of isFish holds for the value of our local variable pet. That means that it adds a constraint pet :: Fish to c1. Thus, it's ok to call pet.swim();. In the else branch you know that the type predicate pet is Fish is unfulfilled and you can add !(pet :: Fish) to c2, which results in pet :: Bird.

Functions in general

In a similar way to type predicates of type guards I introduce constraints that can be automatically inferred from expressions and statements. I will add constraints to the functions one after another in my modified example (see comment above) of the example behavior on locals (see first post) to answer the question is this possible? in the comment in the function doSomething:

enum Token { Alpha, Beta, Gamma }
let token = Token.Alpha;
function nextToken() {
    token = Token.Alpha;
}
function maybeNextToken() {
    if (condition) {
        nextToken();
    }
}

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
        // is this possible?
        if (token === Token.Alpha) {
          // something happens
        }
    }
}

`nextToken`

function nextToken() {
    token = Token.Alpha;
}

nextToken assigns Token.Alpha to the shared local variable token. Like the assignment x = 1; in the if statement example, token = Token.Alpha is a side effect that can be described as a constraint transition:

function nextToken() {
    // cBefore
    token = Token.Alpha; // t1: c -> c with (token :: Token.Alpha)
    // cAfter := t1(cBefore)
    //         = cBefore with (token :: Token.Alpha)
}

Here cBefore describes the constraints that we have before calling nextToken and cAfter describes the constraints that we have after calling nextToken. Like an assignment, a function can be described as a constraint transition. Since nextToken is doing just the same as the assignment token = Token.Alpha; it describes the same constraint transition as t1.

`maybeNextToken`

Let's have a look at the caller maybeNextToken:

function maybeNextToken() {
    if (condition) {
        nextToken();
    }
}

First you add the constraint transitions (here t1 for nextToken) and afterwards the constraints of the if statement:

function maybeNextToken() {
    // cBefore
    if (condition) {
        // cBefore & condition
        nextToken(); // t1: c -> c with (token :: Token.Alpha)
        // t1(cBefore & condition)
    }
    // cAfter := t1(cBefore & condition) | (cBefore & !condition)
}

Since condition is not specified in the original example you don't know its result. I assume that it is an expression without any side effect. To relax this assumption even further I consider condition is a shared variable of type boolean. Thereby, you can derive condition :: true if condition is fulfilled and condition :: false otherwise. Thereof, cAfter is

((cBefore with (condition :: true)) with (token :: Token.Alpha)) | (cBefore with (condition :: false))

Despite that simplification maybeNextToken is described as the constraint transition cBefore -> cAfter.

`doSomething`

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
        // is this possible?
        if (token === Token.Alpha) {
          // something happens
        }
    }
}

Like in maybeNextToken you first add constraint transitions and afterwards the constraints of the if statement:

function doSomething() {
    // cBefore
    if (token !== Token.Alpha) {
        // c1
        maybeNextToken(); // t1: c -> ((c with (condition :: true)) with (token :: Token.Alpha)) | (c with (condition :: false))
        // c2
        // is this possible?
        if (token === Token.Alpha) {
          // c3
          // something happens
          // c4
        }
        // c5
    }
    // cAfter
}

Our goal is to answer the question if it is possible that token === Token.Alpha is fulfilled in the condition of the second if statement. This is only possible if c2 contains a constraint token :: T where T is a type that contains Token.Alpha. Hence, you only look at c1 and c2:

c1 is cBefore & token !== Token.Alpha. Since token is not Token.Alpha it only can be Token.Beta or Token.Gamma. This results in cBefore with (token :: Token.Beta | Token.Gamma).

c2 is derived by replacing c in t1 with c1:

c2 := ((c with (condition :: true)) with (token :: Token.Alpha))
    | (c with (condition :: false))
  = (((cBefore with (token :: Token.Beta | Token.Gamma)) with (condition :: true)) with (token :: Token.Alpha))
    | ((cBefore with (token :: Token.Beta | Token.Gamma)) with (condition :: false))

Since the type of the value of token in cBefore is overridden by cBefore with (token :: ...) operations, you can leave cBefore out:

  = (((token :: Token.Beta | Token.Gamma) with (condition :: true)) with (token :: Token.Alpha))
    | ((token :: Token.Beta | Token.Gamma) with (condition :: false))

I'm only interested in token to answer the question. Hence, you can remove all with operations of other variables (here condition):

  = ((token :: Token.Beta | Token.Gamma) with (token :: Token.Alpha))
    | (token :: Token.Beta | Token.Gamma)
  = (token :: Token.Alpha)
    | (token :: Token.Beta | Token.Gamma)
  = token :: Token.Alpha | Token.Beta | Token.Gamma
  = token :: Token

From c2 = token :: Token you conclude that the answer of the question is: Yes, token might be Token.Alpha in the condition of the second/inner if statement.

yortus commented 8 years ago

@maiermic your post was pretty complex, so I may have misunderstood it, but it looks like you are describing what control flow anaylsis already does, plus extending it to 'look into' the functions being called within guarded blocks as well. If so, how is this different to the inlining approach mentioned by @RyanCavanaugh in the OP?

maiermic commented 8 years ago

@yortus That's right, I describe what control flow analysis already does to introduce the notation I use afterwards to explain my idea. However, I'd like to clarify that I don't 'look into' a function every time it is called (if that wasn't clear so far). Instead I infer the constraint transition of a function when I look at it's definition. Therefore, I have to look into the function, but I only have to do this once, since I can carry the constraint transition in the type of the function. When I look at the function call, I can use the constraint transition that I inferred before to calculate the constraint that holds after the function call from the constraint that holds before the function call.

Mitigating with (shallow) inlining / analysis

I don't know how (shallow) inlining/analysis works and @RyanCavanaugh doesn't explain it. Nevertheless, he shows two examples.

1. Example

// Non-null assignment can still trigger null warnings
function fn(x: string | null) {
    function check1() {
        x = 'still OK';
    }

    if (x !== null) {
        check1();
        // Flow: Error, x could be null
        console.log(x.substr(0));
    }
}

Flow claims that x could be null after the call of check1. Flow doesn't look into check1. Otherwise, it would know that x has type string and cann't be null.

In my approach, the constraint transition of check1 is inferred as t: c -> (x :: string). Since we checked that x !== null is true before and the type of the variable x is string | null we have x :: string as constraint that holds before the call of check1. Hence, we pass x :: string to t to get the constraint that holds after the call of check1. t(x :: string) results in x :: string. As a consequence, x cann't be null in console.log(x.substr(0));.

Note: You can even infer x :: 'still OK' in check1 in that case. Thereby, you could even determine the type of the return value of x.substr(0) which is 's'. However, this would require further knowledge about the build-in method substr. I didn't cover such an approach in my previous post.

2. Example

// Inlining is only one level deep
function fn(x: string | null) {
    function check1() {
        check2();
    }
    function check2() {
        x = null;
    }

    if (x !== null) {
        check1();
        // Flow: No error
        console.log(x.substr(0)); // crashes
    }
}

To come straight to the point, the constraint transition

of check2 is t2: c -> (x :: null) and
of check1 is t1 = t2

Before the call of check1 we have x :: string. Afterwards we have t1(x :: string), which results in x :: null. Thus, my approach detects the error and flow doesn't.

yahiko00 commented 8 years ago

Quite long to fully read and understand, but this seems very interesting. It makes me think of a kind of propositional logic solver like the Prolog language. Correct me if I'm wrong. Not sure it would be easy to implement such a thing, but, this could be an evolution target for the TypeScript's Type Inference and Control Flow analysis.

yortus commented 8 years ago

@maiermic hmm yes interesting. I suppose two issues would be:

a) performance. @RyanCavanaugh mentions a "full inlining solution [...] wouldn't be even remotely practical", but what about this deep constraint analysis?

b) what about calls to third-party functions (e.g. calls into platform, frameworks, or node dependencies) where we have no function bodies to look into (only .d.ts declarations)?

zpdDG4gta8XKpMCd commented 8 years ago

maiermic commented 8 years ago

@aleksey-bykov #8545 (comment) looks like a similar approach for manually annotating constraints, even though you only refer to a specific case (object initializers)

maiermic commented 8 years ago

@yortus good questions

a) performance. @RyanCavanaugh mentions a "full inlining solution [...] wouldn't be even remotely practical", but what about this deep constraint analysis?

@RyanCavanaugh May you please explain performace issues of a full inlining solution?

Without thinking too much about it, I guess deep constraint analysis might be doable in linear or linearithmic time complexity. Gathered constraints are seralizable and can be cached so that you can speed up analysis time. Manuall constraint annotations can further improve analysis time.

b) what about calls to third-party functions (e.g. calls into platform, frameworks, or node dependencies) where we have no function bodies to look into (only .d.ts declarations)?

If you don't know the code, you cann't do flow analysis. Without .d.ts declarations you couldn't even do any type analysis. You either include constraint annotaions in .d.ts files or you have to fall back to another approach. You might choose an optimistic approach if no constraint annotaions exist, since you can add constraint annotations to .d.ts if a function has a relevant side effect.

ivogabe commented 8 years ago

Initially I thought that the current unsound strategy would work, though in the previous months 1) narrowing works on more locations (with CFA) 2) narrowing works on more types (enums, numbers) and 3) narrowing works with more type guards. This has demonstrated several issues with the current strategy.

A solution that has been discussed was that the compiler should track where a variable might be modified through a function call. Since JavaScript is a higher-order language (functions can be passed as arguments, stored in variables and so on), it's hard/impossible to track where a function might be called. A reassignment of a variable can also happen without a function call, for instance by a getter or setter function, or by arithmetic that calls .valueOf() or .indexOf(). For a language that is not higher-order, the analysis can be done easily, since a function call will directly refer to the declaration of a function. Using the same analysis in a higher-order language will still be unsound; this will probably cause issues when an assignment happens in .forEach(() => { ... }) for instance.

My suggestion is basically that a variable may only be narrowed if its assignments can be tracked completely. A variable can only be narrowed if all reassignments (that is, assignments except the initializer) are inside the control-flow graph of the current function. That means that a variable declared with const or one that has no reassignments can be narrowed in any function. A variable with reassignments in different functions can never be narrowed. The later is restrictive; a user should copy a value of such variable into a const variable to narrow it.

To reduce this restriction, the control-flow graph of the function could be extended to an inter-procedural control flow graph. The analysis stays sound, but becomes more accurate. The graph can relatively easily be extended at call expression that directly refer to a function declaration. Assignments in functions that are not used in higher-order constructs can now be tracked too. It would be interesting to extend this to higher-order functions too, but that would be more complex. It would be necessary to annotate function definitions with the control flow of their arguments some way.

I'm not sure whether an abstraction over the control flow graph is needed. The graph could become a lot bigger with this inter-procedural analysis. However, the compiler already uses a compressed control flow graph, which basically contains only branch nodes and assignments. It might help to track which variables are referenced by a function to skip the whole function if the analyzed variable is not referenced there.

Thoughts? Let me know if something is not clear.

felixfbecker commented 8 years ago

How is this code possible with strictNullChecks on? I want to use strict null checks, but this control flow checking makes it impossible to write this kind of code efficiently:

let myValue: string
[1, 2, 3].forEach(x => {
  if (x == 2) {
    myValue = 'found it'
  }
})
console.log(myValue) // error: myValue used before assignment

Since the callback is not guaranteed to be executed TS sees that not all code paths assign the value. The same would be true when using a for of loop, because the array is not guaranteed to be non-empty. Even though I as a developer can guarantee for sure that [1, 2, 3] is not empty and forEach will be called, there is no way for me to tell that TS, no /* ts ignore no-use-before-assignment */ or something like that. I have to disable strict null checks completely.

Arnavion commented 8 years ago

@felixfbecker

Either initialize it with a sentinel:

let myValue: string = ''
// Dangerous if myValue is never actually set to a real value

Or, if you're worried about the sentinel never getting overwritten, with undefined. This more closely mimics the underlying JS but requires ! assertion on use:

let myValue: string | undefined = undefined;
...
console.log(myValue!);
// Still dangerous if myValue is never actually set to a real value, but less so since undefined is likely to fail earlier / harder

Or, have a check to narrow the type:

let myValue: string | undefined = undefined;
...
if (myValue === undefined) {
    throw new Error();
}
console.log(myValue); // Guaranteed safety

felixfbecker commented 8 years ago

@Arnavion thanks!

bjouhier commented 8 years ago

@RyanCavanaugh Following up here, from #11393, with a suggestion.

I would be nice to have an indication of whether the optimistic heuristic was used or not when reporting an error. Something like Operator '===' cannot be applied to types 'false' (likely) and 'true'.

This will allow the developer to quickly figure out if the error is certain, or if it may just be the consequence of an optimistic assumption made by the compiler. It may also reduce the flow of non-bugs reported here 😄.

basarat commented 8 years ago

optimistic heuristic

Just wanted to point out that TypeScript has no concept of this at the moment :rose:

bjouhier commented 8 years ago

@basarat s/heuristic/assumption/

What I mean is the following:

function foo(arg: any) { arg.val = true; }

var arg = { val: true };
arg.val = false;
foo(arg);
if (arg.val === true) console.log('test is true!');

Error: Operator '===' cannot be applied to types 'false' (likely) and 'true'

Without foo(arg) call:

Error: Operator '===' cannot be applied to types 'false' and 'true'

masaeedu commented 8 years ago

Would it be possible to have an affects ... return type, in the same way as we have ... is ...? Any functions containing assignments or invocations of functions returning affects ... would implicitly have affects ... intersected with the annotated return type. This could be used to do transitive flow analysis as we currently have for never, and avoid some problems with excessively aggressive or permissive narrowing due to lack of information at function call boundaries.

masaeedu commented 8 years ago

Went through the thread and realized what I was proposing is a naive subset of what has already been fleshed out in @maiermic's amazing treatment.

I think it is worth opening a separate issue to discuss the merits of that approach and possibly get a feel for it by reimplementing the existing type guards feature in this fashion. A systematic approach to flow analysis would make reasoning about more advanced features much easier. E.g. if this was implemented, pure could be implemented internally as cAfter := cBefore. Even if this doesn't end up affecting the implementation in any way, it seems like a useful convention for discussing flow analysis in the compiler.

Strate commented 7 years ago

Didn't find similar case, could you check this out?

function render(): string | null {
    const state: "a" | "b" | "c" = "a"
    if (state === "a") {
        return "a"
    } else if (state === "b") {
        return "b"
    } else if (state === "c") {
        return "c"
    }
}

This code fails to compile on typescript 2.1 with strictNullChecks option enabled. Is it covered by somw tradeof with control flow analysis or could be considered as a bug?

ghost commented 7 years ago

Here's a case that should be possible to figure out:

function f(a: number[] | undefined) {
    if (a === undefined) {
        return [];
    }

    // Works
    a = a.map(x => x + 1);
    return a.map(x =>
        // Error: Object is possibly 'undefined'.
        x + a.length);
}

The closure only exists in a scope where a exists. There's no way to access it in the if block.

ghost commented 7 years ago

@Strate That function will return undefined, not null.

ahejlsberg commented 7 years ago

@andy-ms Your example above works as long as there are no assignments to a anywhere in the function. See #10357.

evmar commented 7 years ago

This below simple snippet demonstrates the problem again. (It fails in the if statements with "Operator '==' cannot be applied to types 'Y.A' and 'Y.B'.")

Without proposing any changes to the language or heuristics here, what's an idiomatic way to make this code compile? Right now it's rejected by the compiler. Should we be inserting type coercions on each use of x to force it back to type Y?

enum Y {
  A,
  B
}

let x: Y = Y.A;

x = Y.A;
changeX();
if (x == Y.B) { ... }
if (x == Y.B) { ... }

function changeX() {
  x = Y.B;
}

ghost commented 7 years ago

You could write const getX = () => x; and use getX() in place of x. (You could also change the design to something like x = changeX().)

bgnx commented 7 years ago

Is there any options to disable this current optimistic behavior or disable cfa completely???

floooh commented 6 years ago

FWIW, I just stumbled over exactly the same problem, also in a tokenizer/parser, where a next_token() method is called which sets a class member to another token, and the calling code doesn't detect that there's a new token and tells me my comparison is impossible (because it thinks the token type must be LeftBracket):

no2chem commented 6 years ago

Look, I get that there is a debate on how much CFA typescript should do. But the CFA errors are cryptic.

I just spent 20 minutes debugging this:

  run() {
        this.state = State.RUNNING;
        while (this.state === State.RUNNING) {
            const nextOp = this.code[this.pc];
            if (nextOp === undefined) {
                break;
            } 
            nextOp.operation(this);
        }

        if (this.state === State.EXCEPTION) {
            throw new Error(this.exceptionReason);
        } 
}

Which results in

Operator '===' cannot be applied to types 'State.RUNNING' and 'State.EXCEPTION'.

Presumably because CFA thinks that this.state must always === state.RUNNING. But nextOp.operation(this) can change this.state which CFA cannot deduce at compile time.

Of course, if you remove the break, all of a sudden the code works! :confounded:, because the CFA is no longer over-eager about determining the value of this.state. (IMO this is a bug, but it seems that there is a discussion about this instead)...

Either way, I believe most programmers would be scratching their heads wondering what is going on. It would be helpful if the error message said something to the effect of: Error: CFA predicts that this path is not possible, try refactoring your code.

masaeedu commented 6 years ago

I think it might be a little bit easier to reason about this sort of thing if the treatment of effects (side effects like local mutation) and "coeffects" (requirements that certain effects have occurred) was formally specified. Then it would be clearer which weird behaviors are down to PEBKAC, which ones are due to bugs in the type checker implementation, and which ones are bugs in the specification.

donaldpipowitch commented 6 years ago

Hi there! Sorry, but I lost the overview what this issue currently tracks and if there is a discussion to fix some flow related issues?

I came here to ask this question, because I saw two unrelated people stumbled over a destructing issue related to union types and they couldn't figure out the problem. I know why this happens, but is there some discussion to maybe fix this behaviour to make some code more ergonomic?

type AorB = { type: 'a'; value: string } | { type: 'b'; value: number };

function fun({ type, value }: AorB) {
  switch (type) {
    case 'a':
      return value.toLowerCase(); // throws: TS thinks it could be `number`
    case 'b':
      return value;
  }
}

Bnaya commented 6 years ago

@donaldpipowitch to solve this kind of issue, there's a need in somekind of depended type expression, that the control flow analysis & destructuring will create on the fly I've typed to make one explicitly with conditional mapped types, but its not working:


function fun<T extends 'a' | 'b'>(type: T, value: T extends 'a' ? string : number) {
  switch (type) {
    case 'a':
      return value.toLowerCase(); // throws: TS thinks it could be `number`
    case 'b':
        return value.toExponential();
  }
}

function alsoNotLikeThat<T extends 'a' | 'b'>(type: T, value: typeof type extends 'a' ? string : number) {
  switch (type) {
    case 'a':
      return value.toLowerCase(); // throws: TS thinks it could be `number`
    case 'b':
        return value.toExponential();
  }
}

If would be nice if my code above would have worked

pimterry commented 5 years ago

I see there's few examples with synchronous side effects in functions here, but lots of linked issues around how this causes big problems with async/await, and no direct discussion of that that I can see, so it's worth highlighting. Here's an example of some async code where this issue causes problems:

async function test(p: Promise<any>) {
    let x: 'a' | 'b' = 'a';

    setTimeout(() => {
        x = 'b';
    }, 500);

    await p;

    // x is inferred as 'a' here, so this isn't allowed, but it could be 'b'
    if (x === 'b') {
    }
}

(Playground link%3B%0D%0A%0D%0A%20%20%20%20await%20p%3B%0D%0A%0D%0A%20%20%20%20if%20(x%20%3D%3D%3D%20'b')%20%7B%0D%0A%20%20%20%20%7D%0D%0A%7D%0D%0A))

Note that the inference is wrong here despite explicit types of x. This issue makes it very awkward to write a function using await to wait for a side effect, which I think is a fairly common case.

Meanwhile in the equivalent promise-based code the types work totally fine:

function test(p: Promise<any>) {
    let x: 'a' | 'b' = 'a';

    setTimeout(() => {
        x = 'b';
    }, 500);

    p.then(() => {
        // x is 'a' | 'b' here, correctly
        if (x === 'b') {
        }
    });
}

Sharpiro commented 5 years ago

Had a similar problem strangely similar to @RyanCavanaugh's original post:

https://stackoverflow.com/questions/53524794/typescript-type-narrowing-too-strict-when-using-class-method-to-update-property?noredirect=1#comment93919371_53524794

bobmoff commented 5 years ago

I just just experienced the exact problem that @pimterry explained in (https://github.com/Microsoft/TypeScript/issues/9998#issuecomment-439610488) and was a bit confused as why it complained to me that this wasn't allowed

Amareis commented 5 years ago

+1 for https://github.com/Microsoft/TypeScript/issues/9998#issuecomment-439610488, this is really stupid, await is definitely not a pure statement.

jhnns commented 5 years ago

I just wanted to document another workaround for this problem. Instead of using a getter function as described by @ahejlsberg, it's also possible to use a type assertion:

// instead of
// let value: 1 | 2 = 1;
let value = 1 as 1 | 2;

function changeValue() {
    value = 2;
}

changeValue();

value; // `value` is correctly typed as 1 | 2

basickarl commented 4 years ago

The reason I don't use TypeScript in certain projects where we are not forced to use TypeScript is due to this issue.

interface Something {
    maybe?: () => void;
}

function run(isTrue: boolean): Something {
    const object: Something = {};
    if (isTrue) {
        object.maybe = (): void => {
            console.log('maybe');
        };
    }
    return object;
}

run(true).maybe();

The code above is deterministic. There will always be one outcome. TypeScript should be able to follow this. I understand it will however need to follow this object specifically and every other object for that matter through their entire life in the code. Which would put a lot of stress on the compile time checking.

I don't understand why I am allowed to set a method as optional and use it in this way.

Personally I don't see the advantage of using TypeScript when this is the type of code I wish to write, hence why I use plain JavaScript.

Regarding how to implement this, a simple flag in the tsconfig could allow control flow analysis.

ArmorDarks commented 4 years ago

Sorry for bumping this, but this issue becomes quite old while there seem to be no plans in TS roadmap to do something with it.

The current situation with await is not invalidating type narrowing is quite severe, as it allows to easily trigger runtime errors when there are many async functions dealing with the same object, as it's described in https://github.com/microsoft/TypeScript/issues/9998#issuecomment-439610488

Note that while invalidating each type after await feels like "hard to use` for the end-users, the issue is easily solved by what Flow already proposes: if you need to work with some value of the object which might be invalidated, simply save the reference and be with it:

declare let test:
  | { type: 'ONE'; value?: number }
  | { type: 'TWO'; value: number }

const run = async (): Promise<void> => {
  if (test.type !== 'TWO') {
    throw null
  }

  const savedValue = test.value

  await delay(1000)

  test.value // should invalidate to be `number | undefined`
  savedValue // should be type narrowed `number`
}

ExE-Boss commented 4 years ago

@basickarl You need to use generics for your case:

interface Something {
    maybe?: () => void;
}

function run<T extends boolean>(isTrue: T):
    T extends true
        ? Something & Required<Pick<Something, "maybe">>
        : Something {
    const object: Something = {};
    if (isTrue) {
        object.maybe = () => {
            console.log('maybe');
        };
    }

    // @ts-ignore
    return object;
}

run(true).maybe();

// @ts-expect-error
run(false).maybe();

basickarl commented 4 years ago

@ExE-Boss Thank you SO much! I posted a question on Stack Overflow but it was slaughtered, I will update with this code of yous, hopefully it will help future people!

cjbarth commented 4 years ago

@basickarl Would you mind posting a link to your Stack Overflow question here too?

microsoft / TypeScript

Trade-offs in Control Flow Analysis #9998

Optimistic: Bad behavior on locals

Optimistic: Bad behavior on fields

Pessimistic: Bad behavior on locals

Pessimistic: Bad behavior on fields, example 1

Pessimistic: Bad behavior on fields, example 2

Mitigating with (shallow) inlining / analysis

Mitigating with `const` parameters

Mitigating with `readonly` fields

Mitigating with other options

Infer constraints about functions

`if` statements

Constraints introduced by functions

(User-Defined) Type Guards

Functions in general

`nextToken`

`maybeNextToken`

`doSomething`

microsoft / TypeScript

Trade-offs in Control Flow Analysis #9998

Optimistic: Bad behavior on locals

Optimistic: Bad behavior on fields

Pessimistic: Bad behavior on locals

Pessimistic: Bad behavior on fields, example 1

Pessimistic: Bad behavior on fields, example 2

Mitigating with (shallow) inlining / analysis

Mitigating with const parameters

Mitigating with readonly fields

Mitigating with other options

Infer constraints about functions

if statements

Constraints introduced by functions

(User-Defined) Type Guards

Functions in general

nextToken

maybeNextToken

doSomething

Mitigating with `const` parameters

Mitigating with `readonly` fields

`if` statements

`nextToken`

`maybeNextToken`

`doSomething`