microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0
100.83k stars 12.46k forks source link

Require all user-defined guards to accept type 'any' #4898

Closed rotemdan closed 9 years ago

rotemdan commented 9 years ago

If a guard accepts an input type other than any, it cannot be assured to provide any meaningful or reliable result:

interface A { 
    func1: () => number;
}

interface B extends A {
    func2: () => string;
}

function isB(x: A): x is B {
    return x != null && typeof x.func2 === "function";
}

function example(input: A) {
    if (isB(input)) {
        input.func2(); // Guaranteed to work!
        input.func1(); // Can fail :(
    }
}

The problem is that the guard isB assumed input had type A, but that assumption is not generally safe to make, especially when the function was called by a plain Javascript consumer, who does not have a compile-time checking mechanism for the associated static type. To fix this, we need to define an additional guard for A, that would accept input type any.

function isA(x: any): x is A {
    return x != null && typeof x === "object" && typeof x.func1 === "function";
}

and then combine it with the check for isB:

function example(input: A) {
    if (isA(input) && isB(input)) {
        input.func2(); // Guaranteed to work!
        input.func1(); // Guaranteed to work!
    }
}

But now isB is dependent on a prior call to isA to provide a correct result, meaning it still isn't reliable or useful by itself. However, we can fix this by having isB internally call isA, and now its input type can also be safely changed to any:

function isB(x: any): x is B {
    return isA(x) && typeof x.func2 === "function";
}

This transformation can be done with any user-defined guard that accepts a type other than any and removes the need for these types of guards in general.

Class embedded guards

Guards that are embedded in classes:

class A {
    ...

    isB(): this is B {
        return false;
    }
}

class B extends A {
    ...

    isB(): this is B {
        return true;
    }
}

class C extends B {
    ...

    isB(): this is B {
        return false;
    }
}

Are even less reliable:

function example(input: A) {
    if (input.isB()) { 
        ...
    }
}

example(undefined); // Error, 'input' is 'undefined'
example(callSomeUnsafeFunction()); // Error, 'isB' method is not found

and would always need to be composed with a check for the base type (which should hopefully also include a check for null or undefined inputs):

function example(x: A) {
    if (isA(x) && x.isB()) {
        ...
    }
}

To fix this, we'll create a function that combines all the tests for B and accepts type any:

// This guard would only return true if the input is a strict instance of 'B',
// and not one of the derived classes.
function isStrictlyB(x: any): x is B {
    return x != null && isA(x) && isB(x) && !isC(X);
}

This removes the need for these types of guards as well.

Implications

Guards that apply compile-time assumptions on their input value's type are not generally reliable for use by themselves, may yield incorrect results or throw an exception. There are safer and more reusable alternatives for these types of guards, thus they can be deprecated without a significant loss of functionality.

mhegazy commented 9 years ago

User defined type guards allow for modelling existing patterns of using helper functions to check types; the declaration allows for the API designer to pass on this information to the compiler. How they are designed or implemented is a user choice. there is definitely room for enforcing policies, either through documentation or a linter rule extension to ensure they are implemented and used in a certain way, but this would be out-of-scope for the typescript compiler.

rotemdan commented 9 years ago

@mhegazy

User defined type guards allow for modelling existing patterns of using helper functions to check types

That's a very strange thing to say, because a sound check would always require type any. I.e. not apply any assumptions on the tested value's type. The existing patterns that modeled are then, very poor.

the declaration allows for the API designer to pass on this information to the compiler

Well, it was designed to do that, in a specific way by @RyanCavanaugh.

How they are designed or implemented is a user choice.

Mostly @RyanCavanaugh choice, because he chose to allow non-any types to be used, and he chose to allow checks embedded in classes.

there is definitely room for enforcing policies, either through documentation or a linter rule extension to ensure they are implemented and used in a certain way, but this would be out-of-scope for the typescript compiler.

This is definitely not something for linters, this is about fundamental property about the rules of logic. If the inference rules are correct, but the assumption is invalid (e.g. it is falsely assumed that x has type A) then the argument is not sound and thus the result is not generally reliable or useful. There really isn't much more to that.

Overall, the explanation you gave is mostly circular:

  1. User-defined guard were proposed in some form (I believe by @RyanCavanaugh) and then implemented in some way.
  2. They already exist in the compiler.
  3. Changing them is out-of-scope, and is justified by the way they are applied in the compiler, and they way they were designed (go back to 1.)

I appreciate your attempt at explaining your reasoning, but I don't find it satisfying. I would like to have a thoughtful conversation from @RyanCavanaugh on the matter, if possible, as it seems like it was his original idea.

I have spent many hours to craft, consider and formulate this above text, (some of the observation sprung up from an attempt to design an automated checker for function arguments). It is inconceivable that it would be dismissed in this way.

Please remove the tags and reopen the issue. I'm a very reasonable and open-minded person, and if @RyanCavanaugh (or someone else) would manage to convince me through a genuine bidirectional conversation it is not desirable to require sound run-time checks, I promise I will close the issue myself.

RyanCavanaugh commented 9 years ago

Some meta-discussion to start: engineering hours from the TypeScript team are a finite resource. These hours are not up for requisitioning for the sake of someone's satisfaction on a given topic. You've raised some relevant design decisions here that I will try to address, but it's not our policy to open issues that aren't actually open in terms of their state in the workflow. We have thousands of issues and they need to follow some process to be managed effectively.

We get hundreds of suggestions and thousands of comments about how things should work. At the end of the day, a very small percentage are integrated into the language, because it would be literally impossible to do all of them (Bob says go left, Alice says go right), because they are at odds with our design goals (make TypeScript the same as Haskell, or C#, or Go, or ...), or because they are lower priority than other work we have to do. I know it's frustrating to have features or design decisions declined, but it's part of the process and I assure you that we are reading all these issues and taking the feedback into account.


We need to start with some background on what assumptions our type system takes universally. A scenario alluded to here and in #4868 is the one in which a "completely wrong" value (e.g. passing in Giraffe to a function accepting Cat|Dog) is present. Our entire type system is predicated on this not happening. This assumption is critical and necessary -- if you take the converse assumption that any function might be passed anything at any time, it's not even safe to write function f(s: string) { console.log(s.length); } without some form of check, which is far too restrictive of a language to be palatable to JavaScript developers.

We consider the assumption that a function accepting Cat|Dog not actually get a Giraffe at runtime to be a good compromise in terms of safety vs convenience. Yes, external callers exist, and yes, type assertions exist and yes, any exists, but we're not a formally-provable language and we don't intend to be.

Moving on, what is the philosophy behind user-defined type guards?

In the built-in type guards, we have basically two categories. The first is the primitive checks, e.g. typeof foo === 'string'. Those checks are known to always be correct -- it's impossible to "fool" typeof in any meaningful way. String values never have a non-string typeof. It's safe to remove primitive types from the union in the else clause because of this.

The second category is instanceof. Here, we take a more conservative approach -- just because some value isn't in the prototype chain of an object doesn't mean the object isn't a structural match for the given class. There's some debate over whether that behavior is actually even desirable (e.g. given x: string|Array<string>, people seem to expect if(x instanceof Array) { ... } else { ... } to narrow x to string in the second block).

Given a user-defined type guard (UDTG), we assume that the guard implementor knows what they're doing, and treat a UDTG as being as strong as a primitive check. It's always going to be possible for a UDTG author to implement their guard incorrectly; forcing them to write an overload accepting any won't fix that, nor will it even help them to do so. Fundamentally, writing runtime-sound type guards is an unsolvable problem, because there's no code you can write to distinguish the types (n: number) => void and (n: string) => void, so in that sense all UDTGs that attempt to guard a function type are fatally flawed in the "you can't prove it" sense.

To address some specific points:

[...] a sound check would always require type any. I.e. not apply any assumptions on the tested value's type.

As mentioned above with (n: number) => void vs (n: string) => void, there are always going to be some assumptions involved if we're talking about anything but the most basic types.

The existing patterns that modeled are then, very poor.

Welcome to JavaScript? :wink:

If the inference rules are correct, but the assumption is invalid (e.g. it is falsely assumed that x has type A) then the argument is not sound and thus the result is not generally reliable or useful.

As stated in the top of this post, we do make the assumption globally. I think it's reasonable to disagree with that assumption, but it is not an assumption that is specific to type guards, nor one that could reasonably be unassumed at this point in the language (you're basically looking at a completely different language at this point).

rotemdan commented 9 years ago

@RyanCavanaugh

I appreciate taking your time to write a comprehensive answer, but I don't find your reasoning satisfying:

  1. Guards are essentially run-time type checks.
  2. Guards are able to reasonably test a value for a specific type or set of constraints, with minimum assumptions.
  3. Therefore the language should encourage the tests to be performed with as minimum assumptions as possible. Just like the compiler tries to do its best to maintain compile-time type safety and encourage best practices. The same can be done with run-time checks. Requiring any type is simply one of those 'best practices'. It doesn't guarantee the test would be correct or useful, but it does its best and puts the programmer into a mindset where the tested object is a 'blank-slate' with no initial assumptions. Allowing anything else is effectively an act encouragement for bad practices, and I don't see any justification for it.

This doesn't have anything to do with TypeScript's type system, or its design philosophy. The same could be done in Javascript or say, Python, only the types could be used as say, string identifiers, it doesn't matter here. If there are limits to what can be tested for a specific type, it is not a limitation that has relevance to this particular issue.

I speculate the problem here was mainly operational, namely engineers having too much to do and so many little problems to take care of, and design sometimes gets rushed or compromised. It is unfortunate this turn out this way, but from my position there is nothing I can do aside from trying my best to help.

I don't think it is similar to yet any other random suggestion of some person trying to promote some idea or match another language's feature. This is a fundamental aspect of logic. This is not an opinion or a specific approach. It is an objective fact. Correct deduction cannot be made with false premises, regardless of the validity of the inference rules. That is exactly the concept of soundness.

I do believe this can be changed, even at this state. It doesn't seem difficult at least, as it's mostly an act of subtraction and personal motivation and willingness to admit and correct one's own mistakes, or at least rethink them. It is up to the developers and designers to decide whether they want it or not. I have done my part, at least.

RyanCavanaugh commented 9 years ago

The premise that a type guard accepting any is going to be more "correct" is what's really at issue here.

What makes this function

interface Thing { alpha: number; beta: string; }

function isThing(t: any): t is Thing {
    return t &&
        typeof t === 'object' &&
        typeof t.aplha === 'number' &&
        typeof t.beta === 'string';
}

better than this function?

function isThing(t: Thing): t is Thing {
    return t &&
        typeof t === 'object' &&
        typeof t.alpha === 'number' &&
        typeof t.beta === 'string';
}
rotemdan commented 9 years ago

@RyanCavanaugh

I never actually mentioned cases where the input type is the same as the tested type, although that would seem like affirming the consequent, it could generally be a way to get auto-completion on the type, so may be somewhat more harmless, (but still dangerous as the editor would encourage things like dotting into properties that have not been tested if they are null/undefined, so any may still be a safer option). I mentioned cases where it was a supertype (a union could also be a similar case here, though probably worst in practice):

interface A {
    x: number;
}

interface B extends A {
    y: string;
}

function isB(x: A): x is B {
    ...
}

What this does is create a dependent guard, that cannot stand by its own, because it starts with the premise that the tested value's type is a super-type of the target type. Requiring any would force the programmer to embed that dependency check inside the guard and render it independent:

function isB(x: any): x is B {
    return isA(x) && ....
}

Writing this a little differently, the guard can safely use the auto-completion for type A that is given due to the isA guard itself:

function isB(x: any): x is B {
  if (isA(x)) {
     ... // Autocompletion for 'A' here
  }
}
danquirk commented 9 years ago

You are prioritizing the one time that one person writes the type guard isB over every single usage of the actual isB function.

In your worst case scenario, a programmer accidentally writes an incorrect set of runtime checks for isB. Eventually they discover this and fix the type guard implementation (and there are many ways that they could've written an incorrect set of runtime checks, regardless of the input type annotation).

Both before and after fixing this error every single use of isB is now exposed to a compile time bug we could have caught, but do not, because isB was required to take an any.

var x = foo(); // I thought this returned a B
if(isB(x)) { // no compiler error, great?
rotemdan commented 9 years ago

@danquirk @RyanCavanaugh @mhegazy

There are two different elements here, that exist at different levels of the program:

  1. Compile-time check for the static type when given as an input variable to the guard, and since it is done purely at compile-time, it is reasonable to apply the unsound, but useful assumptions at that level, as a first measure of security.
  2. Run-time check that hopefully assumes nothing about the input value.

From the perspective of the compiler, if foo() returned a type inferred as B, then there is no need for a run-time check at all! It could just assume it is B and continue.

That is obviously not the idea here, so during run-time, these assumptions are removed, and the object should be considered a 'blank slate' with no prior assumptions about it.

We need to try to allow the best of both worlds. Allow the compiler to check to the best of its knowledge, but create an environment that constrains the programmer not to apply those assumptions in their tests, because otherwise, the test would not be reliable and would defeat the benefit of having any run-time level of presence or assurance here.

We need to separate between three different scenarios:

  1. The input type is the actual type of the target type.
  2. The input type is a super-type of the target type.
  3. The input type is a union containing the target type.

1, as I've mentioned, it is less of a problem, because it is obvious for the programmer they cannot assume the type they are checking, so the static type is seen as a tool for auto-completion here and this is reasonable.

2 is more problematic, because the programmer is encouraged to accept the unsound compile-time assumption that the input type is already known to be of that super-type. This may yield a dependent guard. I have demonstrated it and its problems above and in the original text.

3 is extremely dangerous, because now the programmer is encouraged to accept an assumption that is very easy to refute, for example, if the input type was SomeObjectType | string, the programmer could easily (and very reasonably will) write something like this:

type SomeObjectType = { func: () => number, prop: Array<boolean> };

function disambiguateSomeObjectAndString(input: SomeObjectType | string): input is SomeObjectType {
    return typeof input === "object";
}

This test is very poor and would pass for everything that is not a primitive or a function. There is no assurance that input has a property func or prop, and calls or references to them would fail unexpectedly.

From your response, it puts to doubt whether it was really the intention of the language designers to encourage the programmer to think this way, so this brings me to a partial relief.

We'll now try to formulate a solution that allows the best of both worlds. Enable the compiler to do its job, i.e. provide it with an appropriate static type during compile-time, and the programmer to make the right choice and not to assume the static type at run-time:

The guard function:

function isA(x): x is A { // note: 'x' is untyped.
    ..
}

Would only be callable with types that have been identified to be assignable either to or from A. I.e. any, a super-type, sub-type or a union containing it. This would be consistent with the input types that are currently allowed for any combination of a guard parameter type and the types that are assignable to it.

The input parameter itself is now required to be untyped, and would even error if it set to be explicitly any. The compiler may still provide auto-completion for x based on the static type of A, for convenience.

This idea looks interesting, and I would like you to consider it or its possible variants deeply and try to give your best insights or suggestions on the matter. If this does seem like a reasonable design, I would also request you to consider it for a future version of TypeScript. It may be a breaking change (though limited only to the guard signature itself, and not its calls), but one that could be immensely effective at improving the quality and safety of all user-defined guards in general.

I find this topic very important and interesting and would like to continue the discussion, even if the pace may be slower and limited to one message per one or several days. Thank you for your time.

[Edit: I corrected a detail and added sub-types as valid argument types, so the text is now different from what appeared at the e-mail notification]

danquirk commented 9 years ago

This test is very poor and would pass for everything that is not a primitive or a function. There is no assurance that input has a property func or prop, and calls or references to them would fail unexpectedly.

There's an unending number of incorrectly authored runtime type tests that we cannot help guard against despite all sorts of type annotations and assignability requirements:

// two types with the same shape but are functionally different and need to be treated differently
type Foo = { func: () => number };
type Foo2 = { func: () => number, other: string; };

function isFoo(input: Foo | string): input is Foo {
    // incorrect, my specific implementation requires this not have an input.other so I don't treat the two types above as the same
    return input.func && input.prop; 
}

Would only be callable with types that have been identified to be assignable either to or from A. I.e. any, a super-type, sub-type or a union containing it. This would be consistent with the input types that are currently allowed for any combination of a guard parameter type and the types that are assignable to it.

The input parameter itself is now required to be untyped, and would even error if it set to be explicitly any. The compiler may still provide auto-completion for x based on the static type of A, for convenience.

Then this wouldn't work

interface A { a: string }
function isA(x): x is A { return x.a; }
function foo<T>(x:T) {
    if(isA(x)) { // would be invalid, T is not assignable to or from A

    }
}

There is no perfect solution. We have to find the point where convenience and usability are balanced against strictness. Certainly there are many places TypeScript could be stricter, in this specific feature and others, but a feature is not graded solely on how strictly it enforces any and all possible requirements.

rotemdan commented 9 years ago

@danquirk

You have made your best to find particular examples where the precise formulation I gave did not work, and present them as if they are unsolvable and are significant barriers for that approach to succeed in general. I'm not sure about your first example as the input type is in the guard itself, i.e. x is A and can always be set there. If you are referring to a form of nominal typing then I don't particularly what you are referring to, because as far as I know TypeScript doesn't support nominal typing.

I don't know the details of the way the compiler instantiates generic variables, but I assume a solution could be found for them, either as an exception or through some other mechanism. In any case, this approach ensures the guard would also handle any types or any other input type in general, the assignability constraint is only for the compiler to apply preliminary type checking at compile time.

If you are putting all the weight on me now to try to find an appropriate solution, then you might as well tell me immediately you have no motivation to do anything about the problem and we all go our ways. TypeScript's type system is very complex. There will be a point I would not be able to help you or other team members and you'd be own your own. That is the point you will always 'win'.

You won. Congratulations.

danquirk commented 9 years ago

This isn't about winning, this is how our design process works whether on GitHub or on a whiteboard in the office. Someone proposes an idea/rule/example and then others (including the original author) try to poke holes in it to see how it holds up under the myriad circumstances that occur when people write code using these features. From there we iterate on changes to the original idea to address those issues, sometimes having to scrap a particular idea/approach entirely. This is precisely the process used to arrive at the current set of rules for the behavior user defined type guards today. So when I point out a hole in a proposal like 'how does this work with generics' it's because that's the process we constantly apply to ideas and have applied to multiple approaches for this one already. In some cases you will see comments to the tune of 'this doesn't work with unless we did x/y/z.' In other cases you will see comments more like the above which are to the tune of 'it's not clear to me how this could work with '.

The point is not to say 'clearly this is forever impossible because of this particular example.' Certainly in most cases at worst you can start by special casing a particular scenario and crafting all sorts of special rules for it. But this has all sorts of undesirable properties. Namely the rules for the language would quickly become an impossible to comprehend mess if we consistently special case things, which impacts both usability for customers as well as our own ability to effectively do good design going forward.

It should be obvious from this thread and others that we have plenty of motivation to discuss ideas and iterate on possible solutions. This thread already has had more discussion from more team members in the last day than it did while it was open, despite the fact that spending time discussing this is time not spent on other work we have committed to and the fact that most of these proposals are breaking changes which means we are even less likely to take them in the future.

RyanCavanaugh commented 9 years ago

If you are putting all the weight on me now to try to find an appropriate solution

Well, yes. All changes start at -100, and breaking changes start at -1,000. If you want to make a breaking change to the language, it should be very compelling.

What I've gathered here is the following (if this is an incorrect summary, please write a correct summary of similar length):

  1. UDTGs ought to be written in a way that they are guaranteed to always be correct, even in the presence of hostile inputs
  2. Forcing UDTGs to accept their parameter as any would be an effective way of advancing that goal.

Both of these points are hard to swallow. The first is not shown to be how people write type guards in practice.

The second, more problematic one, asserts that a programmer will be more likely to write a correct type guard if they are given a value of type any. Someone comes along and says "I will write a type guard for Point!" and then, given a value of type any, writes a better type guard than if it were of some other type? Why? How? What is the mechanism of action that causes people to write better a type guard when given an any instead of some other type? Arguably they are liable to write a worse type guard because it becomes possible to do incorrect things with the value because it's untyped.

What we have is an unproven problem statement (there is no evidence that UDTGs are problematically incorrect in practical use) paired with a solution with no causal mechanism for its desired effect (why would those bad type guards be improved by an input of type any?). That doesn't meet the bar for a new feature, let alone a breaking change.

If you want to claim that we're just declaring a "win" by fiat, that's your option, but it's not the case that we haven't given due consideration to the proposal.

rotemdan commented 9 years ago

@RyanCavanaugh

I've shown a general approach to an alternative solution that does allow a compile-time type-check for guard calls (see this comment), so it isn't much more about requiring type any, more about removing the ability to specify a particular type on the input variable.

I've demonstrated the problems of dependent guards for supertype inputs and highly compromised guards for union inputs. Removing assumptions on the input types may significantly reduce the prevalence of these kind of guards. Although they could still technically be written, they wouldn't be convenient or natural to create, and would not appear as legitimate (the language wouldn't provide a special mechanism to declare a specific type for the input argument so that would reduce motivation and practicality to formulate them this way).

Class embedded guards are fundamentally flawed in the sense that a call to them needs to assure the object is not undefined or null, assume they are contained in some particular class type(s) and include a particular function. This requires additional checks, but can be reformulated in a more efficient way, that would be entirely independent. I demonstrated that in the original text (though a better approach might be possible than the one mentioned).

For some reason I feel that as much as I would try harder to explain and demonstrate things here, I'll be understood less. You said I didn't provide strong enough evidence and that leaves me a bit confused. I guess it is hard to communicate things to unmotivated or even hostile people though that's understandable as it may expose a problem or a flaw in their work. This puts me in a difficult position myself.

mhegazy commented 9 years ago

Let's keep this focused on technical discussions and not about people. @rotemdan, I believe you have sufficiently represented your proposal. If there are no new scenarios or use-cases to cite, let's agree to disagree and leave it at this point.