python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.24k stars 2.79k forks source link

Narrow types using overload information #4063

Open elazarg opened 6 years ago

elazarg commented 6 years ago

mypy can use NoReturn to narrow types of variables, thus enabling us to express some kinds of guards, in the sense of #1203 / #2357, without using any dedicated mechanism. For example

from typing import Union, NoReturn, overload

@overload
def assert_int(x: int) -> None: ...
@overload
def assert_int(x: str) -> NoReturn: ...

def assert_int(x):
    assert isinstance(x, int)

y: Union[int, str]
assert_int(y)
reveal_type(y)  # should be: int; currently Union[int, str]

The same technique could be used to narrow types in conditionals, using only "always true" / "always false" types (do we already have such things? maybe it requires singleton True False literal-types).

Note that unlike simple isinstance, this can also be used to correlate two different arguments, test for generic compatibility, etc; it is the responsibility of the programmer to have an actual implementation.

ethanhs commented 6 years ago

Another use case brought up by @timabbott in #5088 is for assertisNotNone, which, like many methods of TestCase, could be useful for type narrowing. I think this could be quite useful, especially once we have literal types.

@Michael0x2a since you have been working on overloads recently, do you have thoughts on this?

Michael0x2a commented 6 years ago

@ethanhs: I agree this feature could be pretty helpful -- I think we might actually be pretty close to implementing this, albeit with some limitations.

For example, we can make @elazarg's original example to work if we use my (pending) PR and modify the example a little bit:

from typing import Union, NoReturn, overload

@overload
def assert_int(x: int) -> int: ...
@overload
def assert_int(x: str) -> NoReturn: ...
def assert_int(x):
    assert isinstance(x, int)
    return x

y: Union[int, str]
y = assert_int(y)  # Type is narrowed!
reveal_type(y)  # Revealed type is 'int'

Basically, this works as an unintended side-effect of the union-math feature: Union[int, str] doesn't match either arm of the overload but does match the union of the two arms. So, my PR will infer the return type is Union[int, NoReturn] which simplifies to int, narrowing the type.

The main caveat is that support for union-math is currently extremely simple which means it's unclear if we can support more sophisticated type guards. We can probably fix this in the future by just implementing better heuristics, though.

We can also probably make the original assignment-less form work if we made the binder understand calls to overloaded functions in some fashion. I have no idea how hard that would be, but my intuition is that it wouldn't be too bad. We can already flag unreachable blocks/narrow assignments, and this feels morally similar.

Making #5088 work is harder: the root issue is that the union-math feature kicks in only when neither overload alternative matches the passed in argument. Unfortunately, I don't think it's possible to make an overload branch match literally anything other then None -- there was some discussion about this earlier in https://github.com/python/mypy/pull/3763.

Tactics for fixing this include:

@ilevkivskyi -- thoughts? We were talking about the object vs None thing earlier -- I was hoping we could just special-case descriptors but maybe that might have been too optimistic.

ilevkivskyi commented 6 years ago

It looks like this is a trade-off between a false negative and a false positive (from a typical user's point of view). I would say false negative is better. So I would special case None in overloads, so that None does not overlap with any types except Optional[X] for all X. As discussed before, we probably need to do this independently of --strict-optional.

Michael0x2a commented 6 years ago

Actually, never mind -- I think we can make #5088 work w/o having to do anything funny with None and object, at least for people who are using strict-optional.

@JelleZijlstra comment in gitter the other day made me realize we could probably get away with doing union-math first, instead of second. This means if the user passes in Optional[Foo], the type won't be greedily bound to the second alternative unless the union math stuff failed. I think this ought to be typesafe -- at the very least, it doesn't seem to be breaking any tests.

I also tried testing the "make None disjoint from object" approach. I think the edge case I ran into was that there are cases where people sometimes do want object to overlap with None: for example, if they want a fallback alternative or if they're using an unrestricted typevar (which I believe has an implicit bound of 'object'?). It's also possible my experiment was just buggy, so take this observation with a grain of salt.

gvanrossum commented 6 years ago

But isn't the problem that it gives an error if you're not using strict optional?

Michael0x2a commented 6 years ago

I think there are two separate issues.

The first one was from yesterday: the checks for overload definitions were too strict when strict-optional was disabled. I ended up just always enabling strict-optional when checking definitions, which made mypy behave like you suggested.

The second issue came up when I was looking at #5088. I first thought this also had to do with object and None, but I think that was a red herring in retrospect: the real issue is that the "pick the first match" rule is too aggressive with unions. For example, this snippet has the same problem:

class Parent: ...
class A(Parent): ...
class B(Parent): ...
T = TypeVar('T', bound=Parent)

@overload
def narrow_to_not_a(x: A) -> NoReturn: ...
@overload
def narrow_to_not_a(x: T) -> T: ...

val: Union[A, B]
val = narrow_to_not_a(val)
reveal_type(val)   # Ideally should be 'B', but ends up being 'Union[A, B]'

Currently the union will greedily match the second alternative instead of doing union math and inferring Union[NoReturn, B], preventing the type from narrowing. I previously thought the only way of fixing this was to somehow convince mypy to not match the second alternative (which would make it fall back to doing union math). But we can probably just flip the order, like I described above.

I haven't pushed this change to my PR though -- I'm still testing it.

danihodovic commented 5 years ago

I'm here from #5088 and using self.assertIsNotNone(var) would be useful in tests. Until then I'll have to disable type checking for my test files.

kkozmic commented 3 years ago

Ditto on self.assertIsNotNone(var). Are there any plans to add that in the relatively near future?

ShaneHarvey commented 2 years ago

I'm also checking in from https://github.com/python/mypy/issues/5088 to say this feature would be very useful. Type checking test suites is currently very painful since all assertIsNotNone/assertIsInstance/etc.. calls need to be duplicated with standard assert statements to avoid false positives from mypy. Is there any update here?

1kastner commented 2 years ago

Well, after I have checked several places and had several very insightful discussions, there are updates, @ShaneHarvey.

At https://github.com/python/typeshed/issues/8583, I asked the typeshed team whether the stubs could be updated as proposed by @elazarg. During discussion, I was taught that mypy generally doesn't typecheck code it considers unreachable. This could happen when there is no static type information available for a variable - something completely legal, as type hints are always optional! Then, the wrong path (failing the assertion) is automatically chosen. At that point, this change would have unwanted side-effects because the remainder of the method would not be checked unless the flag --warn-unreachable is set. That, however, is not backwards-compatible. Thus, we cannot narrow types using overload information without potentially breaking someone's project at this moment.

There is some hope there. At https://github.com/python/typing/issues/930, a new feature is discussed that might help us out here. The proposed NoReturnGuard informs the type checker that if the function or method returns, then either the type is narrowed down or an exception has happened. However, the type checker does not try to guess which path will been taken, e.g., in a unittest. This is semantically much closer to the assert statement!