python / mypy

Optional static typing for Python
https://www.mypy-lang.org/

Type inference engine simplifies too aggressively #3178

Open pkch opened 7 years ago

pkch commented 7 years ago

This issue is broader than, but is closely related to, #3032.

Consider this code:

from typing import TypeVar, Callable

class Summable: ...
class MySummable(Summable): ...
T = TypeVar('T', bound=Summable)
def plus(a: T, b: T) -> T: ...

def plus_partial(t: T) -> Callable[[T], T]:
    def f(t1: T) -> T:
        return plus(t1, t)
    return f

my_s: MySummable
plus_my_s = plus_partial(my_s)

What should mypy infer for the type of plus_my_s?

The constraints are simple: T must be a subtype of Summable because we use it in plus; and T must be a supertype of MySummable because plus_partial has to accept my_s as an argument. So T has a lower bound of MySummable and an upper bound of Summable; any type within this range is fine.

Here, there's no such thing as the "most informative" ("optimal") type, since in the expression that plus_my_s is bound to, T appears both as a return type (covariant position) and as an argument type (contravariant position). IOW, sometimes it's better to have T lower in the type hierarchy, sometimes higher. Therefore, the mypy inference engine should not simplify the constraint any further, and should just store the constraint itself internally.

However, mypy infers T == MySummable (and so the type of plus_my_s is inferred as def (MySummable*) -> MySummable*). As a result, the following perfectly safe code fails the type check:

s: Summable
plus_my_s(s)  # E: Argument 1 has incompatible type "Summable"; expected "MySummable"

even though direct use of plus would (of course) be accepted:

plus(my_s, s)  # ok

This could be fixed by supporting (at least internally) types with both upper and lower bounds (rather than only an upper bound). A very similar problem came up in isinstance inference (#2993), which I fixed in #2997 by defining TypeRange, which can store either a constraint with an upper bound or a precise type.

Alternatively, this could be fixed by refusing to infer types in this situation and asking the user to provide an annotation (although the user is then out of luck if they actually want to use plus_partial with both Summable and its subtypes, since we don't let users provide lower bounds for type variables).

gvanrossum commented 7 years ago

I have a curious inability to understand general rules stated without concrete examples. In this case I could only understand your point after in my head I replaced Summable with float and MySummable with int (and assuming that int is a subtype of float). Then we see plus(int, int) -> int and plus(float, float) -> float and plus(1, 1.0) -> 2.0 (using a very informal notation). So the idea is that plus_partial(1) returns a variant of plus() that adds 1 to something, and you are arguing that plus_partial(1)(1.0) should be valid. With this I agree. But it's also easy to work around, by writing plus_partial(1.0) -- translating that back to your example it would be plus_partial(cast(Summable, my_s)). I don't know how easy the fix would be, but @JukkaL and/or @ilevkivskyi might.
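Guido's int/float analogy can also be checked at runtime. The sketch below assumes plus is plain addition (the original example leaves its body as ...); the point is that plus_partial(1)(1.0) is perfectly well-behaved even though mypy pins T to int:

```python
from typing import Callable, TypeVar

# Sketch of the int/float analogy above; the body of plus is assumed to be
# plain addition, which is not part of the original example.
T = TypeVar('T', bound=float)  # PEP 484 treats int as compatible with float

def plus(a: T, b: T) -> T:
    return a + b  # runtime addition for the sketch

def plus_partial(t: T) -> Callable[[T], T]:
    def f(t1: T) -> T:
        return plus(t1, t)
    return f

add_one = plus_partial(1)
print(add_one(1.0))  # runs fine and prints 2.0, but mypy infers T == int
```

The workaround Guido mentions corresponds to writing plus_partial(cast(float, 1)) (or plus_partial(1.0)), which makes mypy pick the upper bound up front at the cost of a less precise return type.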

pkch commented 7 years ago

Agreed. I'm not concerned that there's a hard use case without a workaround.

I just noticed that this oversimplification occasionally emerges in various forms, e.g., the undesirable behavior (#3032, #2993) or the one-off fixes (checker.TypeRange in #2997).

If it's worthwhile and feasible to solve this at its root by extending the type inference logic somewhat, that's good. If not, we can just deal with specific issues as they are discovered.

Btw, I used Iterable and List in my head for the same reason, but was afraid to bring even more generics into the example.

ilevkivskyi commented 7 years ago

It looks like replacing an inferred type with a TypeRange (or similar) could be a significant change, but it should not be too difficult, since it could be "localized" in a few files.

There is actually something else that I would like to change in constraint inference: constraints for instance type variables are currently inferred ignoring declared variance, assuming invariance instead.
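For context, declared variance already matters when matching instance types against each other. A hedged illustration (this is ordinary mypy behavior, not the internal constraint machinery being discussed): List is invariant in its parameter while Sequence is covariant, so only the latter accepts a List[int] where floats are expected.

```python
from typing import List, Sequence

def takes_list(xs: List[float]) -> None: ...
def takes_seq(xs: Sequence[float]) -> None: ...

ints: List[int] = [1, 2]
takes_list(ints)  # mypy error: List is invariant, so List[int] != List[float]
takes_seq(ints)   # ok: Sequence is covariant, and int is compatible with float
```

Inferring constraints as if every type variable were invariant forgoes exactly this kind of flexibility.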

JukkaL commented 7 years ago

If I understood this correctly, the idea would be to infer a generic callable type for plus_my_s, and the type variable for this callable would have both an upper and a lower bound?

Here is another, slightly simpler example:

from typing import Callable, TypeVar

T = TypeVar('T')

def f(c: T) -> Callable[[T], T]:
    return ...  # how this is implemented is not important here

g = f(1)
reveal_type(g(1))  # ok, type is int
reveal_type(g(object()))  # ok, type is object
g('x')  # error

It's not immediately clear to me how this would help solve some of the other issues such as #3032 and #2993. A more detailed explanation might help. Also, I'd like to see some real-world code where this would make a difference.

pkch commented 7 years ago

Yes, your example precisely describes what I am proposing. However, I wouldn't limit this to type variables - I'd allow any inferred type to be a range of types with upper and lower bounds.

Due to subtyping, we already allow ranges, but only with an upper bound: every type can be represented by its upper bound, with the lower bound always (implicitly) bottom or <uninhabited>.

So my proposal is to add a lower bound to each type inferred by the inference engine.

The current incarnation of TypeRange is a step in that direction, but it only allows the lower bound to be either bottom or equal to the upper bound (that's all I needed for that particular use case). Of course, all it takes to represent an arbitrary type range is to increase the amount of information stored in TypeRange: from the current "upper bound plus 1 bit" (TypeRange.is_upper_bound is a boolean) to "upper bound plus lower bound".
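A hypothetical extension along those lines might look like the sketch below (the class name, fields, and use of runtime classes as stand-ins for mypy types are all invented for illustration; this is not mypy's actual TypeRange):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch, not mypy internals: a range carrying both bounds,
# generalizing the "upper bound plus 1 bit" representation described above.
@dataclass
class BoundedTypeRange:
    upper_bound: type
    lower_bound: Optional[type] = None  # None models the implicit bottom

    def admits(self, t: type) -> bool:
        # t is admissible iff lower_bound <: t <: upper_bound
        if not issubclass(t, self.upper_bound):
            return False
        return self.lower_bound is None or issubclass(self.lower_bound, t)

class Summable: ...
class MySummable(Summable): ...

r = BoundedTypeRange(upper_bound=Summable, lower_bound=MySummable)
print(r.admits(Summable), r.admits(MySummable), r.admits(int))  # True True False
```

In the plus_partial example, both Summable and MySummable fall inside the range, so either instantiation of T would be accepted instead of only the lower bound.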

Of course, the hard part is not the representation, but updating all the logic to work correctly for types with a lower bound. I think, though, that it will make the code cleaner / less awkward, and fix some known and unknown bugs.

The reason this will solve #3032 is that currently <uninhabited> is produced by the constraint solver in an attempt to represent a completely unconstrained type as a single type. Once the solver is allowed to return a type range, it can simply return the logical and simple range bounded by <uninhabited> from below and object from above.
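Concretely, the unconstrained case could be modeled as follows (a toy sketch with invented names, using strings as stand-ins for types; this is not mypy's solver): with no constraints at all, the solver returns the widest range rather than collapsing to the single point <uninhabited>.

```python
from typing import Optional, Tuple

# Toy model, invented names: solve a type variable given optional bounds,
# returning a (lower, upper) range instead of a single point type.
UNINHABITED = "<uninhabited>"

def solve_range(lower: Optional[str] = None,
                upper: Optional[str] = None) -> Tuple[str, str]:
    # No constraints at all -> the full range, not <uninhabited> alone.
    return (lower or UNINHABITED, upper or "object")

print(solve_range())                    # ('<uninhabited>', 'object')
print(solve_range(lower="MySummable",
                  upper="Summable"))    # ('MySummable', 'Summable')
```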

#2993 is already fixed by #2997, but in that fix I created the TypeRange construct, which is just a restricted version of what I'm proposing here.

I also think #2008 would benefit from this, because the main downside of Proposal 3 (always infer a bound type on assignment) will disappear when the bound type is allowed to have both upper and lower bounds.

JukkaL commented 7 years ago

An early predecessor to mypy actually used ranges of types in inference, but it caused enough headache with various corner cases that I moved to the current, simpler approach. Your proposal is more general than what I was doing and I suspect that it would be even harder to implement. The type system was also much simpler back then, but it was still a pain to get right.

So basically I'm saying that this is unlikely to happen, but I'd still like to understand what cases this could help with -- maybe we can figure out easier ways to get them to work better. Without concrete real-world examples we likely won't reach very useful conclusions, though.

I agree that the TypeRange construct is related to this idea, but since its implementation is pretty trivial and we use it only in very narrowly defined contexts, it really can't be used as motivation for this proposal, I think; this has to stand on its own merits.