microsoft / pyright

Static Type Checker for Python
Other
13.43k stars 1.47k forks source link

Unknown return type when using method on Union of generics #5428

Closed charbonnierg closed 1 year ago

charbonnierg commented 1 year ago

Describe the bug

I'm trying to implement a simple Either[T, E] type annotation which is an alias for Left[T] | Right[E]. I'd like to write code closer to functional programming, but I cannot get Pylance in VSCode to display same type annotations than mypy.

The code is as follow:

from __future__ import annotations

import typing as t
import typing_extensions as t_

T = t.TypeVar("T")
E = t.TypeVar("E")
U = t.TypeVar("U")
F = t.TypeVar("F")

Either: t.TypeAlias = t.Union["Left[T]", "Right[E]"]

class Left(t.Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

    def map_left(self, fn: t.Callable[[T], U]) -> Left[U]:
        return Left(fn(self.value))

    def map_right(self, fn: object) -> t_.Self:
        return self

class Right(t.Generic[E]):
    def __init__(self, value: E) -> None:
        self.value = value

    def map_left(self, fn: object) -> t_.Self:
        return self

    def map_right(self, fn: t.Callable[[E], F]) -> Right[F]:
        return Right(fn(self.value))

To Reproduce

def func() -> Either[int, str]:
    raise NotImplementedError

result = func().map_left(lambda lv: lv + 1).map_right(lambda rv: rv + "a")
t_.reveal_type(result)
# Revealed type is "Union[demo.Left[builtins.int], demo.Right[builtins.str]]" (Mypy)
# Type of "result" is "Left[int] | Right[Unknown]" (Pylance)

Expected behavior

💥 I expect result to be Left[int] | Right[str] just like mypy indicates, instead of Left[int] | Right[Unknown]

✅ I expect lv parameter in first lambda function to be of type int (this is what pylance displays on VSCode)

💥 I expect rv parameter in second lambda function to be of type str (pylance displays Unknown)

Code or Screenshots

Screenshot obtained using code above: image

VS Code extension or command-line

I'm running VSCode with Python extension (v2023.11.11841013), Pylance extension (v2023.6.42) and Mypy Typechecker (v2023.1.11781017)

GugelRobin commented 1 year ago

It seems like pyright uses some greedy method to infer the type of rv. It sees that map_right of Left accepts an object and therefore assigns Unknown to the type of rv and stops there. Therefore the type of the lambda passed to map_right is just (Unknown) -> Unknown and the return type of map_right is Left[int] | Right[Unknown]. As Unknown is just an alias for Any passing (Unknown) -> Unknown to a parameter accepting (str) -> T is valid and T is bound to Unknown.

If you switch Left and Right in Either (and return Either[str, int]) you'll see that result is Right[str] | Left[Unknown]

You can implement Either as a single class for now to mitigate that issue


T = t.TypeVar("T")
E = t.TypeVar("E")
U = t.TypeVar("U")

class Either(t.Generic[T, E]):

    value: T | E
    _left: bool
    def __init__(self, *, __value: T | E, __left: bool) -> None:
        self.value = __value
        self._left = __left

    def map_left(self, fn: t.Callable[[T], U]) -> Either[U, E]:
        if self._left:
            return Either.left(fn(self.value)) # type: ignore
        return self # type: ignore

    def map_right(self, fn: t.Callable[[E], U]) -> Either[T, U]:
        if not self._left:
            return Either.right(fn(self.value)) # type: ignore
        return self # type: ignore

    @classmethod
    def left(cls, value: T) -> "Either[T, t.Any]":
        return cls(__value=value, __left=True)

    @classmethod
    def right(cls, value: E) -> "Either[t.Any, E]":
        return cls(__value=value, __left=False)
charbonnierg commented 1 year ago

Thanks for the feedback. Indeed I can do almost the same thing using a single class, but there are still patterns which I don't see how to implement using a single class.

Take this one as example:

from __future__ import annotations

import typing as t
from typing_extensions import TypeAlias

T = t.TypeVar("T")
E = t.TypeVar("E")

class Left(t.Generic[T]):
    def __bool__(self) -> t.Literal[True]:
        return True

class Right(t.Generic[E]):
    def __bool__(self) -> t.Literal[False]:
        return False

Result: TypeAlias = "Left[T] | Right[E]"

def func(ok_t: t.Type[T], err_t: t.Type[E]) -> Result[T, E]:
    raise NotImplementedError

if x := func(int, Exception):
    reveal_type(x)
    # Type of "x" is "Left[int]" (Pylance)
else:
    reveal_type(x)
    # Type of "x" is "Right[Exception]" (Pylance)

I don't think I can do that using a single class ?

Anyway, thanks for the tip and the explanation

EDIT: It can be done using child classes and a TypeAlias:

from __future__ import annotations

import abc
import typing as t
from typing_extensions import reveal_type

T = t.TypeVar("T", covariant=True)
E = t.TypeVar("E", covariant=True)
U = t.TypeVar("U")
F = t.TypeVar("F")

class _EitherABC(t.Generic[T, E], metaclass=abc.ABCMeta):
    def __init__(self, *, value: T | E, err: bool) -> None:
        self._value = value
        self._left = not err

    def map_ok(self, fn: t.Callable[[T], U]) -> Either[U, E]:
        if self._left:
            return Left(fn(self._value))  # type: ignore[arg-type]
        return self  # type: ignore[return-value]

    def map_err(self, fn: t.Callable[[E], F]) -> Either[T, F]:
        if not self._left:
            return Right(fn(self._value))  # type: ignore[arg-type]
        return self  # type: ignore[return-value]

class Left(_EitherABC[T, E]):
    _value: T

    def __bool__(self) -> t.Literal[True]:
        return True

    def __init__(self, value: T) -> None:
        super().__init__(value=value, err=False)

    def unwrap(self) -> T:
        return self._value

class Right(_EitherABC[T, E]):
    _value: E

    def __init__(self, value: E) -> None:
        super().__init__(value=value, err=True)

    def __bool__(self) -> t.Literal[False]:
        return False

    def unwrap(self) -> E:
        return self._value

Either: t.TypeAlias = "Left[T, E] | Right[T, E]"

def func(result: Either[int, str]) -> None:
    value = result.map_ok(lambda x: x + 1).map_err(lambda msg: msg.capitalize())
    reveal_type(value)
    # note: Revealed type is "Union[demo.Left[builtins.int, builtins.str], demo.Right[builtins.int, builtins.str]]" (Mypy)
    # note: Type of "value" is "Left[int, str] | Right[int, str]" (Pylance)
    if value:
        success_value = value.unwrap()
        reveal_type(success_value)
        # note: Revealed type is "builtins.int" (Mypy)
        # note: Type of "success_value" is "int" (Pylance)

    else:
        error_value = value.unwrap()
        reveal_type(error_value)
        # note: Revealed type is "builtins.str"
        # note: Type of "error_value" is "str" (Pylance)

Types are resolved correctly by both Pylance and mypy, so all my problems have been solved.

I let the ticket open for maintainers to indicate whether the current behaviour is "as expected" or not.

erictraut commented 1 year ago

Here's what's happening in this case. Pyright aggressively caches types that it has already evaluated for specific parse nodes. It has a special mechanism called "speculative execution" whereby types are not cached, but this is used sparingly because it can result in significant performance issues when evaluating deeply-nested expressions. The same cache is used by the language server when displaying information such as hover text for identifiers.

Let's look at a simplified version of your repro case:

c = func().map_right
reveal_type(c) # (fn: object) -> Left[int]) | (fn: Callable[[str], F]) -> Right[F])

c(lambda rv: rv + "a")

The type of c is a union of two callables, so when evaluating the call expression, pyright loops through each of the subtypes in the union and evaluates the type of each return value, then combines them into a union. It uses bidirectional type inference to infer the type of the lambda expression. In this case, the expected type is object the first time through the loop. This isn't very useful for purposes of inference. This results in the lambda being inferred as (Unknown) -> Unknown. This type is then cached. The next time through the loop, the type of the lambda is not re-evaluated because the type is cached.

This limitation has never been reported as a problem before. The code in the sample above is quite unusual in that the two types (Left and Right) define their methods map_left and map_right with signatures that differ in a significant manner. If you change the annotation on the fn parameter to fn: Callable[[Any], Any], it unifies the signatures, and the problem no longer occurs.

So, while I consider this a bug, it does represent a pretty significant edge case — and one that is easy to work around in your code.

To fix this in the type evaluator, pyright needs to avoid caching the resulting type until the last iteration of the loop. I've impelmented this solution, which addresses the problem in the code above. However, this lack of caching impacts analysis performance any time that a call expression involves a union type, so I'm a bit uncomfortable with the solution. I've run some benchmarks on some typed code bases, and it hasn't resulted in any measurable performance degradation, but there are theoretically cases where it could have a significant impact. If a real-world performance issue is reported, I may consider reverting the change.

This change will be included in the next release of pyright.

erictraut commented 1 year ago

This is addressed in pyright 1.1.319, which I just published. It will also be included in a future release of pylance.