microsoft / pyright

Static Type Checker for Python
Other
13.12k stars 1.4k forks source link

incomplete type inference in match-case statement #8789

Closed 124C41p closed 3 weeks ago

124C41p commented 3 weeks ago

In the following code snippet, pylance is able to infer the type of a but not of b. Clearly both variables should be recognized as int. Instead, when hovering over b, pylance shows b: Any.

class Foo:
    x: int
    y: int

    def show(self):
        match self:
            case Foo(x=a):
                print(f"{a=}")
            case Foo(y=b):
                print(f"{b=}")

Environment data

erictraut commented 3 weeks ago

Could someone from the pylance team please transfer this to the pyright project? Thanks!

erictraut commented 3 weeks ago

Looking at this closer, pyright's behavior is correct here. There is a bug in the code sample. The second case statement will never be reached because the first pattern (Foo(x=a)) will match all values of the subject expression self. That means the narrowed type of self after the first case statement is Never. The second pattern will never be matched, so the variable b will never receive a value.

If you enable type checking — in particular, the reportUnnecessaryComparison check, pyright will report this bug in your code.

Code sample in pyright playground

124C41p commented 3 weeks ago

Looking at this closer, pyright's behavior is correct here. There is a bug in the code sample. The second case statement will never be reached because the first pattern (Foo(x=a)) will match all values of the subject expression self. That means the narrowed type of self after the first case statement is Never. The second pattern will never be matched, so the variable b will never receive a value.

The second pattern is reachable indeed. Consider this snippet:

foo = Foo()
foo.y = 1
foo.show()

This will print b=1 to the standard output.

erictraut commented 3 weeks ago

From a type analysis perspective, the second case statement is unreachable. You've declared that instance variables x and y contain values of type int. In your snippet above, you haven't initialized these instance variables according to the type declarations.

Type analyzers need to base their analysis on certain assumptions. One such assumption is that instance variables with declared types have been initialized with values that match those types. If a type analyzer like pyright were to assume that all instance variables with declared types were potentially uninitialized, it would result in many false positive errors, so this wouldn't be practical.

If your intent is for x and y to contain either an int value or some other value like None, you could define them this way:

class Foo:
    x: int | None = None
    y: int | None = None

    def show(self):
        match self:
            case Foo(x=int() as a):
                print(f"{a=}")
            case Foo(y=int() as b):
                print(f"{b=}")
124C41p commented 3 weeks ago

Type analyzers need to base their analysis on certain assumptions. One such assumption is that instance variables with declared types have been initialized with values that match those types.

I agree, but I don't think that my example should necessarily be regarded a contradiction to this statement. I would argue that from the point of view of a type checker, foo.x is of type Never which is a sub-type of int.

In the same way, pyright will (correctly) infer in the following example, that z has type int:

def bar() -> int:
    raise NotImplementedError

z = bar()

Please also note that the pattern I sketched in the opening post is quite useful to model union types in Python. For example, the betterproto project does it that way. It would be a shame to have to go without type checking when using this pattern..

124C41p commented 3 weeks ago

@erictraut Please consider reopening this issue to give other collaborators the chance to think about it. I am quite certain there would be different opinions among them.

erictraut commented 3 weeks ago

Pyright is working as intended here, and I don't consider this a bug. Changing the behavior as you suggest would have significant ramifications for type evaluation and type checking, including the introduction of many false positive errors. This wouldn't be a good tradeoff for pyright users.

I your example above, you said "from the point of view of a type checker, foo.x is of type Never". That represents a misunderstanding. The type Never has a very specific meaning in the Python type system. You can read about it here. It is not the same as an unbound variable.

In the example you provided above (with bar), there is no type inference. Pyright is following standard type evaluation rules here. In your original code sample, pyright is performing type narrowing based on local code flow analysis. These are different concepts.

124C41p commented 3 weeks ago

@erictraut Do you have an example in mind demonstrating such ramification issues? In my mind, I am only suggesting to give meaningful type information in a situation where the type Any is inferred. I think there shouldn't be any ambiguity about the actual type in this situation. But of course, I might be missing something here..

I might have used incorrect terminology in my last post. My point is that I don't see a difference between calling a function (bar) which claims to return an int but actually raises an error, and accessing an object's attribute (foo.x) which claims to be an int but actually raises an error. The latter case should technically also be just a function call to __getattribute__ if I am not mistaken.

erictraut commented 3 weeks ago

Consider the following code:

class Foo:
    x: int

class Bar:
    x: int

def func(subj: Foo | Bar) -> int:
    match subj:
        case Foo(x=a):
            return a
        case Bar(x=b):
            return b

Here's how pyright analyzes this code. The type of subj is initially Foo | Bar at the top of the match statement. The first case statement matches all Foo instances, so the type is narrowed to just Bar in the fall-through case. The second case statement matches all Bar instances, so the type is narrowed to Never in the fall-through case. This type narrowing allows pyright to determine that the match is exhaustive and "prove" that there is no way to fall off the bottom of the match statement and implicitly return None.

If I were to change pyright's behavior as you're suggesting, to assume that an exception might occur when accessing the attribute x and therefore the class pattern matches might fail, it would no longer narrow the type of subj in these cases. Pyright would then report that the match statement isn't exhaustive and it would report a return type violation for function func. Most developers would see these both as false positive errors, and I'd agree with them. Pyright users rely on the current behavior, and changing it as you suggest would be seen as a regression.

As I mentioned above, static type checkers need to make certain assumptions. One of those assumptions is that an instance variable is bound to a value that is compatible with its declared type. If it cannot make that assumption, then every access to every instance variable would need to be reported as a potential error — something that could raise an exception at runtime. This assumption turns out to be correct for the vast majority of Python code. Pyright even has a way to help enforce this, an option called "reportUninitializedInstanceVariable".

Not surprisingly, other Python static type checkers like mypy make this same assumption and produce the same results in your original code sample above. Like pyright, mypy determines that your second case statement is unreachable because the type of the subject expression has been narrowed to Never at that point in the code flow graph.

124C41p commented 3 weeks ago

@erictraut Thanks a lot for taking the time to give such a detailed explanation. It really helped me to understand the dilemma.

So for being able to make decent predictions on exhaustiveness, type checkers (like pyright and others) have to discourage the usage of the pattern discussed here. While it might certainly be technically possible to improve type hints in this situation, it would also be inconsequential to give partial support for an unsupported pattern. This is a fair point.