Closed 124C41p closed 3 weeks ago
Could someone from the pylance team please transfer this to the pyright project? Thanks!
Looking at this closer, pyright's behavior is correct here. There is a bug in the code sample. The second case statement will never be reached because the first pattern (Foo(x=a)
) will match all values of the subject expression self
. That means the narrowed type of self
after the first case statement is Never
. The second pattern will never be matched, so the variable b
will never receive a value.
If you enable type checking — in particular, the reportUnnecessaryComparison
check, pyright will report this bug in your code.
Code sample in pyright playground
Looking at this closer, pyright's behavior is correct here. There is a bug in the code sample. The second case statement will never be reached because the first pattern (
Foo(x=a)
) will match all values of the subject expressionself
. That means the narrowed type ofself
after the first case statement isNever
. The second pattern will never be matched, so the variableb
will never receive a value.
The second pattern is reachable indeed. Consider this snippet:
foo = Foo()
foo.y = 1
foo.show()
This will print b=1
to the standard output.
From a type analysis perspective, the second case statement is unreachable. You've declared that instance variables x
and y
contain values of type int
. In your snippet above, you haven't initialized these instance variables according to the type declarations.
Type analyzers need to base their analysis on certain assumptions. One such assumption is that instance variables with declared types have been initialized with values that match those types. If a type analyzer like pyright were to assume that all instance variables with declared types were potentially uninitialized, it would result in many false positive errors, so this wouldn't be practical.
If your intent is for x
and y
to contain either an int
value or some other value like None
, you could define them this way:
class Foo:
x: int | None = None
y: int | None = None
def show(self):
match self:
case Foo(x=int() as a):
print(f"{a=}")
case Foo(y=int() as b):
print(f"{b=}")
Type analyzers need to base their analysis on certain assumptions. One such assumption is that instance variables with declared types have been initialized with values that match those types.
I agree, but I don't think that my example should necessarily be regarded a contradiction to this statement. I would argue that from the point of view of a type checker, foo.x
is of type Never
which is a sub-type of int
.
In the same way, pyright will (correctly) infer in the following example, that z
has type int
:
def bar() -> int:
raise NotImplementedError
z = bar()
Please also note that the pattern I sketched in the opening post is quite useful to model union types in Python. For example, the betterproto project does it that way. It would be a shame to have to go without type checking when using this pattern..
@erictraut Please consider reopening this issue to give other collaborators the chance to think about it. I am quite certain there would be different opinions among them.
Pyright is working as intended here, and I don't consider this a bug. Changing the behavior as you suggest would have significant ramifications for type evaluation and type checking, including the introduction of many false positive errors. This wouldn't be a good tradeoff for pyright users.
I your example above, you said "from the point of view of a type checker, foo.x
is of type Never
". That represents a misunderstanding. The type Never
has a very specific meaning in the Python type system. You can read about it here. It is not the same as an unbound variable.
In the example you provided above (with bar
), there is no type inference. Pyright is following standard type evaluation rules here. In your original code sample, pyright is performing type narrowing based on local code flow analysis. These are different concepts.
@erictraut Do you have an example in mind demonstrating such ramification issues? In my mind, I am only suggesting to give meaningful type information in a situation where the type Any
is inferred. I think there shouldn't be any ambiguity about the actual type in this situation. But of course, I might be missing something here..
I might have used incorrect terminology in my last post. My point is that I don't see a difference between calling a function (bar
) which claims to return an int
but actually raises an error, and accessing an object's attribute (foo.x
) which claims to be an int
but actually raises an error. The latter case should technically also be just a function call to __getattribute__
if I am not mistaken.
Consider the following code:
class Foo:
x: int
class Bar:
x: int
def func(subj: Foo | Bar) -> int:
match subj:
case Foo(x=a):
return a
case Bar(x=b):
return b
Here's how pyright analyzes this code. The type of subj
is initially Foo | Bar
at the top of the match
statement. The first case statement matches all Foo
instances, so the type is narrowed to just Bar
in the fall-through case. The second case statement matches all Bar
instances, so the type is narrowed to Never
in the fall-through case. This type narrowing allows pyright to determine that the match is exhaustive and "prove" that there is no way to fall off the bottom of the match statement and implicitly return None
.
If I were to change pyright's behavior as you're suggesting, to assume that an exception might occur when accessing the attribute x
and therefore the class pattern matches might fail, it would no longer narrow the type of subj
in these cases. Pyright would then report that the match statement isn't exhaustive and it would report a return type violation for function func
. Most developers would see these both as false positive errors, and I'd agree with them. Pyright users rely on the current behavior, and changing it as you suggest would be seen as a regression.
As I mentioned above, static type checkers need to make certain assumptions. One of those assumptions is that an instance variable is bound to a value that is compatible with its declared type. If it cannot make that assumption, then every access to every instance variable would need to be reported as a potential error — something that could raise an exception at runtime. This assumption turns out to be correct for the vast majority of Python code. Pyright even has a way to help enforce this, an option called "reportUninitializedInstanceVariable".
Not surprisingly, other Python static type checkers like mypy make this same assumption and produce the same results in your original code sample above. Like pyright, mypy determines that your second case statement is unreachable because the type of the subject expression has been narrowed to Never
at that point in the code flow graph.
@erictraut Thanks a lot for taking the time to give such a detailed explanation. It really helped me to understand the dilemma.
So for being able to make decent predictions on exhaustiveness, type checkers (like pyright and others) have to discourage the usage of the pattern discussed here. While it might certainly be technically possible to improve type hints in this situation, it would also be inconsequential to give partial support for an unsupported pattern. This is a fair point.
In the following code snippet, pylance is able to infer the type of
a
but not ofb
. Clearly both variables should be recognized asint
. Instead, when hovering overb
, pylance showsb: Any
.Environment data