More consistency in inference rules between type checkers

matangover commented 1 year ago

I've recently added type annotations to a large library, and have been checking my code using both mypy and pyright. While doing so, I noticed many differences between mypy and pyright in the types they choose to infer. Each type checker has a justification for its choices, but as a user this situation is frustrating, because I rely a lot on inference. This leads to a situation where I frequently had to think about "what pyright would do" and "what mypy would do", and scour their issue trackers to understand what's going on - what's a bug and what's a "feature".

I totally understand that each type checker has been developed independently and influenced by different needs and design choices. I also understand that the ecosystem is very much in flux. I have seen authors of type checkers justify their choices - and rightfully so. Nonetheless I think it would be a big benefit to the community to specify type inference rules more fully (PEP?). If typing is seen as part of the Python language (in various PEPs), and type inference is seen as a feature of typing, then that feature should behave consistently.

I assume there would be much work to define inference rules and reach an agreement that works for all type checkers. Also it would probably require considerable work to implement the necessary changes. However, I believe in the long run it's beneficial. Especially since as time goes by, more backwards compatibility concerns will just pile up.

Areas where I have noticed considerable differences include redefinitions (mypy's allow_redefinition only partially consistent with pyright default), Literals, union vs join, and overloads. Maybe others I can't recall.

I'd like to hear what others think about this topic.

matangover commented 1 year ago

type inference is seen as a feature of typing

You may say - type inference is a feature of the type checker. However, I think that from a user's perspective, this is just not the case. Inference should be part of the language, just like the rest of the typing features (which are consistent across type checkers and specified in PEPs).

matangover commented 1 year ago

P.S. If a full specification is difficult or problematic, another option is to at least make a list of differences for users to consult. I think such a list could also help in discussing the situation and achieving more consistency eventually.

erictraut commented 1 year ago

Most code bases are type checked with only one type checker. It's quite rare for code bases to use more than one, so I don't think this is a widespread problem.

It would be a massive amount of work to debate the merits of every inference behavior and reach consensus on a common behavior. There are literally hundreds — and probably even thousands — of such individual decisions involved in type inference. Even if we could achieve consensus, it would be very painful for any of the major type checkers to modify their inference behaviors as it would cause massive compatibility issues for those who are using those type checkers for existing code bases. This would cause significant code churn for little or no benefit to those code base owners.

To achieve consistent type evaluation behaviors across all type checkers, you'd need to do much more than agree on inference behaviors. You'd also need to agree on bidirectional (context-based) type inference behaviors, type narrowing behaviors, override resolution behaviors, constraint solver behaviors, and more.

As a maintainer of pyright, I don't feel very motivated to participate in such an exercise when the cost would be so high (we're talking about many hundreds of hours of discussions, debates, presentations, compromises, etc.) that would result in significant pain and little or no benefit for most users.

Perhaps there are a few specific behaviors that we could target that would lessen your pain if you decide to continue to use both mypy and pyright. In my experience, there are several primary sources of inconsistent type evaluation:

Pyright treats variable type annotations as declarations. When a variable's type is declared, that type is enforced for all assignments. When a variable's type isn't declared, it infers its type based on all assignments. By contrast, mypy has two different behaviors depending on whether "allow-redefinition" is used, and both of those behaviors are inconsistent with pyright's. When "allow-redefinition" is false (the default behavior), mypy typically (but with some exceptions) treats the first assignment of a variable as a sort of "implied declaration", and it takes its declared type from the inferred type of the first assigned value. When "allow-redefinition" is true, mypy's behavior is closer to pyright's, but it allows a variable to have more than one explicit type declaration, even if they're incompatible. Pyright's behavior is more consistent, easier for developers to understand, leads to fewer false positives, and eliminates the need for many explicit annotations that are required by mypy. I'd love to see mypy adopt pyright's behavior here, but I'm guessing it would be a big change, so there's probably not much appetite for it.
Pyright applies type narrowing for all variable assignments regardless of whether the assignment is on the same line as a variable type annotation. Mypy currently has different behavior when a value is assigned on the same line as the type annotation. The mypy maintainers are planning to change this behavior and eliminate this inconsistency. It's not clear whether this will happen for mypy 1.0, since it's currently on the "less important things" list.
Pyright uses a union operator rather than a join operator for all operations that involve merging or widening types (e.g. in the constraint solver). Mypy uses unions in a few cases but uses joins in most cases. This leads to many false positives that are well documented. I don't think the mypy maintainers have reached a consensus on this issue yet. There are plans to switch ternary operator evaluation from join to union before mypy 1.0 is released, which could be a first step toward converting the remaining joins to unions.
When performing override matching, mypy takes into account arguments that evaluate to Any, and if it determines that more than one override matches due to an Any arg, it evaluates the return type of the call as Any. Pyright chooses the first match in this case. This is a situation where I think the respective behaviors are justified given the different use cases targeted by pyright and mypy, with pyright being used as the basis for a language server (pylance). There has been some talk of adding support for a "weak union" type that could be a good compromise solution in this case.

I do think it's reasonable to document the differences in behaviors between major type checkers. A while ago, I started to write a document that explains the difference between pyright and mypy behaviors and explains the justification for each of these differences. As you mentioned above, every one of these decisions was made for a well-justified reason. I'll try to find time to make further progress on this document and eventually post it.

matangover commented 1 year ago

Thank you very much for the detailed answer.

It's quite rare for code bases to use more than one, so I don't think this is a widespread problem.

Right, but as a library author, I must take these differences into account, since my users might use any of the available type checkers. That is a key point I forgot to mention earlier so thanks for pointing that out. One example is annotating parameters with Literal - consider the examples I gave here and the API typing debate exacerbated by the inconsistency. Consider also this example from numpy where they had to modify overload ordering to cater for mypy & pyright overload choosing logic. Also in typeshed (https://github.com/python/typeshed/issues/8566).

As a maintainer of pyright, I don't feel very motivated to participate in such an exercise

Yes. I understand.

When "allow-redefinition" is true, mypy's behavior is closer to pyright's

Another weird difference is that mypy only allows redefinitions "within the same block and nesting depth as the original definition", which seems somewhat arbitrary

I'll try to find time to make further progress on this document and eventually post it

That would be super helpful!

erictraut commented 1 year ago

Thanks for the additional context. I agree that it's important for library authors to be able to define interface types in a manner that works across type checkers. I think that's a more attainable goal than the more general goal you articulated above. And I think this is a goal worth pursuing.

You may already be aware of this, but I implemented a feature in pyright (in the form of the --verifytypes command-line option) that allows authors of py.typed libraries to determine whether the types of symbols exported as part of the library's public interface are relying on type inference that might be evaluated differently by different type checkers. Pyright will output a report that enumerates any so-called "ambiguously typed" symbols. You can eliminate the ambiguity by providing explicit type annotations.

Looking at the four sources of type evaluation inconsistency that I listed above, item 2 is already slated to be fixed. If you think this is important to have fixed for the mypy 1.0 release, please voice your opinion.

I think that there's a strong argument to be made for the change in item 3. If you're interested in seeing that, please lend your voice to the discussion in the mypy tracker.

Item 1 is a source of much pain for mypy users. It is also the reason for the literals problem described in the issue you linked. Perhaps mypy maintainers would reconsider this behavior there's sufficient interest expressed by mypy users. Here's the code sequence mentioned in the issue you cited above:

def test(x: Literal["x", "y"]):
    pass

x = "x"
test(x) # error in mypy, ok in pyright

The reason this works fine in pyright is because it infers the type of x based on all values that are assigned to x. In this case, the inferred type of x is str, but pyright also applies assignment-based type narrowing, so the narrowed type immediately after the assignment is Literal['x']. That's perfectly type safe and, as can be seen in this example, eliminates false positives. Mypy treats the statement x = "x" as an implied type declaration, and it does not apply type narrowing for the assignment, so it loses the fact that a literal value was assigned on this code path.

As for item 4, I have an idea for how to eliminate the inconsistency without harming the user experience for pylance users. I've filed this work item. I'll investigate whether this is feasible. No promises, but I think it's worth exploring.

matangover commented 1 year ago

And I think this is a goal worth pursuing

I'm happy to hear that

You may already be aware of this, but I implemented a feature in pyright

Yes, I've been using it and it's very helpful

Mypy treats the statement x = "x" as an implied type declaration, and it does not apply type narrowing for the assignment, so it loses the fact that a literal value was assigned on this code path

Actually, you confused two separate things. This example is not about the assignment vs. declaration difference. It's specifically about Literals. The difference here between mypy and pyright is that mypy never infers Literal (unless it's Final). Otherwise it would infer x as a Literal. To illustrate, even if I separate the declaration from the assignment, the error remains:

def test(x: Literal["x", "y"]):
    pass

x: str
x = "x"
test(x) # error in mypy, ok in pyright

erictraut commented 1 year ago

@matangover, I've posted documentation that captures the major behavior differences between mypy and pyright and the justifications for these design choices. Let me know if you think that I've missed any important points.

matangover commented 1 year ago

The doc is very helpful. I learned a lot from reading it and I don't know of anything you've missed. Thanks for this

python / typing

More consistency in inference rules between type checkers #1315