python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.22k stars 2.78k forks source link

(🎁) Mypy too conservative with base class compatibility (Definition of in base class is incompatible) #12314

Open KotlinIsland opened 2 years ago

KotlinIsland commented 2 years ago
class L:
    a: int

class R:
    b: int

class Both(L, R):
    pass

class A:
    @property
    def x(self) -> L: ...

class B:
    @property
    def x(self) -> R: ...

class C(A, B):  # error: Definition of "i" in base class "A" is incompatible with definition in base class "B"
    pass

both: L = Both()
assert isinstance(both, R)
reveal_type(both)  # Revealed type is "<subclass of "L" and "R">"
c: A = C()
assert isinstance(c, B)  # error: Subclass of "A" and "B" cannot exist: would have incompatible method signatures
reveal_type(c)  # expected "subclass of A and B", actual "A"
reveal_type(c.x)  # expected "subclass of L and R", actual "L"

L and R are compatible (Both is an implementation), I would expect mypy to work here

JelleZijlstra commented 2 years ago

I don't see the bug here. L and R are different classes. The type system is nominal by default; if you want structural types, use Protocols.

KotlinIsland commented 2 years ago

I don't want structural, I want the nominal type "subclass of L and R"

flisboac commented 2 years ago

Considering that nominal subtyping requires inheritance, and the return values (for the properties) have no base class in common, what you ask is impossible, I think.

The only way I understand that this could even work is if mypy somehow synthesized a class, and somehow considered it as a base for another class, but that does not make much sense because the synthesized class does not exist in the code, and it would also break the nominal typing convention.

I believe return values are naturally covariant, so a better solution would be to type them in terms of a common ancestor, or as unions.

Also, there's no way for "subtype of L and R" to include "Both" as a candidate, because you never refer to it by anywhere at the point you expect it to be considered, neither directly (i.e. at declaration time) nor as a generic (i.e. as a type variable, bound or not; which does not apply, because nothing is generic here). And even if there was a generic somewhere to make proper type substitutions, Python has no concept of "intersection of types" yet (despite having multi-inheritance), like TypeScript does.

What you ask cannot be realized the way typing is specified right now, across all of its related PEPs. But even if it could, I think it's a bad idea to support such an use case. It makes for a very confusing (static) type model.

Perhaps "subtype of either L or R" is more reasonable, but again, in this case your typing should be explicit, otherwise covariance would not be observed and the check would fail.

For example, this works:

class L():
    a: int

class R():
    b: int

class Both(L, R):
    pass

class A:
    @property
    def x(self) -> L | R: ...

class B:
    @property
    def x(self) -> R | L: ...

class C(A, B):
    pass

c: C = C()
reveal_type(c.x)  # note: Revealed type is "Union[__main__.L, __main__.R]"

This also works:

class Either:
    ...

class L(Either):
    a: int

class R(Either):
    b: int

class A:
    @property
    def x(self) -> Either: ...

class B:
    @property
    def x(self) -> Either: ...

class C(A, B):
    pass

c: C = C()
reveal_type(c.x)  # note: Revealed type is "__main__.Either"
KotlinIsland commented 2 years ago

@flisboac

Python has no concept of "intersection of types"

Mypy does support "subclass of X and Y" types to some extent, which I believe would be valid in my op:

class L:
    a: int
class R:
    b: int
class Both(L, R): ...
a: L = Both()
assert isinstance(a, R)
reveal_type(a) # Revealed type is "<subclass of "L" and "R">"

It makes for a very confusing (static) type model

To me unions and intersections are extremely straightforward, the union and the intersection are an opposite pair, where the union is the super type of all members and the intersection is the subtype of all members. I like to think that object is the union of all types, and Never(NoReturn) is the intersection of all types.

  graph TD;
      AB[A or B] --> B;
      AB-->A;
      A-->AaB[A and B];
      B-->AaB;
flisboac commented 2 years ago

@KotlinIsland I think that what you observed is an implementation detail. You needed to force guards to activate it. Type of a, as declared, is just L; you'd need to always guard the value somehow to be able to assign either L or R. The same logic is applied to class variables (static or not).

There's no way for you to denote that a accept a value that is both L and R without subclassing (nominal). Likewise, there should be no way for you to declare or infer the same for class variables, if model consistency is to be observed.

What I'm trying to say is that this is not a bug. What you're asking for is a new feature. Type checking for return types, today, is based on covariance, with no explicit support, at specification level, for intersection types, AFAICT.

For example, I could write this nonsense; mypy would reveal what I want, but it would never work at runtime:

class L:
    a: int
class R:
    b: int
class C:
    c: int
class Both(L, R): ...
a: L = Both()
assert isinstance(a, R)
assert isinstance(a, C)
reveal_type(a) # note: Revealed type is "__main__.<subclass of "L", "R", and "C">"

That's why I'm saying it's just a peculiarity of the implementation.

To me unions and intersections are extremely straightforward

Your understanding on the theory is the same as mine. Considering that we do have multiple inheritance, intersection types make a lot of sense to me. What I tried to convey, however, is that we still don't have A & B, or Intersection[A, B], and what you ask would make mypy non-compliant with the PEPs, I think. Hence the "confusion" argument.

Having said that, mypy could implement this as a new feature, and champion it to the typing group. Narrow down the details, if needed (i.e. avoiding breaking changes). I'm all for it, although I don't see a way for it to happen.

KotlinIsland commented 2 years ago

I could write this nonsense

Doesn't look like nonsense to me, looks very correct and accurate, and it would be very easy to define a class that would reach the end at runtime.

class L:
    a: int
class R:
    b: int
class C:
    c: int
class Both(L, R): ...
class All(L, R, C): ...
a: L = All()
assert isinstance(a, R)
assert isinstance(a, C)
reveal_type(a) # note: Revealed type is "__main__.<subclass of "L", "R", and "C">"

Fair enough to consider it a feature not a bug, but it does seem blatantly wrong that mypy says that the definition is incompatible, when it is actually missing functionality. I've updated the title to indicate that "it's a feature, not a bug!"

The-Compiler commented 2 years ago

I'm seeing a very similar issue involving (IMHO, proper) usage of stdlib code after upgrading to mypy 0.940:

import email.headerregistry

reg = email.headerregistry.HeaderRegistry()
parsed = reg('Content-Disposition', 'inline')
# reveal_type(parsed)
assert isinstance(parsed, email.headerregistry.ContentDispositionHeader), parsed
print(type(parsed))

now leads to:

x.py:5: note: Revealed type is "email.headerregistry.BaseHeader"
x.py:6: error: Subclass of "BaseHeader" and "ContentDispositionHeader" cannot exist: would have incompatible method
signatures  [unreachable]
    assert isinstance(parsed, email.headerregistry.ContentDispositionHeader), parsed
                      ^
x.py:7: error: Statement is unreachable  [unreachable]
    print(type(parsed))
    ^

I don't quite understand what's going on there. Is this the same issue? Or a problem with the typeshed stubs?

flisboac commented 2 years ago

According to the documentation of HeaderRegistry, for the default headers listed there, it generates a new class with the superclass enlisted in the default map for the requested header, and with BaseHeader as the last base class. The new class would be cached, and have the format class NewClass(ContentDispositionHeader, BaseHeader): ..., etc.

Follows the definitions of those classes, from typeshed's source.

Definition of BaseHeader is:

class BaseHeader(str):
    # max_count is actually more of an abstract ClassVar (not defined on the base class, but expected to be defined in subclasses)
    max_count: ClassVar[Literal[1] | None]
    @property
    def name(self) -> str: ...
    @property
    def defects(self) -> tuple[MessageDefect, ...]: ...
    def __new__(cls: type[Self], name: str, value: Any) -> Self: ...
    def init(self, name: str, *, parse_tree: TokenList, defects: Iterable[MessageDefect]) -> None: ...
    def fold(self, *, policy: Policy) -> str: ...

Definition of ParameterizedMIMEHeader (direct superclass of ContentDispositionHeader) in typeshed is:

class ParameterizedMIMEHeader:
    max_count: ClassVar[Literal[1]]
    def init(self, name: str, *, parse_tree: TokenList, defects: Iterable[MessageDefect], params: Mapping[str, Any]) -> None: ...
    @property
    def params(self) -> types.MappingProxyType[str, Any]: ...
    @classmethod
    def parse(cls, value: str, kwds: dict[str, Any]) -> None: ...

My guess is that the method signatures for init in both of them are not equivalent; ParameterizedMIMEHeader.init has an extra mandatory parameter params, that do not exist on BaseHeader. I don't know for sure if that's how things should be, or if something like a @overload is needed here.

So, I guess this problem doesn't relate to this issue, at least not directly. But I may be wrong. (Mypy playground link: https://mypy-play.net/?mypy=latest&python=3.10&gist=b660a7372d373f86bb69881e9c7757bc)

Now, for some remarks:

  1. Mypy's error reporting here is very poor. We don't even get the name of the method with incompatibilities in the base classes.
  2. Indeed, a BaseClass & ContentDispositionHeader, as suggested by OP, would allow for a more correct typing, especially considering that HeaderRegistry's parameters can change how this dynamically created class is created (and thus its format/signatures).
eltoder commented 1 year ago

AFAICT, in the original example mypy is correct, but the error message is misleading. It makes it sound like you can never inherit a class from A and B together, but you can:

class L:
    a: int

class R:
    b: int

class Both(L, R):
    pass

class A:
    @property
    def x(self) -> L:
        return L()

class B:
    @property
    def x(self) -> R:
        return R()

class C(A, B):
    @property
    def x(self) -> Both:
        return Both()

https://mypy-play.net/?mypy=latest&python=3.11&gist=d1e7d7a82de01a26ac8dbd37b584dd8f

If you do, you have to reimplement x to satisfy both interfaces. The one you get from either A or B won't do.