Open gnprice opened 8 years ago
I haven't looked at the code, but why is this not using Python's standard MRO algorithm? In Python 3 that's always used and in Python 2 it's used for new-style classes (those deriving from object
).
I think of the MRO algorithm as providing an answer to the question "which of my ancestors do I get method foo
from", when several ancestors provide a method with the same name. Which it does by providing a total ordering of the ancestors compatible with the partial ordering of the inheritance hierarchy.
Does the MRO algorithm also provide an answer to the question "which path of intermediate ancestors do I get to ancestor Foo
through"? It's not obvious to me how it does that, though maybe I'd just need to go back and reread how it works and think more.
Ah, it doesn't directly answer that -- in your diamond example D's MRO is [D, B, C, A] but the paths from D to A are D-B-A and D-C-A; those are both sub-sequences of the MRO (as all valid paths should be), but not every sub-sequence is such a path.
As I said, I didn't look at the code -- what is this path (or list of paths) used for?
This is my favorite (assuming we tweak this to also consider Any
base classes):
allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived
However, initially just not doing anything about it is probably okay, as this is not very likely to cause much trouble.
Can someone explain this to me? What is the need for this rule, and what does it even mean -- what kind of "instances" are we talking about here?
The writeup uses internal mypy terminology which makes it difficult to follow.
Here is an example about where this would make a difference:
class A(Generic[T]): pass
class B(A[int]): pass
class C(A[str]): pass
class D(B, C):
# This would be a subtype of both A[int] and A[str]!
pass
The rule I quoted would disallow inheritance hierarchies such as the above because the different ways of getting from D to A would result in different base types (A[int]
and A[str]
).
This would be okay, though:
class A(Generic[T]): pass
class B(A[int]): pass
class C(A[int]): pass
class D(B, C):
pass
Ah, I see. OTOH inheriting from two different parameterized generic classes should still work right?
class A(Generic[T]): pass
class B(Generic[T]): pass
class C(A[int], B[str]): pass
Also, I'm still unsure why the code needs to look for paths rather than check the MRO for some condition (like occurrences of the same generic class -- but not Generic itself -- parameterized differently).
Yeah, your example would be no problem because A
and B
are not related.
Paths are needed because you can have some really weird, complex inheritance hierarchies. Here is an example:
class A(Generic[T]): pass
class B(A[int]): pass
class C(A[T]): pass
class CC(C[List[T]]): pass
class D(B, CC[int]):
# This would be a subtype of both A[int] and A[List[int]]!
pass
Now we'd look at the paths D -> CC -> C -> A and D -> B -> A to determine if things are compatible.
Actually Python itself (actually typing.py in the stdlib) doesn't currently like that definition of CC: I get
TypeError: Cannot inherit from a generic class parameterized with
non-type-variable typing.List[~T]
Strangely, mypy gives an error on C:
/Users/guido/mypy_tests/mro.py:5: error: Invalid type "mro.T"
The source:
from typing import TypeVar, Generic, List
T = TypeVar('T')
class A(Generic[T]): pass
## class B(A[int]): pass
class C(A[T]): pass
This is an area where mypy, PEP 484 and typing.py all seem to have different rules (and the PEP is pretty vague).
Mypy wants you to do it like this (because of #302):
class A(Generic[T]): pass
class B(A[int]): pass
class C(A[T], Generic[T]): pass
class CC(C[List[T]], Generic[T]): pass
class D(B, CC[int]):
# This would be a subtype of both A[int] and A[List[int]]!
pass
The code is still reasonable from a type checking point of view, other than having the incoherent generic base class problem. Maybe typing.py should not reject it?
Yeah, there's something fishy with typing.py here. :-( I think there's even an issue about it in https://github.com/ambv/typehinting/issues but I've lost track of which one it is.
This is my favorite (assuming we tweak this to also consider Any base classes):
allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived
However, initially just not doing anything about it is probably okay, as this is not very likely to cause much trouble.
Yeah, I think this one is probably the right semantics in the end. It's just a little complicated to implement. Either this or either of the simpler options mean that the maptype
code really can just take one path and not worry about the possibility of several, which will also allow simplifying that code.
I'd rather do a little more than make no change to the code -- I'd like to bring the comments in line with the code's behavior one way or another. The simple way would be to do, say, my second option for now:
allow diamond inheritance only if the ancestor is not generic
enforcing it somewhere like verify_base_classes
, and then adjust the maptype
comments to describe the actual situation which is that all paths are equivalent. (This might be what you meant already.) Sound good?
Hmm. I implemented that simple second option:
allow diamond inheritance only if the ancestor is not generic
Pushed it as https://github.com/JukkaL/mypy/compare/master...gnprice:diamond for reference.
But it turns out we actually already use diamond inheritance with generics! The unit tests tell me this -- I produce errors in typeshed/builtins/3/builtins.pyi
on bytearray
, list
, and range
, which diamond-inherit from Sequence[int]
, Reversible[_T]
, and Reversible[int]
respectively.
The latter two seem to be down to a mismatch with the typing
module -- we have list
and range
derive directly from Reversible
as well as Sequence
, but the typing.pyi
in typeshed/stdlib/3/
has Sequence
itself derive from Reversible[_T_co]
, which makes that redundant. Our own two typing.py
files in lib-typing/{2.7,3.2}/
do no such thing, and probably a previous version of typing.pyi
didn't either. builtins.pyi
should just be updated to reflect that.
The other one is more real, though:
class bytearray(MutableSequence[int], ByteString):
while versions of both typing.py
and typing.pyi
consistently have both MutableSequence
and ByteString
derive from Sequence
.
So maybe we actually do need option 3:
allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived
Or can we do without some of this hierarchy over bytearray
?
(Fixing that mismatch in https://github.com/python/typeshed/pull/32 , but that still leaves bytearray
.)
In
mypy/maptype.py
, we have a functionclass_derivation_paths
that finds all the paths up the hierarchy from a derived class to a given ancestor class. The code implicitly assumes that this function will only ever return a single path, for example by silently taking the first element of a list derived from its result and dropping the rest:And indeed there are a couple of assertions in comments in the code that this function should only ever find a single path:
But this doesn't seem to be true! In fact, if we set up the classic diamond inheritance pattern:
then we do in fact get two paths, one through
B
and one throughC
. Demonstrated by adding an assert and a triggering test case here: https://github.com/gnprice/mypy/commit/a85876521 which produces this exception:What should our approach to this situation be?
In roughly increasing order of semantic and implementation complexity, some options include
I'm inclined to start with option 1, as it's super simple to understand and to implement, and see if that ever poses a problem. Maybe nobody actually uses diamond inheritance in code they want to try to type; it's a pretty tricky pattern in the first place. If we want to do this, we should detect the situation up front (like in
verify_base_classes
in semanal.py) and give an appropriate error.Option 2 is also super simple to implement, if people use diamond inheritance but they don't need generics with it. Somewhat less satisfying as a rule for the user.
I'd be kind of concerned if we find ourselves starting to think we need one of the more complex options.