python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.2k stars 2.78k forks source link

Figure out story about diamond inheritance #1065

Open gnprice opened 8 years ago

gnprice commented 8 years ago

In mypy/maptype.py, we have a function class_derivation_paths that finds all the paths up the hierarchy from a derived class to a given ancestor class. The code implicitly assumes that this function will only ever return a single path, for example by silently taking the first element of a list derived from its result and dropping the rest:

    return map_instance_to_supertypes(instance, superclass)[0]

And indeed there are a couple of assertions in comments in the code that this function should only ever find a single path:

    # FIX: Currently we should only have one supertype per interface, so no
    #      need to return an array
[...]
    # FIX: Currently we might only ever have a single path, so this could be
    #      simplified

But this doesn't seem to be true! In fact, if we set up the classic diamond inheritance pattern:

  A
 / \
B   C
 \ /
  D

then we do in fact get two paths, one through B and one through C. Demonstrated by adding an assert and a triggering test case here: https://github.com/gnprice/mypy/commit/a85876521 which produces this exception:

AssertionError: [[<TypeInfo __main__.B>, <TypeInfo __main__.A>], [<TypeInfo __main__.C>, <TypeInfo __main__.A>]]

What should our approach to this situation be?

In roughly increasing order of semantic and implementation complexity, some options include

I'm inclined to start with option 1, as it's super simple to understand and to implement, and see if that ever poses a problem. Maybe nobody actually uses diamond inheritance in code they want to try to type; it's a pretty tricky pattern in the first place. If we want to do this, we should detect the situation up front (like in verify_base_classes in semanal.py) and give an appropriate error.

Option 2 is also super simple to implement, if people use diamond inheritance but they don't need generics with it. Somewhat less satisfying as a rule for the user.

I'd be kind of concerned if we find ourselves starting to think we need one of the more complex options.

gvanrossum commented 8 years ago

I haven't looked at the code, but why is this not using Python's standard MRO algorithm? In Python 3 that's always used and in Python 2 it's used for new-style classes (those deriving from object).

gnprice commented 8 years ago

I think of the MRO algorithm as providing an answer to the question "which of my ancestors do I get method foo from", when several ancestors provide a method with the same name. Which it does by providing a total ordering of the ancestors compatible with the partial ordering of the inheritance hierarchy.

Does the MRO algorithm also provide an answer to the question "which path of intermediate ancestors do I get to ancestor Foo through"? It's not obvious to me how it does that, though maybe I'd just need to go back and reread how it works and think more.

gvanrossum commented 8 years ago

Ah, it doesn't directly answer that -- in your diamond example D's MRO is [D, B, C, A] but the paths from D to A are D-B-A and D-C-A; those are both sub-sequences of the MRO (as all valid paths should be), but not every sub-sequence is such a path.

As I said, I didn't look at the code -- what is this path (or list of paths) used for?

JukkaL commented 8 years ago

This is my favorite (assuming we tweak this to also consider Any base classes):

allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived

However, initially just not doing anything about it is probably okay, as this is not very likely to cause much trouble.

gvanrossum commented 8 years ago

Can someone explain this to me? What is the need for this rule, and what does it even mean -- what kind of "instances" are we talking about here?

JukkaL commented 8 years ago

The writeup uses internal mypy terminology which makes it difficult to follow.

Here is an example about where this would make a difference:

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[str]): pass
class D(B, C): 
    # This would be a subtype of both A[int] and A[str]!
    pass

The rule I quoted would disallow inheritance hierarchies such as the above because the different ways of getting from D to A would result in different base types (A[int] and A[str]).

This would be okay, though:

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[int]): pass
class D(B, C): 
    pass
gvanrossum commented 8 years ago

Ah, I see. OTOH inheriting from two different parameterized generic classes should still work right?

class A(Generic[T]): pass
class B(Generic[T]): pass
class C(A[int], B[str]): pass

Also, I'm still unsure why the code needs to look for paths rather than check the MRO for some condition (like occurrences of the same generic class -- but not Generic itself -- parameterized differently).

JukkaL commented 8 years ago

Yeah, your example would be no problem because A and B are not related.

Paths are needed because you can have some really weird, complex inheritance hierarchies. Here is an example:

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[T]): pass
class CC(C[List[T]]): pass
class D(B, CC[int]): 
    # This would be a subtype of both A[int] and A[List[int]]!
    pass

Now we'd look at the paths D -> CC -> C -> A and D -> B -> A to determine if things are compatible.

gvanrossum commented 8 years ago

Actually Python itself (actually typing.py in the stdlib) doesn't currently like that definition of CC: I get

TypeError: Cannot inherit from a generic class parameterized with
non-type-variable typing.List[~T]

Strangely, mypy gives an error on C:

/Users/guido/mypy_tests/mro.py:5: error: Invalid type "mro.T"

The source:

from typing import TypeVar, Generic, List
T = TypeVar('T')
class A(Generic[T]): pass
## class B(A[int]): pass
class C(A[T]): pass

This is an area where mypy, PEP 484 and typing.py all seem to have different rules (and the PEP is pretty vague).

JukkaL commented 8 years ago

Mypy wants you to do it like this (because of #302):

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[T], Generic[T]): pass
class CC(C[List[T]], Generic[T]): pass
class D(B, CC[int]): 
    # This would be a subtype of both A[int] and A[List[int]]!
    pass

The code is still reasonable from a type checking point of view, other than having the incoherent generic base class problem. Maybe typing.py should not reject it?

gvanrossum commented 8 years ago

Yeah, there's something fishy with typing.py here. :-( I think there's even an issue about it in https://github.com/ambv/typehinting/issues but I've lost track of which one it is.

gnprice commented 8 years ago

This is my favorite (assuming we tweak this to also consider Any base classes):

allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived

However, initially just not doing anything about it is probably okay, as this is not very likely to cause much trouble.

Yeah, I think this one is probably the right semantics in the end. It's just a little complicated to implement. Either this or either of the simpler options mean that the maptype code really can just take one path and not worry about the possibility of several, which will also allow simplifying that code.

I'd rather do a little more than make no change to the code -- I'd like to bring the comments in line with the code's behavior one way or another. The simple way would be to do, say, my second option for now:

allow diamond inheritance only if the ancestor is not generic

enforcing it somewhere like verify_base_classes, and then adjust the maptype comments to describe the actual situation which is that all paths are equivalent. (This might be what you meant already.) Sound good?

gnprice commented 8 years ago

Hmm. I implemented that simple second option:

allow diamond inheritance only if the ancestor is not generic

Pushed it as https://github.com/JukkaL/mypy/compare/master...gnprice:diamond for reference.

But it turns out we actually already use diamond inheritance with generics! The unit tests tell me this -- I produce errors in typeshed/builtins/3/builtins.pyi on bytearray, list, and range, which diamond-inherit from Sequence[int], Reversible[_T], and Reversible[int] respectively.

The latter two seem to be down to a mismatch with the typing module -- we have list and range derive directly from Reversible as well as Sequence, but the typing.pyi in typeshed/stdlib/3/ has Sequence itself derive from Reversible[_T_co], which makes that redundant. Our own two typing.py files in lib-typing/{2.7,3.2}/ do no such thing, and probably a previous version of typing.pyi didn't either. builtins.pyi should just be updated to reflect that.

The other one is more real, though:

class bytearray(MutableSequence[int], ByteString):

while versions of both typing.py and typing.pyi consistently have both MutableSequence and ByteString derive from Sequence.

So maybe we actually do need option 3:

allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived

Or can we do without some of this hierarchy over bytearray?

gnprice commented 8 years ago

(Fixing that mismatch in https://github.com/python/typeshed/pull/32 , but that still leaves bytearray.)