python / cpython

The Python programming language
https://www.python.org
Other
62.86k stars 30.11k forks source link

cannot create consistent MRO for multiple generic inheritance #106102

Open sanderr opened 1 year ago

sanderr commented 1 year ago

Bug report

When Generic appears twice in a class' inheritance tree, Python may fail to find a consistent MRO. An effort seems to be taken to address this in typing._BaseGenericAlias and typing._GenericAlias. However, when one of the bases is a subscripted generic and the other is a child of a subscripted generic (no longer generic itself) this falls short. I believe the root cause is the fact that _GenericAlias, unlike its parent, only checks for isinstance(b, _BaseGenericAlias and not for issubclass(b, Generic) when considering whether it should skip the Generic base for this MRO entry.

Consider the following example:

from typing import Generic, TypeVar

T = TypeVar("T")

class A(Generic[T]):
    pass

class B(A[int]):
    pass

class Works(B, Generic[T]):
    pass

class WorksToo(Generic[T], A[int]):
    pass

class Broken(Generic[T], B):
    pass

The Works class does not present a problem because Generic[T] appears as the last base. The WorksToo class works because A[int] is recognized by _GenericAlias.__mro_entries__ as another _BaseGenericAlias, therefore Generic is only included as MRO entry for the latter. Broken results in an exception, even though B is semantically equivalent to A[int].

Traceback (most recent call last):
  File "/home/sander/documents/projects/python-sandbox/main.py", line 23, in <module>
    class Broken(Generic[T], B):
TypeError: Cannot create a consistent method resolution
order (MRO) for bases Generic, B

I managed to get it to run correctly by making the following change to typing.py:

diff --git a/usr/lib/python3.9/typing.py b/typing.py
index d35a2a5..06be18f 100644
--- a/usr/lib/python3.9/typing.py
+++ b/typing.py
@@ -809,7 +809,7 @@ class _GenericAlias(_BaseGenericAlias, _root=True):
                 return ()
             i = bases.index(self)
             for b in bases[i+1:]:
-                if isinstance(b, _BaseGenericAlias) and b is not self:
+                if (isinstance(b, _BaseGenericAlias) or issubclass(b, Generic)) and b is not self:
                     return ()
         return (self.__origin__,)

I am not sufficiently familiar with typing's internals (or even MRO) to be completely confident of this patch, but I believe it to be sound. If this issue gets confirmed I'd be willing to open a pull request with this change.

Your environment

Linked PRs

JelleZijlstra commented 1 year ago

Haven't had time to look into this yet, but generally Generic is supposed to be the last base class. Maybe that's something we should document or enforce better.

sanderr commented 1 year ago

That would've been my alternative suggestion, and I do still think it would be a nice improvement and a satisfcatory resolution. But the lines I linked gave me the impression that it should in fact bepossible to use it anywhere.

sanderr commented 1 year ago

@JelleZijlstra have you had any chance to give this some thought? This behavior has been a bit of a nuisance to us: we added a generic base to one of our library's classes, which broke some usages of the form class ClientClass(Generic[T], LibraryClassThatIsNowGeneric). As a result we've had to jump through some hoops to ensure backwards compatibility.

It would be wonderful if this just worked, as with my patch, but failing that it would be a great help if we could expect library clients to put Generic as the last base class and be backed up by the documentation, then we would be free to make any such changes in the future at least.

JelleZijlstra commented 1 year ago

I played with your proposed patch for a bit and couldn't find anything obviously broken by it, but I don't understand that code well enough to be confident the change is correct.

I would prefer documenting that Generic[T] should always be the last base class. The new PEP-695 syntax automatically inserts Generic[T] as the last base.

sanderr commented 1 year ago

Thanks for your input. Do you know of anyone who might be more acquainted with this code? Because I do still feel like my patch would be consistent with what is already there, and therefore the current behavior could be considered a (albeit minor) bug. Then again, I can't claim to fully understand all nuances of this code myself.

If not, I'll see if I can open a small documentation PR with the requirement to put Generic as the last base class, as you suggest.

sobolevn commented 1 year ago

is supposed to be the last base class. Maybe that's something we should document or enforce better.

I've seen lots of real life uses of Generic as the first base class:

Moreover, in some cases different Generic placement will produce different runtime results:

>>> class L1(Generic[T], list[T]): ...
... 
>>> class L2(list[T], Generic[T]): ...
... 

>>> L1.__mro__
(<class '__main__.L1'>, <class 'typing.Generic'>, <class 'list'>, <class 'object'>)
>>> L2.__mro__
(<class '__main__.L2'>, <class 'list'>, <class 'typing.Generic'>, <class 'object'>)

>>> type(L1[int])
<class 'typing._GenericAlias'>
>>> type(L2[int])
<class 'types.GenericAlias'>