Closed. danpascu closed this issue 1 year ago.
This is due to the behavior of `inspect.signature()`. To be more specific, `_signature_from_callable()` in `inspect.py`.
The current behavior to get a signature for a type is to check, in order:

1. `__new__`
2. `__init__`
3. `__new__` in `__mro__`
4. `__init__` in `__mro__`
As `A1` has a directly defined `__init__`, it is shown as the signature. For `A2`, no `__init__` or `__new__` is defined, but there's a `__new__` method defined in its `__mro__`, so that method was used.

This feels a bit weird to me - like inheritance, the derived class should take the behavior of closer bases, not further ones. If we can agree this is a bug, I can fix it by going through `__mro__` and searching for a user-defined `__new__` or `__init__`. This would break the old behavior though (obviously).

We can also make it safer - only change this in main and not backport. I guess it's a decision call.
@AlexWaygood
From the description this seems to happen because it favors `__new__` over `__init__` when looking for the signature.

In my experience it's usually `__init__` that has the more specific signature, while `__new__` has a generic signature, only because it has to accept whatever `__init__` accepts, but it doesn't usually care about the actual arguments when building the instance, while `__init__` cares about the arguments because it has to use them to initialize the instance.

Wouldn't it make more practical sense to prefer `__init__` over `__new__`, given that it usually has the more specific signature? As I understand the search algorithm mentioned above, it means any class that defines both a generic `__new__` and a specific `__init__` will still have the same problem, even after the proposed fix and even in the absence of any inheritance.
> From the description this seems to happen because it favors `__new__` over `__init__` when looking for the signature. [...] Wouldn't it make more practical sense to prefer `__init__` over `__new__`, given that it usually has the more specific signature?
What do you mean by "generic" `__new__`? You should not define a `__new__` method for a class unless you need to do something very specific when creating an instance. The `__new__` method in your example code should not be defined (I took it as a proof of concept). For most use cases, `__new__` is not defined and `__init__` is the only method defined. For the cases where `__new__` is defined, it should be critical (for example, a singleton class) and should take precedence over `__init__`.
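A minimal sketch of the kind of "critical" `__new__` meant here - a singleton whose `__new__` must take precedence regardless of what `__init__` looks like (illustrative example, not from the issue):

```python
class Singleton:
    _instance = None

    def __new__(cls, *args, **kw):
        # Instance creation is intercepted here, so this logic runs
        # no matter what __init__ does with the arguments.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Singleton()
b = Singleton()
print(a is b)  # True: the same instance is returned every time
```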
The `__new__` method in my example was there as part of a minimal test case that shows the problem.

Consider this example for a "generic" `__new__` that has a purpose:
```python
class XMLSimpleElement:
    _xml_value_type = None  # To be defined in subclass

    def __new__(cls, *args, **kw):
        if cls._xml_value_type is None:
            raise TypeError(f"The {cls.__name__} class cannot be instantiated because it doesn't define the _xml_value_type attribute")
        return super().__new__(cls)

    def __init__(self, value):
        super().__init__()
        self.value = value

    ...

class XMLStringElement(XMLSimpleElement):
    _xml_value_type = str
```
In this case the `__new__` method is meant to prevent instantiation for classes that do not define their value type, since they are a kind of abstract base class that cannot work without knowing the value type, but at the same time it doesn't care about the arguments it receives, nor does it want to match its signature with that of `__init__`, since a subclass may have an `__init__` with a slightly different signature, like:
```python
class XMLSimpleElementWithAttributes(XMLSimpleElement):
    def __init__(self, value, *attributes):
        super().__init__(value)
        self.attributes = attributes
```
Similarly, there are many examples out there of base classes that have a test in their `__new__` method: if the class being instantiated is the base class itself, it raises a TypeError because the base class is not supposed to be instantiated.

In the example above both `help(XMLSimpleElement)` and `help(XMLStringElement)` show the wrong signature, while `help(XMLSimpleElementWithAttributes)` shows the correct signature.
I don't think the example code is the correct way to go.

Your `XMLSimpleElement` is an abstract class. It feels a bit weird to have an `__init__` method on an abstract base class that defines a specific way (that can be overridden) to initialize.

I think the correct way to do it is to have a pure abstract base class:
```python
class XMLBaseElement:
    _xml_value_type = None  # To be defined in subclass

    def __new__(cls, *args, **kw):
        if cls._xml_value_type is None:
            raise TypeError(f"The {cls.__name__} class cannot be instantiated because it doesn't define the _xml_value_type attribute")
        return super().__new__(cls)
```
Then a simple element based on that:

```python
class XMLSimpleElement(XMLBaseElement):
    def __init__(self, value):
        super().__init__()
        self.value = value
```
Having a `__new__` method that is very generic and an `__init__` that is very specific just doesn't add up for me. Maybe others have different opinions.

If the implementation is like this, then all the signatures would be correct with my proposal.
Again, it was just an example to show that a generic `__new__` and a specific `__init__` can coexist. I'm sure it can be refactored to work around the problem. My point is that I'd like something that works without me having to work around the problem.

In the ideal case, the signature lookup should traverse the `__mro__` looking for the first method between `__new__` and `__init__` that has the most specific signature. However, it's unclear to me if we can properly define what "the most specific" signature is and how to identify it, let alone the fact that this would probably add a great deal of complexity to the solution.
Anyway, my follow-up comments were more about stating my opinion that, from my experience, preferring `__init__` over `__new__` when picking the signature seems to work better in more cases, and I find the current choice of preferring `__new__` over `__init__` less practical and more likely to run into issues.

That being said, your proposal would definitely improve things and I'd be thankful for it. As for the cases that it would not cover, I guess I can always define `__signature__` at the class level to override the choice for signature if I'm not happy with it.
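For reference, such an override can look like this - `inspect.signature()` honors a `__signature__` attribute before it ever looks at `__new__` or `__init__` (the `Element` class below is a hypothetical name for illustration):

```python
import inspect

class Element:
    def __new__(cls, *args, **kw):
        return super().__new__(cls)

    def __init__(self, value):
        self.value = value

    # Explicitly advertise the __init__ signature, overriding
    # whatever method inspect.signature() would otherwise pick.
    __signature__ = inspect.Signature([
        inspect.Parameter('value', inspect.Parameter.POSITIONAL_OR_KEYWORD),
    ])

print(inspect.signature(Element))  # (value)
```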
I guess my point is - you can't prove a point with problematic code. Not saying your code is wrong, but if it's not preferable, then changing existing behavior based on that would be a bad idea. That being said - do you know any real code in a popular repo that has similar issues?
In the docs it is mentioned that:

> `__new__()` is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation. It is also commonly overridden in custom metaclasses in order to customize class creation.
I think if you can do something in `__init__`, you should do it in `__init__`, not `__new__`. In your example, you can check the class member in `__init__` and it works perfectly fine - so no `__new__` is needed. You'll probably say that's a "workaround", but that's probably the correct way to achieve the feature.
I guess one of the reasons to favor `__new__` over `__init__` is that `__new__` is guaranteed to execute, but `__init__` is not. I can easily create some meaningful code that creates instances of different classes based on the arguments, in which case the `__init__` method that runs would be the one from the class of those instances:
```python
class XMLStrElement:
    def __init__(self, val):
        self.val = val

class XMLIntElement:
    def __init__(self, val):
        self.val = val

class XMLMultiElement:
    def __init__(self, *args):
        self.vals = args

class XMLElement:
    def __new__(cls, *args):
        if len(args) == 1:
            if isinstance(args[0], str):
                return XMLStrElement(args[0])
            elif isinstance(args[0], int):
                return XMLIntElement(args[0])
            return super().__new__(cls)
        else:
            return XMLMultiElement(args)

    def __init__(self, val):
        self.val = val

i = XMLElement(1)
s = XMLElement("s")
m = XMLElement(1, "s")
e = XMLElement(None)
print(i, s, m, e)
```
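The "`__init__` is not guaranteed to execute" part can be shown in isolation: when `__new__` returns an object that is not an instance of the class, Python skips calling `__init__` entirely (hypothetical `Factory`/`Other` names for illustration):

```python
class Other:
    pass

class Factory:
    def __new__(cls, *args):
        # Returning an unrelated instance means type.__call__
        # never invokes Factory.__init__ on it.
        return Other()

    def __init__(self, *args):
        self.initialized = True  # never runs for the Other instance

obj = Factory(1, 2)
print(type(obj).__name__, hasattr(obj, 'initialized'))  # Other False
```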
Beyond personal experience, `__new__` is favored over `__init__` at the language level.

I think overall, I feel weird about having a generic `__new__` and a specific `__init__` on the same class - it does not quite make sense to me.
> Wouldn't it make more practical sense to prefer `__init__` over `__new__`, given that it usually has the more specific signature?
No. `__new__` can change the object type. You cannot even tell the object type until `__new__` is executed, so you cannot decide which `__init__` to call.

I do think the original request is valid - if a derived class `D` defines an `__init__` method but not `__new__`, we should use that for the signature. And for all the derived classes based on `D` too.

The existing behavior feels wrong to me - at least we should not have different behavior on class `D` and `D1(D)`.
> I do think the original request is valid - if a derived class `D` defined `__init__` method but not `__new__`, we should use that for signature. [...] at least we should not have different behavior on class `D` and `D1(D)`.
I cannot follow. What is `D`? Can you provide a code snippet?
> I cannot follow. What is `D`? Can you provide a code snippet?
It's the original code snippet:
```python
class A0:
    def __new__(cls, *args, **kw):
        return super().__new__(cls)

    def __init__(self, *args, **kw):
        super().__init__()

class A1(A0):
    def __init__(self, a, b):
        super().__init__()
        self.a = a
        self.b = b

class A2(A1):
    c = None
```
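Checking that snippet with `inspect.signature()` makes the discrepancy concrete (`A1` reports `(a, b)` because of its own `__init__`; on the affected versions `A2` falls back to `A0.__new__`):

```python
import inspect

class A0:
    def __new__(cls, *args, **kw):
        return super().__new__(cls)

    def __init__(self, *args, **kw):
        super().__init__()

class A1(A0):
    def __init__(self, a, b):
        super().__init__()
        self.a = a
        self.b = b

class A2(A1):
    c = None

print(inspect.signature(A1))  # (a, b)
print(inspect.signature(A2))  # (*args, **kw) before the fix, (a, b) after
```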
It does not make sense that `A2` and `A1` have different signatures. I think the signature for both `A1` and `A2` should be `(a, b)`. Maybe someone would argue that they should both be `(*args, **kw)` - we can discuss that. However, I don't think in any case they should be different; that's just wrong.
> I think the signature for both `A1` and `A2` should be `(a, b)`.
Not exactly.

With a customized metaclass, there are use cases like:
```python
class A0:
    def __new__(cls, *args, **kw):
        class D:
            def __init__(self):
                super().__init__()
        return D()

class A1(A0):
    def __init__(self, a, b):
        super().__init__()
        self.a = a
        self.b = b

class A2(A1):
    c = None
```
`A2(...)` is an instance of `D`. (You can call with any args/kwargs.)

> they should both be `(*args, **kw)`

I think so.
> `A2(...)` is an instance of `D`. (You can call with any args/kwargs.)
The code does not make sense - yes, it's theoretically possible, but it's invalid code. Could you find any real code that has a similar code structure?

The existing behavior of `inspect.signature()` does not do what you wanted either, so this is a bug anyway.
> I guess my point is - you can't prove a point with problematic code. Not saying your code is wrong, but if it's not preferable, then changing existing behavior based on that would be a bad idea.
I'm not sure why we ended up dissecting the test code and analyzing it as if it's some real code. All of it was typed on the spot as a minimal test case to highlight the problem. Saying it's problematic code just sidesteps the issue.
We can discuss it at the concept level if you prefer. I find it problematic that a class that defines both `__new__` and `__init__` where they have different signatures (in the sense that one is generic and the other specific) may have the help page show the wrong signature. I don't claim to know what the best solution for this is, or even if there is a good solution to identifying the proper signature.

Now you say that it is not "preferred" code to have a generic `__new__` and a specific `__init__` on the same class. I don't necessarily agree with that. I think it depends on a case-by-case basis what the proper solution is. The reason I called your approach of adding a new XMLBaseElement class into the mix a workaround is because it didn't come from a design perspective need; it came from the need to navigate around the problem that is caused by how the signature lookup works. I do not think artificial class proliferation like this is a good thing unless that base class has a reason to exist, other than avoiding a signature lookup problem.
> That being said - do you know any real code in a popular repo that has similar issues?
A quick search in the base library shows this example in unittest/mock:
```python
class NonCallableMock(Base):
    def __new__(cls, /, *args, **kw):
        ...

    def __init__(
        self, spec=None, wraps=None, name=None, spec_set=None,
        parent=None, _spec_state=None, _new_name='', _new_parent=None,
        _spec_as_instance=False, _eat_self=None, unsafe=False, **kwargs
    ):
        ...
```
which will show the wrong signature in `help()` even after the proposed fix.
> I think if you can do something in `__init__`, you should do it in `__init__`, not `__new__`. In your example, you can check the class member in `__init__` and it works perfectly fine - so no `__new__` is needed. You'll probably say that's a "workaround", but that's probably the correct way to achieve the feature.
I disagree. The reason to put it in `__new__` is because instance creation can be expensive (both memory-wise and time-wise). I prefer to decide early and avoid the expense of creating the instance just to throw it away later in `__init__`. In this case I can confidently say that doing the test in `__init__` is definitely not the correct way to achieve the feature.
> I guess one of the reasons to favor `__new__` over `__init__` is that `__new__` is guaranteed to execute, but `__init__` is not.
That is a good point that I forgot about. But I would argue that it happens in cases like unpickling or returning instances of other types from `__new__`, which are not relevant to the issue. Let's not forget, this is about `help()` showing the wrong signature. When I use `help()` on a class it is because I want to learn how to use it. It doesn't matter to me that `__init__` is not called when unpickling.
> I can easily create some meaningful code to create instances of different classes based on the argument [...]
There are many things that I consider wrong with this example, but what `help(XMLElement)` shows for the signature is not one of them:

- `XMLElement` serves no purpose as a class. You can replace it with a function, which will make it less confusing.
- Returning instances of other classes from `__new__` is not something I consider best practice.
- The `__init__` method on `XMLElement` has no reason to exist. It is never called and it just adds confusion.
- The signature `help(XMLElement)` shows is the one for `__new__`, which is correct in this case, as that is the only thing that will get called.

> I think overall, I feel weird about having a generic `__new__` and a specific `__init__` on the same class - it does not quite make sense to me.

It doesn't matter how you feel about it. It is allowed by the language and other people will use it in cases similar to what my previous examples showed, and then `help()` will not help them.
While one can argue that with `help()` you can read a bit further down, see the signatures for both `__new__` and `__init__`, and draw a conclusion about how to call it, a lot of people use this indirectly in ipython by typing a question mark after the class name to get its signature, and ipython will say:

```
Init signature: Foo(*args, **kw)
```

which is both wrong (since it's actually the `__new__` signature) and unhelpful.
> It does not make sense that `A2` and `A1` have different signatures. I think the signature for both `A1` and `A2` should be `(a, b)`.

Agree.

> Maybe someone would argue that they should both be `(*args, **kw)` - we can discuss that.

I hope not. That would be worse than now. A very strong -1 from me on this idea.
> A quick search in the base library shows this example in unittest/mock: [...] which will show the wrong signature in `help()` even after the proposed fix.

This is a very interesting example because that was explicitly fixed in https://github.com/python/cpython/pull/100252. Should we take it as a hint that this is wrong (or at least not preferred)?
> That is a good point that I forgot about. But I would argue that it happens in cases like unpickling or returning instances of other types from `__new__`, which are not relevant to the issue. [...] a lot of people use this indirectly in ipython by typing a question mark after the class name to get its signature [...] which is both wrong (since it's actually the `__new__` signature) and unhelpful.
The reason we discussed `__new__` vs `__init__` is to determine whether it would be better to prefer `__init__` over `__new__`. Yes, the user can write a lot of valid code; the issue is, what is the correct signature? Obviously even now @sunmy2019 has a very different opinion than you. So at least it's not "obvious" or "absolute".

The reason a case where `__init__` does not execute was mentioned is because in that case, showing the `__init__` signature in `help()` is incorrect.

At this point, we can't even confidently say that "specific" should be prioritized before "generic" - it is true for some cases, but not always.

That's why I'm asking for real examples - we'll have a better idea about how people are actually writing code and try to provide a better (not always "correct") signature for the classes.
> This is a very interesting example because that was explicitly fixed in #100252. Should we take it as a hint that this is wrong (or at least not preferred)?
I don't see why we should make that assumption. The commit title is "3.8x speed improvement in (Async)Mock instantiation". Maybe the change is related to that. Or maybe they noticed that the signature in `help()` was wrong and instead of reporting it as a bug they decided to fix it by syncing the signatures. Or maybe some user reported to them that the introspection of the class returns the wrong signature and they decided to fix it. We can speculate endlessly; you just conveniently picked an explanation to favor your point of view that that code is bad. Personally I'd go with the explanation that it's part of the 3.8x speed improvement. Otherwise they would have probably made a different patch like "fixed signature" or "refactored code" if it was not part of the scope of the speed improvement patch.
> I don't see why we should make that assumption. The commit title is "3.8x speed improvement in (Async)Mock instantiation". [...] Personally I'd go with the explanation that it's part of the 3.8x speed improvement.
Of course it's possible (or likely) that the fix is for the speed-up - I'm simply stating that having an inconsistent signature for `__new__` and `__init__` is not preferred. Could you find any similar usage in the current `main` branch?

I'm not picking on you, because this is not my decision. You want to favor `__init__` over `__new__` - this is a breaking behavior change, and you need a very good reason to do that. Having a consistent signature for derived classes (what I proposed earlier) could be considered a bug fix and it's easier to get core devs to approve.

Good evidence to convince core devs to change the current behavior is a couple of real code examples (it would be great if they were in the CPython repo, but other popular repos work too).

I'm pretty confident that my proposal earlier can get through - that's an enhancement to existing behavior (or a bug fix). If you want something more, you probably need more. For now, it seems like this is not a super interesting topic to core devs - we can try to ask someone for their opinion, but it would be nice if we prepared it well.

And of course, there would be no guarantee that this will be fixed (or addressed), because CPython is still basically driven by volunteers :)
> Of course it's possible (or likely) that the fix is for the speed-up - I'm simply stating that having an inconsistent signature for `__new__` and `__init__` is not preferred.
We'll have to agree to disagree on this. As far as I'm concerned I see no issue with a generic `__new__` coexisting with a specific `__init__` on the same class, or the other way around (even though I find it less useful to have a specific `__new__` and a generic `__init__` - what is even the purpose of that `__init__` if it doesn't care about its arguments?)
From my point of view, if a method doesn't use some arguments, there is no point in specifying them in the signature just to make it match the other method's signature. In fact many IDEs will flag such a usage with a warning that the arguments specified in the signature are not used in the body of the function, yet they do not flag a signature like `(self, *args, **kw)` because it can be declared like that in order to work well with `super()` or subclasses. So in the earlier example where `__new__` just checks if the subclass has defined some class attributes, or checks that you are not trying to instantiate some abstract base class, `__new__` doesn't really need any of those arguments that `__init__` will need to initialize the instance, so `__new__(cls, *args, **kw)` is fine and it clearly indicates that you do not care about those arguments, yet you need to accept whatever `__init__` accepts.

Anyway, I don't think we'll convince one another to change our view on this, so let's leave it at that.
> Could you find any similar usage in the current `main` branch?
I found a couple more but they seemed irrelevant to our discussion:
`PurePath` in pathlib.py has mismatching signatures (both generic). This seems to be because `**kwargs` is deprecated:

```python
class PurePath:
    def __new__(cls, *args, **kwargs):
        ...

    def __init__(self, *args):
        ...
```
`_Call` in unittest/mock.py has slightly mismatched signatures:

```python
class _Call(tuple):
    def __new__(cls, value=(), name='', parent=None, two=False,
                from_kall=True):
        ...

    def __init__(self, value=(), name=None, parent=None, two=False,
                 from_kall=True):
        ...
```
`Enum` in enum.py has a specific `__new__` and a generic `__init__` that does nothing. Not sure why `__init__` is defined there like that; maybe it has something to do with what the metaclass does behind the scenes. But `Enum` also defines `__signature__`, so it basically overrides anything `__new__` or `__init__` provide:

```python
class Enum(metaclass=EnumType):
    @classmethod
    def __signature__(cls):
        if cls._member_names_:
            return '(*values)'
        else:
            return '(new_class_name, /, names, *, module=None, qualname=None, type=None, start=1, boundary=None)'

    def __new__(cls, value):
        ...

    def __init__(self, *args, **kwds):
        pass
```
Class `Message` in importlib/metadata/_adapters.py has a specific `__new__` and a generic `__init__`. This looks like a case where `__new__` should just be integrated into `__init__` and removed:

```python
class Message(email.message.Message):
    def __new__(cls, orig: email.message.Message):
        res = super().__new__(cls)
        vars(res).update(vars(orig))
        return res

    def __init__(self, *args, **kwargs):
        self._headers = self._repair_headers()
```
> I'm not picking on you, because this is not my decision.

I do not feel picked on. I just understand we have different points of view.
> You want to favor `__init__` over `__new__` - this is a breaking behavior change, and you need a very good reason to do that.

I do not "want" it that way, I just stated that based on my experience that seems to work better, but I'm open to arguments why the other way around is preferable, though so far I have not seen anything to make me change my mind. That being said, I also understand that this would require a backward incompatible change, and that is a very big can of worms that probably no one wants to touch, so it's very unlikely to happen.
> Having a consistent signature for derived classes (what I proposed earlier) could be considered a bug fix and it's easier to get core devs to approve.

I agree. The original proposal is already an improvement over what we have now, even if it doesn't fix everything. Truth be told, I don't think a general solution that works in every case can be found for this anyway.
> Good evidence to convince core devs to change the current behavior is a couple of real code examples (it would be great if they were in the CPython repo, but other popular repos work too).

Let's leave the change in behavior out for now. I'm perfectly fine with the original bug fix proposal. While I believe that the other way around would be better, I also understand perfectly well that such a backward incompatible change in behavior is very unlikely to happen, and even if it does, it would probably take years of discussions.
> I'm pretty confident that my proposal earlier can get through - that's an enhancement to existing behavior (or a bug fix). If you want something more, you probably need more.

No, I think it would be easier and more expedient to leave the "more" out for now and just go for the bug fix.
Thank you. I really appreciate the expeditious handling.

May I ask what the intention with this bug fix is? Will it be backported, or is it just going to exist in 3.12 going forward?
This is considered a bug fix I believe, so it was fixed in `main` and `3.12` (notice that `3.12` is already a backport as we are on 3.13 alpha now). Not sure why it was not merged back to `3.11`; that's a question for @carljm. `3.11` is still taking bug fixes, right?

We won't port it back further because `3.10` only takes security fixes now. So the only version that might be affected at this point is `3.11`.
That's fine. 3.11 is what I'm interested in. If this could land there it would be great.
Sorry, that was my oversight; it should be backported to 3.11. Kicked that off now.
Looks like it does not backport cleanly. I will try to get to the manual backport soon but it may be a few days. @gaogaotiantian if you want to prepare the backport PR sooner (using the cherry_picker tool) I will be happy to review and merge it.
Cherry-picked in https://github.com/python/cpython/pull/105274 @carljm
Bug report
With the following class hierarchy:
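(The class hierarchy in question is the snippet reproduced in the discussion above:)

```python
class A0:
    def __new__(cls, *args, **kw):
        return super().__new__(cls)

    def __init__(self, *args, **kw):
        super().__init__()

class A1(A0):
    def __init__(self, a, b):
        super().__init__()
        self.a = a
        self.b = b

class A2(A1):
    c = None
```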
`help(A2)` shows the wrong signature for instantiating A2 as `A2(*args, **kw)` instead of the expected `A2(a, b)`, despite the fact that it shows the correct signature for `__init__`.

Note that `help(A1)` works correctly and shows the correct signature as `A1(a, b)`.

This doesn't seem to be an issue if `__new__` is not defined on A0, or if A1 redefines `__new__` with the same signature as `__init__`.

Your environment

Linked PRs