mkdocstrings / griffe

Signatures for entire Python programs. Extract the structure, the frame, the skeleton of your project, to generate API documentation or find breaking changes in your API.
https://mkdocstrings.github.io/griffe
ISC License
283 stars 40 forks source link

feature: Support `__new__` signatures for classses #320

Open Viicos opened 2 weeks ago

Viicos commented 2 weeks ago

Is your feature request related to a problem? Please describe.

The Class.parameters property uses __init__ by default:

https://github.com/mkdocstrings/griffe/blob/58eb9f4681c6ab7398750c6f1fa1f00db92c6456/src/_griffe/models.py#L1605-L1615

It would be nice if __new__ could be supported as well. Behavior when both are defined should probably be considered. The typing specification regarding constructors might help in defining a strategy.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

pawamoy commented 2 weeks ago

Thanks for the request. Could you give a concrete example where this would be needed/useful?

Viicos commented 2 weeks ago

For instance: https://github.com/pydantic/pydantic-core/pull/1420/files#diff-7e296e0875fa785fb0d0f6904dc84ce9145cdd30e12b96ef6cb37d48752135e1R77.

But more generally, people can define __new__ methods without providing any __init__. Pushing things a bit more, the metaclass' __call__ is the one used first and could be supported as well. The typing specification relies on the runtime behavior, so that's why I linked it as a reference.

The logic to determine the correct signature can get overly complicated. Maybe what could be done for now (and to avoid introducing breaking changes):

If no __init__ is defined (i.e. __init__ is object.__init__ iirc), fallback to a __new__.

This does not reflect the runtime behavior but is probably the past of least resistance here. Wdyt? Happy to push a PR.

pawamoy commented 2 weeks ago

Thanks :slightly_smiling_face:

Sounds like this can quickly become super complicated indeed. How does runtime behave with class inheritance and metaclasses :dizzy_face:?

From the docs you linked ( :pray: ):

At runtime, a call to a class’ constructor typically results in the invocation of three methods in the following order:

  1. The __call__ method of the metaclass (which is typically supplied by the type class but can be overridden by a custom metaclass and which is responsible for calling the next two methods)
  2. The__new__ static method of the class
  3. The __init__ instance method of the class

Do I understand correctly that when doing MyClass(...), first the closest metaclass __call__ method (according to MRO) is called, then the closest __new__ method is called, then the closest __init__ method is called?

pawamoy commented 2 weeks ago

What about the following?

class A:
    def __init__(self, ...): ...

class B(A):
    def __new__(cls, ...): ...

class C(B): ...

Should we use the parameters from B.__new__ or from A.__init__? :thinking:

pawamoy commented 2 weeks ago
>>> class A:
...     def __init__(self, a, b): ...
... 
>>> class B(A):
...     def __new__(cls, c, d): ...
... 
>>> b = B()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: B.__new__() missing 2 required positional arguments: 'c' and 'd'
pawamoy commented 2 weeks ago

Looks like __new__ always takes precedence if defined:

>>> class A:
...     def __new__(cls, a, b): ...
... 
>>> class B(A):
...     def __init__(self, c, d): ...
... 
>>> b = B()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: A.__new__() missing 2 required positional arguments: 'a' and 'b'

Fortunately that seems to make the change easy: simply try to get params from all_members["__new__"] first, then fallback to __init__, then return empty params.

Metaclasses can wait for now :see_no_evil:

Viicos commented 2 weeks ago

Note that default C implementations of __call__, __new__ and __init__ exist at runtime. The typing specification describes in which cases they should be skipped. That's why I also mentioned checking for __init__ is object.__init__.

Fortunately that seems to make the change easy: simply try to get params from all_members["__new__"] first, then fallback to __init__, then return empty params.

Looks good, however, I'm a bit afraid that this may break existing setups. It is common for classes to implement a "catch-all" __new__(*args, **kwargs) to do some special logic, while still having the actual arguments described and typed in __init__. That's why I suggested having a non breaking approach first.


Metaclasses can wait for now

I don't expect anyone to request support for it anyway. Mypy doesn't even support it and it is very uncommon to implement type.__call__.

pawamoy commented 2 weeks ago

Note that default C implementations of call, new and init exist at runtime. The typing specification describes in which cases they should be skipped. That's why I also mentioned checking for init is object.init.

The thing is, our logic happens after visiting AST or inspecting runtime objects. At that point we don't see any of the default C implementations. We only see what was declared in the loaded modules.

It is common for classes to implement a "catch-all" new(*args, **kwargs) to do some special logic, while still having the actual arguments described and typed in init.

I'm not sure to understand. If you define both __new__ and __init__, only __new__ is called, so why describing parameters in __init__, which isn't called?

>>> class A:
...     def __new__(cls, *args, **kwargs):
...         print("in new")
...         print(args, kwargs)
...     def __init__(self, *args, **kwargs):
...         print("in init")
...         print(args, kwargs)
... 
>>> a = A()
in new
() {}

EDIT: ah, unless we're supposed to explicitly call __init__ from __new__ (as you can see I'm not at all comfortable with __new__ :sweat_smile:)

pawamoy commented 2 weeks ago

Ah, nope:

If new() is invoked during object construction and it returns an instance of cls, then the new instance’s init() method will be invoked like init(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.

pawamoy commented 2 weeks ago

So, yeah, it seems to make sense to use the __init__ signature when it is defined, and the __new__ one otherwise. But that now complicates the logic a bit:

Not even sure about the last two points, as maybe we should loop on the first two while following the MRO...

All this to support the case where devs don't want to duplicate correct signature from __init__ into __new__ :thinking:? But yeah, backward compatibility :sweat_smile:

pawamoy commented 2 weeks ago

What about the following:

class A:
    def __init__(self, param1: str): ...

class B(A):
    def __new__(cls, *args, **kwargs):
        # some logic
        return super().__new__(cls)

Surely users would like the parameters data to be fetched from A.__init__ and not B.__new__?

Sounds like a case where we won't be able to satisfy everyone. I'm wondering if this should live in an extension, but I'm not sure the extension system will support it in its current state.

Viicos commented 2 weeks ago

I'm not sure to understand. If you define both __new__ and __init__, only __new__ is called, so why describing parameters in __init__, which isn't called?

Usually you call super().__new__ in your user defined __new__ (which will probably be the default C implementation. I can't recall if objec.__new__ is the one responsible for calling __init__, or if this is done by type.__call__ instead).

Sounds like a case where we won't be able to satisfy everyone.

We can check how Sphinx does it. I'm not entirely sure about the current logic, but PR can be found here: https://github.com/sphinx-doc/sphinx/pull/7384. It supports metaclass' __call__, but of course can be skipped.

pawamoy commented 2 weeks ago

Awesome, thanks for the link!

Looks like they give precedence to __new__, like runtime behavior:

        # Now we check if the 'obj' class has a '__new__' method
        new = get_user_defined_function_or_method(self.object, '__new__')
        if new is not None:
            self.env.app.emit('autodoc-before-process-signature', new, True)
            try:
                return inspect.signature(new, bound_method=True)
            except ValueError:
                pass

        # Finally, we should have at least __init__ implemented
        init = get_user_defined_function_or_method(self.object, '__init__')
        if init is not None:
            ...

If neither __new__ or __init__ are "user-defined", they rely on inspect.getsignature (sphinx-autodoc almost only uses dynamic analysis IIRC).

In Griffe we just need to decide whether we give precedence to __new__ or __init__. I'd be inclined to give precedence to __new__ (like sphinx-autodoc), because it matches runtime behavior, makes for an easy update, and still lets projects that declare __new__(cls, *args, **kwargs) fix their signature (or, well, forces them to do so :smile:). Also, it would match the concrete use-case that is described here (the classes/stubs in Pydantic linked above).

I'd be interested to see how common __new__(cls, *args, **kwargs) is. Do you have examples of popular projects that do that?

Viicos commented 2 weeks ago

I'd be interested to see how common __new__(cls, *args, **kwargs) is. Do you have examples of popular projects that do that?

Couldn't find anything relevant in a Github search. But maybe this isn't too much of a big deal, seems like Sphinx introduced the functionality as a non breaking change and no one complained about it :)