mkdocstrings / griffe

Signatures for entire Python programs. Extract the structure, the frame, the skeleton of your project, to generate API documentation or find breaking changes in your API.
https://mkdocstrings.github.io/griffe
ISC License
317 stars 44 forks source link

bug: Descriptors are not handled properly #328

Open pekkaklarck opened 1 month ago

pekkaklarck commented 1 month ago

If I use a descriptor as at the end of this description, there are two problems with how the generated documentation for Example.count looks like:

  1. There's no type information. It would be possible to get that from the __get__ method.
  2. The descriptor gets labels class-atribute and instance-atribute which are both somewhat wrong. Technically it is a class attribute, but so are methods and propertys (that are also implemented as descriptors) and they don't get such a label either. Technically it's not an instance attribute, but from the usage perspective it acts like one. I believe a proper label would be data-descriptor, but property would probably be more familiar for normal users and the functionality is the same.

You can see how the example is rendered here in a demo site that I've used for MkDocs experiments.

A variation of this is using a descriptor as a method decorator like the setter we use extensively with the Robot Framework project to avoid using normal propertys with dummy getters. Such attributes are currently shown as methods without any type information. You can see an example here in a site where we are creating our new Manual that will also incorporate API docs.

from typing import overload, Self

class PositiveInteger[T]:

    def __set_name__(self, owner: T, name: str):
        self.name = '_' + name

    @overload
    def __get__(self, instance: None, owner: type[T]) -> Self:
        ...

    @overload
    def __get__(self, instance: T, owner: type[T]) -> int:
        ...

    def __get__(self, instance, owner) -> int | Self:
        if instance is None:
            return self
        return getattr(instance, self.name)

    def __set__(self, instance: T, value: int):
        if value <= 0:
            raise ValueError(f'{value} is not positive')
        setattr(instance, self.name, value)

class Example:
    """Example using descriptor."""

    count = PositiveInteger()
    """Count as a positive integer."""

    def __init__(self, count: int = 1):
        self.count = count

I'm not sure should this have been submitted to griffe instead or should I have submitted separate issues about different problems. Feel free to move or split as needed.

Boost priority

Fund with Polar

pawamoy commented 1 month ago

(I'll move this to Griffe)

Thanks for the report!

There's no type information. It would be possible to get that from the get method.

Hmmm. I don't think I want to enter this territory. Griffe has some capabilities to follow annotations, but inferring them is currently a non-goal. Even looking at the code, it took me a few seconds to know what the type would be (int, IIUC? I have to admit I'm not familiar with descriptors :sweat_smile:).

If count is therefore supposed to be an int, then I'd recommend annotating it as such. I know it's not ideal, when type-checkers are able to infer the type, but as mentioned on Gitter/Matrix, Griffe is not a type-checker and doesn't try to be one, so it expects that users provide the relevant information. Do you think that makes sense?

The descriptor gets labels class-atribute and instance-atribute [...] from the usage perspective it acts like one.

Griffe indeed focuses on the usage perspective, though it's true that sometimes it can provide info that is not relevant to usage (like, a writable property just becomes a regular attribute, by usage, so why showing the labels, right). In that case, it considers count to be both a class-attribute and instance-attribute because it's defined both at the level class and in the __init__ method. Since I'm not familiar with descriptors: can it actually be used both with Example.count and example.count? In any case, the object should definitely have a data-descriptor label, for exploitation purposes.

To support your setter, you'll have to write an extension :slightly_smiling_face:

pekkaklarck commented 1 month ago

I don't think it would be unrealistic to get the correct type from the __get__ method. It would then work with all descriptors, including property that is a descriptor but many tools (including Griffe I assume) just handle is as a special construct. I have no idea how easy this would be, but I think it would be a good goal.

You can access descriptors both via the class and via the instance, but you get different results. In the former case you get the descriptor itself, but when you access it from the instance you get the value returned but __get__. Here's an example using the custom PositiveInteger description as well as a property and a method that are also descriptors:

>>> class Example:
...     d = PositiveInteger()
...     def __init__(self):
...         self.d = 1
...     @property
...     def p(self):
...         return 2
...     def m(self):
...         return 3
...         
>>> e = Example()
>>> e.d
1
>>> e.p
2
>>> e.m
<bound method Example.m of <__main__.Example object at 0x7f2780d8a3c0>>
>>> Example.d
<PositiveInteger object at 0x7f2780e29bd0>
>>> Example.p
<property object at 0x7f2780c81a80>
>>> Example.m
<function Example.m at 0x7f2780c7d8a0>

If you want to learn more about descriptors, I recommend reading the excellent Descriptor Guide. Descriptors aren't used that widely in normal code, but Python itself uses them extensively. I'm sure their usage gets more widespread in the future, though. For example, I read somewhere that SQLAlchemy has started to used from in its models.

I would expect that our setter could be handled as a normal descriptor without a dedicated extension. It's easy to recognize that an attribute is a descriptor with hasattr(attr, '__get__') and from there it ought to be possible to handle all of them the same way. I'm interested to look at that myself, but unfortunately I don't have time for that in the near future.

pekkaklarck commented 1 month ago

Few more examples using the Example class in the above comment:

>>> Example.d.__get__(e, type(e))     # Standard descriptor call requires `instance` and `type`.
1
>>> Example.d.__get__(None, type(e))  # `None` as `instance` is same as accessing from class.
<descriptor.PositiveInteger object at 0x7f2780e29bd0>
>>> Example.p.__get__(e)              # With property `type` is optional.
2
>>> Example.m.__get__(e)()            # Somewhat strange way to call a method...
3
>>> from typing import get_type_hints
>>> get_type_hints(Example.d.__get__)
{'return': typing.Union[int, typing.Self]}

As the end of the example demonstrates, getting the type from __get__ is easy. There are two problems, but both ought to be relatively easy to handle:

pawamoy commented 1 month ago

It would then work with all descriptors, including property that is a descriptor

I would expect that our setter could be handled as a normal descriptor without a dedicated extension. It's easy to recognize that an attribute is a descriptor with hasattr(attr, 'get')

You seem to assume dynamic analysis here, with access to runtime objects. But by default, Griffe uses static analysis. I'm sure you understand it's another story :smile: Griffe also supports dynamic analysis though, but I'm not sure we explicitly support data descriptors (we do support method descriptors).

pekkaklarck commented 1 month ago

It actually occurred to be, after writing the previous comments, that this isn't as straightforward with static analysis alone. I assume methods (that are non-data descriptors) work also in static analysis, so perhaps the same approach could also be extended to data descriptors. If that's too complicated, proper data descriptor support requiring dynamic analysis ought to be fine.

pawamoy commented 1 month ago

I believe we can (and should) improve support for data-descriptors through both static and dynamic analysis. Data descriptors are a feature of the language, so even if it's hard, Griffe must support them. Dynamic analysis should be able to handle them directly while visiting object trees, while static analysis will probably require a built-in extension. Anyway, these are just internal details :slightly_smiling_face:

What I note is:

After this, to support your setter you'll either have to rely on dynamic analysis, or write a Griffe extension as static analysis will still not be enough I believe.

pawamoy commented 1 month ago

Thanks for all the explanations, and for your patience :slightly_smiling_face: I've read the guide on data descriptors you shared, and I think I understand their main benefit: contrary to properties, they are easily reusable, and require much less boilerplate once implemented. Properties would have to shared via inheritance (mixins) instead of composition, and even then would be less flexible. Makes sense since properties are a specific use of descriptors.

pekkaklarck commented 1 month ago

Yeah, descriptors are really handy. They look complicated first, but users typically don't need to even know them. I think they are underused but believe that will change.

Your plans sound really good. The data-descriptor label may look strange to some, but that's anyway probably better that more familiar but not-exactly-right property. The situation could be enhanced by mkdocstring by linking labels to more information similarly as type hints to external types are. I'm not sure do all labels have a suitable target at python.org, though, or should their explanations be part of mkdocstring docs.

If you implement all that, I'd expect our setter to work out-of-the-box at least in dynamic analysis.