`__annotate__` does not get called if `__annotations__` exist

sobolevn commented 2 months ago

This code:

class A:
    x: int
    y: str

    def __annotate__(format):
        return {'a': bool}

import annotationlib
print(A.__annotations__)
print(annotationlib.get_annotations(A, format=annotationlib.Format.VALUE))
print(annotationlib.get_annotations(A, format=annotationlib.Format.FORWARDREF))
print(annotationlib.get_annotations(A, format=annotationlib.Format.SOURCE))

Produces:

{'x': <class 'int'>, 'y': <class 'str'>}
{'x': <class 'int'>, 'y': <class 'str'>}
{'x': <class 'int'>, 'y': <class 'str'>}
{'x': 'int', 'y': 'str'}

PEP 649 specifies this as: https://peps.python.org/pep-0649/#annotate-and-annotations

When o.__annotations__ is evaluated, and the internal storage for o.__annotations__ is unset, and o.__annotate__ is set to a callable, the getter for o.__annotations__ calls o.__annotate__(1), then caches the result in its internal storage and returns the result.

To explicitly clarify one question that has come up multiple times: this o.__annotations__ cache is the only caching mechanism defined in this PEP. There are no other caching mechanisms defined in this PEP. The __annotate__ functions generated by the Python compiler explicitly don’t cache any of the values they compute.

Comment them out:

class A:
    # x: int
    # y: str

    def __annotate__(format):
        return {'a': bool}

import annotationlib
print(A.__annotations__)
print(annotationlib.get_annotations(A, format=annotationlib.Format.VALUE))
print(annotationlib.get_annotations(A, format=annotationlib.Format.FORWARDREF))
print(annotationlib.get_annotations(A, format=annotationlib.Format.SOURCE))

Produces:

{'a': <class 'bool'>}
{'a': <class 'bool'>}
{'a': <class 'bool'>}
{'a': <class 'bool'>}

Let's discuss this.

picnixz commented 2 months ago

(Not sure if it's really a bug or not, so feel free to remove the label; but I think it should only affect 3.14)

JelleZijlstra commented 1 week ago

I'm going to say this is working the way it should be working. If a class has annotations, the compiler generates an __annotate__ function and puts it in the class dictionary. Therefore, in your first example, the __annotate__ function that you provided gets overridden.

In your second example, there are no annotations, so the compiler does not generate an __annotate__ function. But when you call the __annotations__ descriptor, it finds an __annotate__ function and it calls it.

So my recommendation is not to change anything here. We could add some tests covering these cases, though. And if anyone thinks this needs bigger discussion, I can add a section discussing these cases to PEP-749.

sobolevn commented 1 week ago

And if anyone thinks this needs bigger discussion

I think that this case should be discussed. As it removes the potential use-case for things that are defined using annotations DSL, like dataclasses, attrs, pydantic models, validation, etc.

Let's say:

class MyObject:
    field: 'heavy_import.Something'

    def __annotate__(self, format):
        import heavy_import
        return {'field': heavy_import.Something}

^^^

This won't work as well.

And this also seem like a big potential use-case, when people use TYPE_CHECKING imports for heavy objects, while they still want a way to properly resolve them at runtime when needed.

sobolevn commented 1 week ago

We might also need to invite people from pydantic and attrs to ask them, I am sure they have a lot of potential use-cases.

JelleZijlstra commented 1 week ago

Libraries like pydantic or attrs have code that can run after the class is defined (either __init_subclass__ or a decorator). I don't really see why they would want users to define their own __annotate__ function.

For your heavy_import example, users can already handle this particular case with if TYPE_CHECKING a lot more simply (they don't have to repeat all annotations).

sobolevn commented 1 week ago

For your heavy_import example, users can already handle this particular case with if TYPE_CHECKING a lot more simply (they don't have to repeat all annotations).

Yes, this is what I was thinking of :) I proposed the same here:

And this also seem like a big potential use-case, when people use TYPE_CHECKING imports for heavy objects, while they still want a way to properly resolve them at runtime when needed.

But, note that TYPE_CHECKING does not allow to have a real object set in annotation, only a string. Sometimes we might need both:

Having a real object, when .__annotations__ is accessed
Not running some potentially missing or slow imports before that

Libraries like pydantic or attrs have code that can run after the class is defined (either __init_subclass or a decorator). I don't really see why they would want users to define their own annotate__ function.

Because this will affect code paths that don't specifically ask for annotations. Basically, this is the same as importing heavy_import right away.

For real use-case:

We have string-based manipulations of ClassVar and Any in dataclasses.py
We don't want to import typing in dataclasses.py
I wanted to add def __annotate__ to dataclass instances, so we can replace 'Any' to typing.Any when .__annotations__ dict is accessed. But, I failed to do that, because we already have .__annotations__ defined due to the dataclass existing API.

So, I opened this issue.

My proposal: let's ask maintainers of these two libs at least, feedback is always great to have for such big features. If they don't have any problems, it would be great! :) If they do, we can address them early.

JelleZijlstra commented 1 week ago

But, note that TYPE_CHECKING does not allow to have a real object set in annotation, only a string

Not true, it can be a ForwardRef. annotationlib already provides a lot of flexibility for deferring heavy imports.

But, I failed to do that, because we already have .annotations defined due to the dataclass existing API.

This should already be possible by making the dataclass decorator set a different __annotate__ function. I'm not sure how allowing the __annotate__ function to be defined on the class would help.

let's ask maintainers of these two libs at least

Feel free to do so!

sobolevn commented 1 week ago

I'm clearly missing something.

This should already be possible by making the dataclass decorator set a different __annotate__ function.

Can you please provide a code sample for that?

JelleZijlstra commented 1 week ago

I'm not sure I fully understand what you want to do either. What I was thinking was that dataclasses could do something like:

original_annotate = annotationlib.get_annotate_function(cls)

def wrapper_annotate(format):
    annos = annotationlib.call_annotate_function(original_annotate, format)
    return {k: typing.Any if v == "Any" else v for k, v in annos.items()}

But I don't really see the point; people shouldn't be using string annotations in the first place.

JelleZijlstra commented 4 days ago

@sobolevn seems like you got #122262 to work without changes here; do you still think the behavior of the core needs to change?

sobolevn commented 3 days ago

Now I am more inclined to agree with the current implementation 👍 , but I think we can leave this open until we ask pydantic and attrs teams, just to be safe (working on this) :)

python / cpython

`annotate` does not get called if `annotations` exist #122285

python / cpython

`__annotate__` does not get called if `__annotations__` exist #122285

`annotate` does not get called if `annotations` exist #122285