PEP 649: Avoid creation of function objects for `__annotate__`

JelleZijlstra commented 1 month ago

Currently, when a class, function, or module has any annotations, we always generate an __annotate__ function object at import time. A function object takes 168 bytes. But in most cases, all of the relevant fields on an __annotate__ function are predictable (there's no docstring and no defaults or kwdefaults, the name is __annotate__, etc.). So we could save significant memory by constructing only a smaller object and constructing the function on demand when somebody asks for it (by accessing __annotate__).
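For anyone who wants to check the sizes on their own build, here is a rough measurement sketch. The function `annotated` is a stand-in invented for this example. The figures `sys.getsizeof` reports depend on the Python version and build, and may differ from the numbers quoted in this thread.

```python
import sys

# Stand-in function; any function object would do for a size check.
def annotated(x: int) -> str:
    return str(x)

# Size of the function object itself (does not include the code object).
function_size = sys.getsizeof(annotated)
# Size of the code object; on recent versions this includes the bytecode.
code_size = sys.getsizeof(annotated.__code__)

print(f"function object: {function_size} bytes")
print(f"code object:     {code_size} bytes")
```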

We need the following to create an __annotate__ function object:

- The code object itself. That's inescapable.
- The globals dict. For function annotations, we can reuse the function's globals. For module annotations, we can use the module dict. But for classes, the `__annotate__` descriptor can't easily get to the globals dict. To do this, we may need a new bytecode that just loads the current globals.
- The closure tuple. Module annotations never have this, classes always have it (a reference to the classdict), functions often have it (always for methods, never for global functions, often for nested functions).
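To illustrate that these three pieces are enough, here is a minimal sketch that rebuilds a function with `types.FunctionType`. The helper `make_annotate_parts` and the `classdict_like` variable are made up for the example. They only imitate what the compiler would produce, and this is not how CPython builds `__annotate__` internally.

```python
import types

def make_annotate_parts():
    # Stand-in for compiler output: a nested function that closes over one
    # variable, loosely imitating an annotate function that refers to a classdict.
    classdict_like = {"T": int}

    def __annotate__(format):
        return {"x": classdict_like["T"]}

    # The three ingredients: code object, globals dict, closure tuple.
    return __annotate__.__code__, __annotate__.__globals__, __annotate__.__closure__

code, globs, closure = make_annotate_parts()

# Rebuild a function object on demand from the stored pieces.
rebuilt = types.FunctionType(code, globs, "__annotate__", None, closure)
result = rebuilt(1)
```

The name, defaults and docstring are all predictable here, which is why only the code, globals and closure would need to be stored.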

I am thinking of a format where __annotate__ can be any of the following:

1. A function, like today
2. A bare code object
3. A tuple containing a code object at position 0, optionally a globals dict at position 1, plus any number of cell objects

__annotate__ getters would have to recognize the second and third cases and translate them into function objects on the fly. As a result, users accessing .__annotate__ would never see the tuple, though those who peek directly into a module or class's __dict__ might.

Other related opportunities for optimization:

- Tools like `functools.wraps` would unnecessarily force materialization of the `__annotate__` function. Not sure there's an elegant solution for this.
- The function objects created for various PEP 695/696 objects (e.g., TypeVar bounds) work very similarly to annotate functions, and we could apply the same optimization to them.
- A code object by itself is also pretty big (232 bytes), and many of its fields are not needed for an annotate function that may never get executed. We could internally create a more streamlined "mini-codeobject" and materialize the real code object only when necessary.

JelleZijlstra commented 1 month ago

> We could internally create a more streamlined "mini-codeobject" and materialize the real code object only when necessary

Maybe a variation of this could be that we create a bytestring with the marshalled code object, and unmarshal it only on demand.
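A minimal sketch of that idea using the existing `marshal` module is below. The names `annotate_template` and `load_annotate` are invented for the example. Whether the marshalled bytes are actually smaller than the code object would need to be measured; this only shows the round trip.

```python
import marshal
import types

# Stand-in for a compiler-generated annotate function.
def annotate_template(format):
    return {"x": "int", "return": "str"}

# Store only the marshalled bytes instead of a live code object.
payload = marshal.dumps(annotate_template.__code__)

def load_annotate(payload, globs):
    # Unmarshal on demand, then wrap the code object in a function.
    code = marshal.loads(payload)
    return types.FunctionType(code, globs, "__annotate__")

fn = load_annotate(payload, globals())
annotations = fn(1)
```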

JelleZijlstra commented 1 month ago

> Tools like `functools.wraps` would unnecessarily force materialization of the `__annotate__` function. Not sure there's an elegant solution for this.

The proposal in #124342 would actually fix this, because it makes the wrapper access the wrapped function's `.__annotate__` lazily.

python / cpython

PEP 649: Avoid creation of function objects for `__annotate__` #124157
