python / cpython

The Python programming language
https://www.python.org
Other
63.16k stars 30.24k forks source link

Can't assign entry to `__annotations__` unless a class has existing annotation #95532

Open ravwojdyla opened 2 years ago

ravwojdyla commented 2 years ago

Bug report

Can't assign entry to __annotations__ unless a class has existing annotation. This is related to https://github.com/python/cpython/issues/88067.

python 3.9:

Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 20:33:18)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo:
...     __annotations__["b"] = int
...
>>> Foo.__annotations__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'Foo' has no attribute '__annotations__'
>>> class Foo:
...     a: int
...     __annotations__["b"] = int
...
>>> Foo.__annotations__
{'a': <class 'int'>, 'b': <class 'int'>}

python 3.10/python 3.11-rc:

Python 3.10.5 (main, Jul 12 2022, 11:32:11) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo:
...     __annotations__["a"] = int
...
>>> Foo.__annotations__
{}
>>> class Foo:
...     a: float
...     __annotations__["b"] = int
...
>>> Foo.__annotations__
{'a': <class 'float'>, 'b': <class 'int'>}

Your environment


I'm interested in submitting a PR for this, but first would like to double check that such PR/change would be acceptable?

ericvsmith commented 2 years ago

What's the use case for this? I'm not sure we really want to support adding items to __annotations__ directly.

ravwojdyla commented 2 years ago

šŸ‘‹ @ericvsmith we have a schema definition framework built atop pydantic, and directly adding items to __annotations__ makes it possible to easily "copy" fields between schema classes, we utilise this a lot. This use case works fine for us, unless the schema class doesn't have any existing fields (as illustrated in this issue). I'm happy to describe this in more details if necessary.

I'm not sure we really want to support adding items to __annotations__ directly.

From PEP-0526:

__annotations__ is writable, so this is permitted: __annotations__['s'] = str

ericvsmith commented 2 years ago

Thanks, @ravwojdyla

Man, the formatting of that PEP section is messed up!

I assume you're saying you like the 3.10/3.11 behavior (which doesn't fail), and want it backported to 3.9? 3.9 is only getting security fixes now.

ravwojdyla commented 2 years ago

I assume you're saying you like the 3.10/3.11 behavior (which doesn't fail), and want it backported to 3.9? 3.9 is only getting security fixes now.

@ericvsmith I think the 3.10/11 is an improvement from 3.9, but the bug here is that you can't assign item to __annotations__ unless there's already existing annotation in the class, which is inconsistent.

So from the example above:

>>> class Foo:
...     __annotations__["b"] = int
...
>>> Foo.__annotations__
{}

>>> # BUG: assignment did not work ^ where is the `b` field?

>>> class Foo:
...     a: float
...     __annotations__["b"] = int
...
>>> Foo.__annotations__
{'a': <class 'float'>, 'b': <class 'int'>}

>>> # NOTE: now the assignment worked for both `a` and `b`, only because there was existing `a` field

Does it make sense?

ericvsmith commented 2 years ago

Ah. I missed that the returned dict was empty in the first 3.10 example.

@larryhastings thoughts?

ravwojdyla commented 2 years ago

Ah, I understand now where does the b field go in this scenario:

>>> class Foo:
...     __annotations__["b"] = int
...
>>> Foo.__annotations__
{}

>>> # BUG: where is `b`?

See:

Python 3.12.0a0 (heads/fix-issue-95532-dirty:698fa8bf60, Aug  2 2022, 09:34:18) [Clang 12.0.0 (clang-1200.0.32.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> __annotations__
{}
>>> # module __annotations__ is empty - so far good

>>> class Foo:
...     __annotations__["c"] = float
...
>>> Foo.__annotations__
{}
>>> __annotations__
{'c': <class 'float'>}

>>> # `c` went into the module __annotations__, but see what happens next:

>>> class Bar:
...     a: int
...     __annotations__["b"] = float
...
>>> __annotations__
{'c': <class 'float'>}
>>> Bar.__annotations__
{'a': <class 'int'>, 'b': <class 'float'>}

>>> # both `a` and `b` are now in the the Bar class __annotations__

So I would say in the Foo example, __annotations__ should point to Foo.__annotations__ not the module's __annotations__.

Also note that this works as expected:

>>> class Baz:
...     __annotations__["c"] = float
...     a: int
...     __annotations__["b"] = float
...
>>> Baz.__annotations__
{'c': <class 'float'>, 'a': <class 'int'>, 'b': <class 'float'>}
>>> __annotations__
{}

So it seems like the existence of the a annotated field in the Baz class is somehow binding __annotations__ to the class.

ravwojdyla commented 2 years ago

And this issue is reproducible with the 3.6 PEP-0526 implementation https://github.com/python/cpython/issues/72172. Do we agree that __annotations__ in the Foo example should bind to Foo.__annotations__? Would be happy to try to PR a fix for that.

ravwojdyla commented 2 years ago

šŸ‘‹ @ericvsmith what's the best way to get feedback/guidance for this issue and question above?

ericvsmith commented 2 years ago

@ravwojdyla : Maybe mention it on discuss.python.org, pointing to this issue.

ravwojdyla commented 2 years ago

Posted here: https://discuss.python.org/t/request-for-guidance-on-gh-95532/18124

terryjreedy commented 2 years ago

This strikes me as a (supprising) bug. Any assignments in class scope scope to a writable name should be in the class namespace unless a global declaration says otherwise.

When you find the code that does this, ping the author (found using git blame or git log) either here or on a PR.

sobolevn commented 2 years ago

Very interesting! Let's dive into it.

>>> import dis
>>> def first():
...   class Foo:
...     a: int
...     __annotations__['b'] = int
... 
>>> def second():
...   class Foo:
...     __annotations__['b'] = int
... 

Let's see why they are different:

>>> dis.dis(first)
  1           0 RESUME                   0

  2           2 PUSH_NULL
              4 LOAD_BUILD_CLASS
              6 LOAD_CONST               1 (<code object Foo at 0x104d0cd40, file "<stdin>", line 2>)
              8 MAKE_FUNCTION            0
             10 LOAD_CONST               2 ('Foo')
             12 CALL                     2
             22 STORE_FAST               0 (Foo)
             24 LOAD_CONST               0 (None)
             26 RETURN_VALUE

Disassembly of <code object Foo at 0x104d0cd40, file "<stdin>", line 2>:
  2           0 RESUME                   0
              2 LOAD_NAME                0 (__name__)
              4 STORE_NAME               1 (__module__)
              6 LOAD_CONST               0 ('first.<locals>.Foo')
              8 STORE_NAME               2 (__qualname__)
             10 SETUP_ANNOTATIONS

  3          12 LOAD_NAME                3 (int)
             14 LOAD_NAME                4 (__annotations__)
             16 LOAD_CONST               1 ('a')
             18 STORE_SUBSCR

  4          22 LOAD_NAME                3 (int)
             24 LOAD_NAME                4 (__annotations__)
             26 LOAD_CONST               2 ('b')
             28 STORE_SUBSCR
             32 LOAD_CONST               3 (None)
             34 RETURN_VALUE

Note 10 SETUP_ANNOTATIONS

>>> dis.dis(second)
  1           0 RESUME                   0

  2           2 PUSH_NULL
              4 LOAD_BUILD_CLASS
              6 LOAD_CONST               1 (<code object Foo at 0x104cedd50, file "<stdin>", line 2>)
              8 MAKE_FUNCTION            0
             10 LOAD_CONST               2 ('Foo')
             12 CALL                     2
             22 STORE_FAST               0 (Foo)
             24 LOAD_CONST               0 (None)
             26 RETURN_VALUE

Disassembly of <code object Foo at 0x104cedd50, file "<stdin>", line 2>:
  2           0 RESUME                   0
              2 LOAD_NAME                0 (__name__)
              4 STORE_NAME               1 (__module__)
              6 LOAD_CONST               0 ('second.<locals>.Foo')
              8 STORE_NAME               2 (__qualname__)

  3          10 LOAD_NAME                3 (int)
             12 LOAD_NAME                4 (__annotations__)
             14 LOAD_CONST               1 ('b')
             16 STORE_SUBSCR
             20 LOAD_CONST               2 (None)
             22 RETURN_VALUE

Note that 10 SETUP_ANNOTATIONS is not present. Why? Because find_ann function is looking for AnnAssign_kind statement kind to actually set annotations up.

The simpliest option I see is to always call SETUP_ANNOTATIONS for class bodies, similar to how we treat __qualname__, __name__, and __module__.

The downside of this solution is that we slow down all classes even without any annotations. How bad is it? I don't know: I will need to write a benchmark.

Other ideas?

ravwojdyla commented 2 years ago

Nice @sobolevn!

Btw https://docs.python.org/3/library/dis.html#opcode-SETUP_ANNOTATIONS

This opcode is only emitted if a class or module body contains variable annotations statically.

Regarding Terry request:

When you find the code that does this, ping the author (found using git blame or git log) either here or on a PR.

@ilevkivskyi FYI ^

ilevkivskyi commented 2 years ago

Yeah, IIRC we already understood there may be downsides when we decided to conditionally create __annotations__. It's totally OK to always create __annotations__ if it turns out having it defined is more important than the (modest) memory savings. (Also having class scope one bound to global one is definitely a bug, probably something is missing in symbol table analysis).

sobolevn commented 2 years ago

Also having class scope one bound to global one is definitely a bug, probably something is missing in symbol table analysis

@ilevkivskyi to clarify, are you talking about __annotations__ being resolved to a global variable inside a class scope (when __annotations__ is not defined), like:

>>> a = {}
>>> class Some:
...    a['a'] = int
... 
>>> a
{'a': <class 'int'>}

?

ravwojdyla commented 2 years ago

@sobolevn I believe @ilevkivskyi is talking about this case:

Python 3.12.0a0 (heads/fix-issue-95532-dirty:698fa8bf60, Aug  2 2022, 09:34:18) [Clang 12.0.0 (clang-1200.0.32.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> __annotations__
{}
>>> # module __annotations__ is empty - so far good

>>> class Foo:
...     __annotations__["c"] = float
...
>>> Foo.__annotations__
{}
>>> __annotations__
{'c': <class 'float'>}

See above: https://github.com/python/cpython/issues/95532#issuecomment-1202143016, this type of assignment to __annotations__ works if __annotations__ already exists at the class scope.

sobolevn commented 2 years ago

@ravwojdyla this is exactly the case I've showed in https://github.com/python/cpython/issues/95532#issuecomment-1238521616 using a = {} as more general case.

ravwojdyla commented 2 years ago

@sobolevn understood, was just pointed to the example @ilevkivskyi was referring to in the comments above. Sounds like @ilevkivskyi is ok with just creating __annotations__ at all times, which will resolve both problems highlighted in this issue. In that case unless there's any concerns will go ahead and try to PR that(?)

sobolevn commented 2 years ago

I think it is better to let a person to speak for themself :) Let's wait for some more information before diving into it šŸ˜‰

ilevkivskyi commented 2 years ago

I think I misunderstood, I thought there was some case where it was bound to global one while class-level one is also defined. But it looks like the class-level one is generated lazily when accessed. IIUC this was added later (found it in https://github.com/python/cpython/pull/25623), while in 3.6 accessing __annotations__ on a class without annotations raised attribute error. So one would have:

>>> class C:
...     __annotations__["x"] = int
... 
>>> C.__annotations__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'C' has no attribute '__annotations__'
>>>

This was consistent with how any other name (like a above) would behave in this case. But with lazy generation of __annotations__ this now becomes confusing.

ravwojdyla commented 2 years ago

Also FYI https://github.com/python/cpython/issues/88067#issuecomment-1093911708 and https://github.com/python/cpython/issues/88067#issuecomment-1093911732