python / cpython

The Python programming language
https://www.python.org
Other
63.49k stars 30.4k forks source link

dataclasses: Allow typing.Annotated to wrap dataclasses-specific annotations #90669

Open 3f3f69a9-eb63-42d7-a515-cb2fea57e654 opened 2 years ago

3f3f69a9-eb63-42d7-a515-cb2fea57e654 commented 2 years ago
BPO 46511
Nosy @ericvsmith, @JelleZijlstra, @GBeauregard
PRs
  • python/cpython#30997
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'library', '3.9', '3.10', '3.11'] title = 'dataclasses: Allow typing.Annotated to wrap dataclasses-specific annotations' updated_at = user = 'https://github.com/GBeauregard' ``` bugs.python.org fields: ```python activity = actor = 'GBeauregard' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'GBeauregard' dependencies = [] files = [] hgrepos = [] issue_num = 46511 keywords = ['patch'] message_count = 9.0 messages = ['411567', '411614', '411637', '411709', '411710', '411945', '411948', '412033', '413094'] nosy_count = 3.0 nosy_names = ['eric.smith', 'JelleZijlstra', 'GBeauregard'] pr_nums = ['30997'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue46511' versions = ['Python 3.9', 'Python 3.10', 'Python 3.11'] ```

    3f3f69a9-eb63-42d7-a515-cb2fea57e654 commented 2 years ago

    In https://bugs.python.org/issue46491 the typing runtime behavior was changed so that Annotated[Classvar[...]] is now valid at runtime in order to alleviate tension between typing and non-typing annotation space uses. dataclasses.py should likely follow suit in its runtime use of ClassVar and InitVar.

    Reviewing the code I see two areas that would need addressed:

    1) InitVar needs changed so Annotated[InitVar[...]] is no longer a runtime error. This is currently a runtime error completely by accident: typing.py expects special type forms to be callable(), usually by implementing a __call__ that throws an error, but InitVar does not implement this. Adding an implementation like in typing.py would fix the runtime error: https://github.com/python/cpython/blob/b1a3446f077b7d56b89f55d98dadb8018986a3e5/Lib/typing.py#L391-L392

    2) The dataclasses-specific typehint introspection implementation needs modified to accommodate being wrapped by an Annotated annotation. I see in the comments the code is performance sensitive so I'm not sure what you want to do; f.ex. the regex needs modified, but it's not clean.

    What are your thoughts?

    JelleZijlstra commented 2 years ago

    Related: bpo-44799 about InitVar.

    ericvsmith commented 2 years ago

    My thoughts are that I'd like PEP-563 to go away, and PEP-649 to be accepted, and also never allow string literal annotations like the string "Annotated[ClassVar[int]]". But since we'll no doubt have to support string-ized annotations even if PEP-649 is accepted, that's a pipe dream.

    I think your suggestion for #1 seems reasonable.

    For #2, in the case where typing has been imported and the annotation isn't a string, I assume it's simple enough to look inside the Annotated object and extract the InitVar (or ClassVar) object. I haven't delved in to Annotated object yet.

    For #2, with a string annotation, it does look like it will get ugly. I'll have to spend some time looking at _MODULE_IDENTIFIER_RE. I'm guessing we'll need another re that looks for the same basic thing with "Annotated[" or "module.Annotated[" prepended and then look inside that. Or maybe one re to do both. I don't think we should support cases like:

    from __future__ import annotations
    myAnnotated = typing.Annotated
    
    @dataclass
    class Foo:
        a: myAnnotated[ClassVar[int]]

    (That is, we won't recognize the string "myAnnotated[ClassVar[int]]" as an Annotated ClassVar.

    Maybe we should also restrict it to "Annotated" or "typing.Annotated", but that would prevent someone from using "import typing as _typing", for example. This is why the current code accepts any module name, not strictly "typing". At least that's my recollection, more study is needed.

    Or maybe it's time to give up and use typing.get_type_hints() or inspect.get_annotations(), but I suspect there are enough corner cases that will fail that we'll never get that to work right. Nested classes, anyone?

    The whole runtime inspection of string-ized annotations is a mess.

    On the "performance is important" comment: I'm not sure this is really an issue any more. There was some PEP that was supposed to speed up importing typing, and I never looked at the performance once it was merged. But then again, I'm not sure we want to always have dataclasses import typing, either. If a program doesn't use dataclasses that using the typing module, there's no sense importing it and enlarging the working set of modules.

    I welcome any insights on any of these issues. I'm not a typing expert.

    3f3f69a9-eb63-42d7-a515-cb2fea57e654 commented 2 years ago

    Thanks for getting back so quickly.

    Annotated is set up to be 'transparent' by default to typing.get_type_hints so in the case of using typing.py it can be made straightforward by chaining with typing.get_origin, I think.

    I don't see any reasonable way to do the regex that allows Annotated to be renamed, so I agree your suggested restriction. I think the regex would be something like this:

    ^\s(?:(?:\w+\s\.)?\s*Annotated\[)?(?:\s(\w+)\s\.)?\s*(\w+)

    I'm a bit worried people who are into Annotated annotations might be concerned about line length and more likely to rename Annotated, but I don't know if this is a realistic concern. And a part of me wants to say always importing typing.py should be okay since dataclasses was designed to work with type hints after all.

    On the other hand, I'm also a bit worried that if we made dataclasses always import typing.py it would rub people the wrong way even if we profiled it and decided it was okay.

    I'm going to spend some more time digesting the code tomorrow and try to decide if there's any major speed bumps to a full typing approach. I'll also look at import time and such.

    3f3f69a9-eb63-42d7-a515-cb2fea57e654 commented 2 years ago

    Or rather,

    ^\s(?:(?:\w+\s\.)?\s*Annotated\s\[)?(?:\s(\w+)\s\.)?\s(\w+)

    3f3f69a9-eb63-42d7-a515-cb2fea57e654 commented 2 years ago

    Hi Eric,

    to follow up on https://bugs.python.org/msg411943

    I'm currently a bit negative on moving to get_type_hints, even though I got it working for the test suite. I think your worries with nesting are well placed, particularly with namespaces and such.

    In that vein, I suggest we move forward with patching the existing implementation with the discussed regex restrictions. I'm not sure if you want to remove the test cases with leading spaces; it seems not too important.

    While we're there I found a bug in the test suite, a missing comma that can be fixed at the same time: https://github.com/python/cpython/blob/b1a3446f077b7d56b89f55d98dadb8018986a3e5/Lib/test/test_dataclasses.py#L3080

    Do you have any other concerns before I take a stab at this?

    ericvsmith commented 2 years ago

    I was hoping to wait until the PEP-649 / PEP-563 thing was decided. But I realize that no matter how that turns out, there will be a need to deal with string annotations.

    So I think I'm okay with the regex changes. Personally, I think we should remove support for leading spaces and should remove the tests, too. I guess there's some argument that there should be a deprecation period, I think it's just invalid syntax and shouldn't be supported.

    Go ahead and put together a PR.

    3f3f69a9-eb63-42d7-a515-cb2fea57e654 commented 2 years ago

    I had a few style, approach, and testing preference questions, but I decided they're probably best addressed in a code review so I went ahead and posted the PR.

    3f3f69a9-eb63-42d7-a515-cb2fea57e654 commented 2 years ago

    It occurred to be that we do need to add the __call__ to KW_ONLY, but for a different reason than this bpo:

    If you call get_type_hints on a dataclass with a KW_ONLY parameter when PEP-563 is enabled, the entire call will fail if KW_ONLY isn't callable(). This can also happen if you stringize KW_ONLY without PEP-563. I made a bpo to suggest removing the callable() check entirely, but it's waiting discussion currently: https://bugs.python.org/issue46644

    My feeling is you probably wanted to wait on making changes of this kind for the 5.11 PEP-563 et al decision to play out, but on the other hand I think the KWONLY (and similar \_call__ method for InitVar this patch already adds) change would likely be backportable so we may want to make them anyway for that purpose. Do you have an opinion on this?

    This patch might be backportable to mirror https://bugs.python.org/issue46491 but I don't have a strong opinion on that.