python / cpython

The Python programming language
https://www.python.org
Other
62.6k stars 30.03k forks source link

Frozen dataclass deep copy doesn't work with __slots__ #89683

Open 4a93283c-c903-4f64-bca5-96978136d9a8 opened 2 years ago

4a93283c-c903-4f64-bca5-96978136d9a8 commented 2 years ago
BPO 45520
Nosy @ericvsmith, @MojoVampire, @jfuruness, @sobolevn
PRs
  • python/cpython#29147
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', '3.9', '3.10'] title = "Frozen dataclass deep copy doesn't work with __slots__" updated_at = user = 'https://github.com/jfuruness' ``` bugs.python.org fields: ```python activity = actor = 'sobolevn' assignee = 'none' closed = False closed_date = None closer = None components = [] creation = creator = 'jfuruness' dependencies = [] files = [] hgrepos = [] issue_num = 45520 keywords = ['patch'] message_count = 5.0 messages = ['404245', '404260', '404273', '404354', '404525'] nosy_count = 4.0 nosy_names = ['eric.smith', 'josh.r', 'jfuruness', 'sobolevn'] pr_nums = ['29147'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue45520' versions = ['Python 3.9', 'Python 3.10'] ```

    Linked PRs

    4a93283c-c903-4f64-bca5-96978136d9a8 commented 2 years ago

    If you define a frozen dataclass with slots and deep copy it, an error will occur. If you run the same code and remove the slots, the error will not occur. I assume this behavior is not intentional? Apologies if I'm submitting this wrong, this is the first time I've submitted an issue here so I'm not quite sure how to do it properly.

    Example below:

    from dataclasses import dataclass
    from copy import deepcopy
    
    @dataclass(frozen=True)
    class FrozenData:
        # Without slots no errors occur?
        __slots__ = "my_string",
    
        my_string: str
    
    deepcopy(FrozenData(my_string="initial"))

    Error that occurs:

    dataclasses.FrozenInstanceError: cannot assign to field 'my_string'
    99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 2 years ago

    When I define this with the new-in-3.10 slots=True argument to dataclass rather than manually defining __slots__ it works just fine. Looks like the pickle format changes rather dramatically to accommodate it.

    >>> @dataclass(frozen=True, slots=True)
    ... class FrozenData:
    ...     my_string: str
    ...
    >>> deepcopy(FrozenData('initial'))
    FrozenData(my_string='initial')

    Is there a strong motivation to support manually defined __slots__ on top of slots=True that warrants fixing it for 3.10 onward?

    4a93283c-c903-4f64-bca5-96978136d9a8 commented 2 years ago

    I didn't realize it was possible to add slots in that way (slots=True). I think for normal classes manually defining __slots is the common way of adding __slots (at least for me), so it feels weird to me that manually defining __slots__ instead of adding slots=True produces different behaviour. That being said, no strong motivation to change it, this fixes my issue, feel free to close this.

    Thanks, Justin

    On Mon, Oct 18, 2021 at 9:25 PM Josh Rosenberg \report@bugs.python.org\ wrote:

    Josh Rosenberg \shadowranger+python@gmail.com\ added the comment:

    When I define this with the new-in-3.10 slots=True argument to dataclass rather than manually defining __slots__ it works just fine. Looks like the pickle format changes rather dramatically to accommodate it.

    >>> @dataclass(frozen=True, slots=True) ... class FrozenData: ... my_string: str ... >>> deepcopy(FrozenData('initial')) FrozenData(my_string='initial')

    Is there a strong motivation to support manually defined __slots__ on top of slots=True that warrants fixing it for 3.10 onward?

    ---------- nosy: +josh.r


    Python tracker \report@bugs.python.org\ \https://bugs.python.org/issue45520\


    99ffcaa5-b43b-4e8e-a35e-9c890007b9cd commented 2 years ago

    You're right that in non-dataclass scenarios, you'd just use __slots__.

    The slots=True thing was necessary for any case where any of the dataclass's attributes have default values (my_int: int = 0), or are defined with fields (my_list: list = field(defaultfactory=list)). The problem is that \_slots is implemented by, after the class definition ends, creating descriptors on the class to access the data stored at known offsets in the underlying PyObject structure. Those descriptors themselves being class attributes means that when the type definition machinery tries to use __slots to create them, it finds conflicting class attributes (the defaults/fields) that already exist and explodes.

    Adding support for slots=True means it does two things:

    1. It completely defines the class without slots, extracts the stuff it needs to make the dataclass separately, then deletes it from the class definition namespace and makes a *new* class with __slots__ defined (so no conflict occurs)
    2. It checks if the dataclass is also frozen, and applies alternate __getstate/setstate__ methods that are compatible with a frozen, slotted dataclass

    2 is what fixes this bug (while #1 makes it possible to use the full range of dataclass features without sacrificing the ability to use __slots). If you need this to work in 3.9, you could borrow the 3.10 implementations that make this work for frozen dataclasses to explicitly define __getstate/setstate for your frozen slotted dataclasses:

    def __getstate__(self):
        return [getattr(self, f.name) for f in fields(self)]
    
    def __setstate__(self, state):
        for field, value in zip(fields(self), state):
            # use setattr because dataclass may be frozen
            object.__setattr__(self, field.name, value)

    I'm not closing this since backporting just the fix for frozen slotted dataclasses (without backporting the full slots=True functionality that's a new feature) is possibly within scope for a bugfix release of 3.9 (it wouldn't change the behavior of working code, and fixes broken code that might reasonably be expected to work).

    4a93283c-c903-4f64-bca5-96978136d9a8 commented 2 years ago

    Thank you for the in-depth explanation. That all makes sense to me, I have run into the __slots__ with defaults issues before, I'll be sure to try out these fixes. I appreciate you taking the time.

    Thanks, Justin

    On Tue, Oct 19, 2021 at 5:28 PM Josh Rosenberg \report@bugs.python.org\ wrote:

    Josh Rosenberg \shadowranger+python@gmail.com\ added the comment:

    You're right that in non-dataclass scenarios, you'd just use __slots__.

    The slots=True thing was necessary for any case where any of the dataclass's attributes have default values (my_int: int = 0), or are defined with fields (my_list: list = field(defaultfactory=list)). The problem is that \_slots is implemented by, after the class definition ends, creating descriptors on the class to access the data stored at known offsets in the underlying PyObject structure. Those descriptors themselves being class attributes means that when the type definition machinery tries to use __slots to create them, it finds conflicting class attributes (the defaults/fields) that already exist and explodes.

    Adding support for slots=True means it does two things:

    1. It completely defines the class without slots, extracts the stuff it needs to make the dataclass separately, then deletes it from the class definition namespace and makes a *new* class with __slots__ defined (so no conflict occurs)
    2. It checks if the dataclass is also frozen, and applies alternate __getstate/setstate__ methods that are compatible with a frozen, slotted dataclass

    2 is what fixes this bug (while #1 makes it possible to use the full

    range of dataclass features without sacrificing the ability to use __slots). If you need this to work in 3.9, you could borrow the 3.10 implementations that make this work for frozen dataclasses to explicitly define __getstate/setstate for your frozen slotted dataclasses:

    def __getstate__(self): return [getattr(self, f.name) for f in fields(self)]

    def __setstate__(self, state): for field, value in zip(fields(self), state):

    use setattr because dataclass may be frozen

        object.\_\_setattr__(self, field.name, value)

    I'm not closing this since backporting just the fix for frozen slotted dataclasses (without backporting the full slots=True functionality that's a new feature) is possibly within scope for a bugfix release of 3.9 (it wouldn't change the behavior of working code, and fixes broken code that might reasonably be expected to work).

    ----------


    Python tracker \report@bugs.python.org\ \https://bugs.python.org/issue45520\


    picnixz commented 1 month ago

    I think this one can be closed since using slots=True solves the issue. @ericvsmith what do you think?

    ericvsmith commented 1 month ago

    It's probably worth adding a test to make sure it now works.

    picnixz commented 1 month ago

    I'll work on that next week or tomorrow (don't really have time nor the energy for this today :laughing:)