python / cpython

asdict/astuple Dataclass methods #80843

Open e183a343-e457-4baf-97bd-5e6a2e096b0d opened 5 years ago

e183a343-e457-4baf-97bd-5e6a2e096b0d commented 5 years ago
BPO 36662
Nosy @rhettinger, @ericvsmith, @matrixise, @tirkarthi

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = 'https://github.com/ericvsmith'
closed_at = None
created_at =
labels = ['3.8', 'type-feature', 'library']
title = 'asdict/astuple Dataclass methods'
updated_at =
user = 'https://bugs.python.org/gsakkis'
```

bugs.python.org fields:
```python
activity =
actor = 'gsakkis'
assignee = 'eric.smith'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'gsakkis'
dependencies = []
files = []
hgrepos = []
issue_num = 36662
keywords = []
message_count = 4.0
messages = ['340511', '340523', '340532', '340537']
nosy_count = 5.0
nosy_names = ['rhettinger', 'gsakkis', 'eric.smith', 'matrixise', 'xtreak']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue36662'
versions = ['Python 3.8']
```

e183a343-e457-4baf-97bd-5e6a2e096b0d commented 5 years ago

I'd like to propose two new optional boolean parameters to the @dataclass() decorator, asdict and astuple. When one of them is true, the corresponding method is generated on the class, equivalent to the module-level function of the same name.

In addition to saving an extra imported name, the main benefit is performance. By having access to the specific fields of the decorated class, it should be possible to generate a more efficient implementation than the one in the respective function. To illustrate the difference in performance, the asdict method is 28 times faster than the function in the following PEP-557 example:

    @dataclass
    class InventoryItem:
        '''Class for keeping track of an item in inventory.'''
        name: str
        unit_price: float
        quantity_on_hand: int = 0

        def asdict(self): 
            return {
                'name': self.name, 
                'unit_price': self.unit_price, 
                'quantity_on_hand': self.quantity_on_hand,
            }

In [4]: i = InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)

In [5]: asdict(i) == i.asdict()                                                                         
Out[5]: True

In [6]: %timeit asdict(i)                                                                               
5.45 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit i.asdict()                                                                              
193 ns ± 0.443 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
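For illustration, here is a minimal sketch of how such a method could be generated from the class's own fields rather than written by hand. The helper name `make_flat_asdict` is hypothetical and not part of `dataclasses`; the generated method is shallow (no recursion, no copying), so it only matches `dataclasses.asdict` when every field holds a plain value.

```python
import dataclasses


def make_flat_asdict(cls):
    """Build a shallow asdict method with the field names hard-coded.

    Hypothetical helper, not part of the dataclasses module.
    """
    names = [f.name for f in dataclasses.fields(cls)]
    # Produces source such as: return {'name': self.name, ...}
    items = ", ".join(f"{n!r}: self.{n}" for n in names)
    source = f"def asdict(self):\n    return {{{items}}}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["asdict"]


@dataclasses.dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


# Attach the generated method after the dataclass has been created.
InventoryItem.asdict = make_flat_asdict(InventoryItem)

i = InventoryItem(name="widget", unit_price=3.0, quantity_on_hand=10)
flat = i.asdict()
```

Generating source text and calling `exec` is one way to hard-code the names; a closure over the list of names would also work, at the cost of a loop on every call.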

Thoughts?

tirkarthi commented 5 years ago

The asdict method in the benchmark constructs the dictionary directly, whereas dataclasses.asdict does more work: see https://github.com/python/cpython/blob/e8113f51a8bdf33188ee30a1c038a298329e7bfa/Lib/dataclasses.py#L1023 . Hence, in the example, i.asdict() and asdict(i) are not equivalent.

import timeit
from dataclasses import dataclass, asdict

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def asdict(self):
        data = {'name': self.name,
                'unit_price': self.unit_price,
                'quantity_on_hand': self.quantity_on_hand,
        }
        return data

i = InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)
setup = """from dataclasses import dataclass, asdict;
@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def asdict(self):
        data = {'name': self.name,
                'unit_price': self.unit_price,
                'quantity_on_hand': self.quantity_on_hand,
        }
        return data

i = InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)"""

print("asdict(i)")
print(timeit.Timer("asdict(i)", setup=f"{setup}").timeit(number=1_000_000))
print("i.asdict()")
print(timeit.Timer("i.asdict()", setup=f"{setup}").timeit(number=1_000_000))
print("i.inlined_asdict()")
print(timeit.Timer("i.inlined_asdict(i)", setup=f"{setup}; i.inlined_asdict = asdict").timeit(number=1_000_000))

i.inlined_asdict = asdict
assert asdict(i) == i.asdict() == i.inlined_asdict(i)

./python.exe ../backups/bpo36662.py
asdict(i)
11.585838756000001
i.asdict()
0.44129350699999925
i.inlined_asdict()
11.858042807999999

ericvsmith commented 5 years ago

I think the best thing to do is write another decorator that adds this method. I've often thought that having a dataclasses_tools third-party module would be a good idea. It could include my add_slots decorator in https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py

Such a decorator could then deal with all the complications that I don't want to add to @dataclass. For example, choosing a method name. @dataclass doesn't inject any non-dunder names in the class, but the new decorator could, or it could provide a way to customize the member name.

Also, note that your example asdict method doesn't do the same thing as dataclasses.asdict. While you get some speedup by knowing the field names in advance, you also don't do the recursive generation that dataclasses.asdict does. In order to skip the recursive dict generation, you'd either have to test the type of each member (using some heuristic about what doesn't need recursion), or assume the member type matches the type defined in the class. I don't want dataclasses.asdict to make the assumption that the member type matches the declared type. There's nowhere else it does this.
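To make the difference concrete, here is a small sketch with a nested dataclass. The class names `Point` and `Segment` and the method `shallow_asdict` are made up for this example. `dataclasses.asdict` converts the nested instances to dicts and rebuilds the list, while the hand-written method returns the original objects.

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    start: Point
    end: Point
    tags: list = field(default_factory=list)

    def shallow_asdict(self):
        # Hand-written, non-recursive version in the style of the example.
        return {"start": self.start, "end": self.end, "tags": self.tags}


s = Segment(Point(0, 0), Point(1, 2), tags=["a"])

deep = asdict(s)              # nested dataclasses become dicts
shallow = s.shallow_asdict()  # nested dataclasses stay as Point instances

print(deep)
print(shallow)
```

The shallow result also shares the `tags` list with the instance, so mutating one is visible through the other.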

I'm not sure how much of the speedup you're seeing is the result of hard-coding the member names, and how much is avoiding recursion. If all of the improvement is by eliminating recursion, then it's not worth doing.
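One way to separate the two effects is to time a third variant that skips recursion but still looks the fields up at call time. The sketch below assumes the flat InventoryItem from the example; the method names `hardcoded` and `shallow_generic` are invented for the comparison, and absolute numbers will vary by machine and Python version.

```python
import timeit
from dataclasses import dataclass, asdict, fields


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def hardcoded(self):
        # Field names known in advance, no recursion.
        return {
            "name": self.name,
            "unit_price": self.unit_price,
            "quantity_on_hand": self.quantity_on_hand,
        }

    def shallow_generic(self):
        # Field names looked up on every call, still no recursion or copying.
        return {f.name: getattr(self, f.name) for f in fields(self)}


i = InventoryItem(name="widget", unit_price=3.0, quantity_on_hand=10)

candidates = {
    "dataclasses.asdict (recursive)": lambda: asdict(i),
    "shallow, fields() at call time": i.shallow_generic,
    "shallow, hard-coded names": i.hardcoded,
}

timings = {
    label: timeit.timeit(fn, number=20_000) for label, fn in candidates.items()
}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

The gap between the first two rows gives a rough idea of the cost of recursion and copying; the gap between the last two, the cost of not knowing the names in advance.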

I'm not saying the existing dataclasses.asdict can't be sped up: surely it can. But I don't want to remove features or add complexity to do so.

e183a343-e457-4baf-97bd-5e6a2e096b0d commented 5 years ago

> I think the best thing to do is write another decorator that adds this method. I've often thought that having a dataclasses_tools third-party module would be a good idea.

I'd be happy with a separate decorator in the standard library for adding these methods. I'm not so sure about a third-party module: the added value is probably not high enough to justify an extra dependency (assuming one is aware it exists in the first place).

> or assume the member type matches the type defined in the class.

This doesn't seem an unreasonable assumption to me. If I'm using a dataclass, I probably care enough about its member types to bother declaring them and I wouldn't mind if a particular method expects that the members actually match the types. This behaviour would be clearly documented.

Alternatively, if we go with a separate decorator, whether this assumption holds could be a parameter, something like:

    def add_asdict(cls, name='asdict', strict=True): ...
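A rough sketch of how such a decorator might behave, under assumptions of my own: the keyword-only arguments, the `_ATOMIC` tuple and the reading of `strict` (trust the declared types and take the shallow path only when every field is declared as a simple immutable type) are illustrative choices, not an agreed design. In every other case it simply delegates to `dataclasses.asdict`.

```python
import dataclasses

# Declared types assumed safe to return without recursion or copying.
_ATOMIC = (str, int, float, bool, bytes, type(None))


def add_asdict(cls=None, *, name="asdict", strict=True):
    """Hypothetical decorator adding an asdict-like method to a dataclass."""

    def wrap(cls):
        flds = dataclasses.fields(cls)
        names = [f.name for f in flds]
        all_atomic = all(f.type in _ATOMIC for f in flds)

        if strict and all_atomic:
            # Trust the declared types: build a shallow dict.
            def method(self):
                return {n: getattr(self, n) for n in names}
        else:
            # Fall back to the recursive module-level function.
            def method(self):
                return dataclasses.asdict(self)

        method.__name__ = name
        setattr(cls, name, method)
        return cls

    return wrap if cls is None else wrap(cls)


@add_asdict
@dataclasses.dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


@add_asdict(name="to_dict", strict=False)
@dataclasses.dataclass
class Order:
    item: InventoryItem
    count: int


item = InventoryItem("widget", 3.0, 10)
order = Order(item, 2)
```

With `strict=True` a field whose runtime value does not match its annotation is returned as-is, which is exactly the assumption under discussion.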

roysmith commented 1 year ago

Interesting discussion. I got here because I was looking for a clean way to have a Flask view return a dataclass object as JSON. Flask gives you dict-to-JSON conversion for free, but you need to do the dataclass-to-dict conversion yourself. I could certainly have my view return dataclasses.asdict(my_object), but that exposes to the view that my_object is a dataclass. The application would be less tightly coupled if the view didn't need to know that.

Anyway, I see that https://pypi.org/project/dataclasses-json/ exists now, for people who want to go down the third-party decorator road.
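As a sketch of the decoupling described above, the dataclass can expose its own conversion method so the caller never imports `dataclasses`. The names `to_dict` and `inventory_view` are hypothetical, and `json.dumps` stands in here for the dict-to-JSON step that Flask would perform.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def to_dict(self):
        # The only place that knows this object is a dataclass.
        return asdict(self)


def inventory_view(item):
    # Hypothetical view function: relies only on item.to_dict() existing.
    return item.to_dict()


payload = json.dumps(inventory_view(InventoryItem("widget", 3.0, 10)))
print(payload)
```

Any object with a `to_dict` method would work with the same view, dataclass or not.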