konradhalas / dacite

Simple creation of data classes from dictionaries.
MIT License

from_dict resets `dataclasses.field` with argument `init=False` and `default_factory` #244

Open tzah4748 opened 11 months ago

tzah4748 commented 11 months ago

Describe the bug

When a dataclass field is declared as field(init=False, default_factory=...) (list, dict, or any other factory), loading an instance with from_dict resets that field to the default_factory value, discarding any changes made in the class's __post_init__ method.

To Reproduce

from dataclasses import dataclass, field
from dacite import from_dict

@dataclass
class A:
    x: str
    y: int
    z: list = field(init=False, default_factory=list)

    def __post_init__(self):
        self.z = [1,2,3]

data = {
    'x': 'test',
    'y': 1,
}

print(from_dict(data_class=A, data=data))  # Will print: A(x='test', y=1, z=[])
print(A(x='test', y=1))  # Will print: A(x='test', y=1, z=[1, 2, 3])
print(A(**data))  # Will print: A(x='test', y=1, z=[1, 2, 3])
from_dict(data_class=A, data=data) == A(**data)  # False

Expected behavior

from_dict(data_class=A, data=data) == A(**data)  # True

Environment

tzah4748 commented 7 months ago

Update: The problem originates from dacite/dataclasses.py

def create_instance(data_class: Type[T], init_values: Data, post_init_values: Data) -> T:
    instance = data_class(**init_values)
    for key, value in post_init_values.items():
        setattr(instance, key, value)
    return instance
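
The mechanism can be reproduced without dacite. The following is a minimal sketch, not the library's code path: create_instance_standin is a hypothetical copy of the function quoted above, and it assumes that the caller passes the default_factory() result for an init=False field in post_init_values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")
Data = Dict[str, Any]


@dataclass
class A:
    x: str
    y: int
    z: list = field(init=False, default_factory=list)

    def __post_init__(self):
        self.z = [1, 2, 3]


def create_instance_standin(data_class: Type[T], init_values: Data, post_init_values: Data) -> T:
    # Hypothetical stand-in with the same body as the quoted function.
    instance = data_class(**init_values)  # __post_init__ sets z = [1, 2, 3] here
    for key, value in post_init_values.items():
        setattr(instance, key, value)  # this overwrites what __post_init__ set
    return instance


# Assumption: post_init_values carries the default_factory() result for z.
result = create_instance_standin(A, {"x": "test", "y": 1}, {"z": []})
print(result)  # A(x='test', y=1, z=[])
```

Under that assumption, the setattr loop runs after __post_init__ has finished, which is why the value it computed is lost.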

Why would you need post_init_values, and why would you need to set these attributes?

If a field is init=True, its value is assigned at instance creation, either explicitly or from the field's default/default_factory. If a field is init=False, it is expected to:

  1. Have a default/default_factory assigned.
  2. Be assigned to the instance in the __post_init__ method.

In all of these cases, the for loop over post_init_values.items() shouldn't override the value.
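
The claim that plain dataclasses already handle init=False fields can be checked with the standard library alone. This is a small sketch; the class B and the attribute seen_in_post_init are illustrative names that do not appear in the issue.

```python
from dataclasses import dataclass, field, fields


@dataclass
class A:
    x: str
    y: int
    z: list = field(init=False, default_factory=list)

    def __post_init__(self):
        # The generated __init__ has already assigned z = [] at this point.
        self.seen_in_post_init = list(self.z)
        self.z = [1, 2, 3]


@dataclass
class B:
    # No __post_init__: the default_factory value is all the field ever gets.
    x: str
    z: list = field(init=False, default_factory=list)


a = A(x="test", y=1)
b = B(x="test")

# Names of the fields that are excluded from __init__'s parameters.
init_false_names = [f.name for f in fields(A) if not f.init]

print(a.seen_in_post_init, a.z, b.z, init_false_names)
```

The generated __init__ applies the default_factory first and calls __post_init__ last, so both cases in the list above are covered before any external code touches the instance.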

At the very least, if this loop is really needed for some reason (unknown to me), you could easily fix this by adding a call to the instance's __post_init__ method, if defined, after the loop:

def create_instance(data_class: Type[T], init_values: Data, post_init_values: Data) -> T:
    instance = data_class(**init_values)
    for key, value in post_init_values.items():
        setattr(instance, key, value)
    if hasattr(instance, "__post_init__"):
        instance.__post_init__()
    return instance
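
The proposed change can be tried without dacite. This is a hedged sketch: create_instance_patched is a hypothetical copy of the patched function, post_init_runs is an illustrative counter, and the post_init_values argument is an assumption about what the caller passes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Type, TypeVar

T = TypeVar("T")
Data = Dict[str, Any]

post_init_runs: List[str] = []  # records every __post_init__ call


@dataclass
class A:
    x: str
    y: int
    z: list = field(init=False, default_factory=list)

    def __post_init__(self):
        post_init_runs.append("called")
        self.z = [1, 2, 3]


def create_instance_patched(data_class: Type[T], init_values: Data, post_init_values: Data) -> T:
    instance = data_class(**init_values)  # first __post_init__ call
    for key, value in post_init_values.items():
        setattr(instance, key, value)  # resets z to the passed value
    if hasattr(instance, "__post_init__"):
        instance.__post_init__()  # second call restores z
    return instance


# Assumption: post_init_values carries the default_factory() result for z.
patched = create_instance_patched(A, {"x": "test", "y": 1}, {"z": []})
runs_for_patched = len(post_init_runs)  # 2: once in __init__, once explicitly
direct = A(x="test", y=1)
print(patched == direct, runs_for_patched)  # True 2
```

One trade-off worth noting: with this approach __post_init__ runs twice per instance, so a hook with side effects would apply them twice, and a hook that takes InitVar arguments would raise a TypeError on the explicit no-argument call.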