konradhalas / dacite

Simple creation of data classes from dictionaries.
MIT License
1.76k stars 106 forks source link

Subsequent calls of from_dict on dataclasses that contain Optional[type] and type | None breaks behaviour because of caching #236

Open TomaszBorczyk opened 1 year ago

TomaszBorczyk commented 1 year ago

Describe the bug Exception is thrown in dacite when using from_dict in certain scenarios on dataclass with <type> | None field, eg str | None

To Reproduce

import dataclasses
import dacite
import typing

@dataclasses.dataclass
class OptionalDataclass:
    text1: typing.Optional[str]

@dataclasses.dataclass
class UnionDataclass:
    text2: str | None

if __name__ == '__main__':
    d1 = {"text1": None}
    # In order to reproduce this bug:
    # 1. this needs to be called first
    dacite.from_dict(OptionalDataclass, d1)

    # 2. this field has to be non None
    d2 = {"text2": "abc"}

    # 3. config with cast needs to be provided
    # (content of cast array does not matter, as it only is necessary to trigger
    #   for cast_type in config.cast
    # in dacite/core.py)
    c = dacite.Config(cast=[int])

    # then below fails
    res = dacite.from_dict(UnionDataclass, d2, config=c)

    # without either of 1-3 from above steps, final from_dict will work as expected

Stack trace:

/usr/local/lib/python3.10/site-packages/dacite/core.py:64: in from_dict
    value = _build_value(type_=field_type, data=field_data, config=config)
/usr/local/lib/python3.10/site-packages/dacite/core.py:101: in _build_value
    if is_subclass(type_, cast_type):
/usr/local/lib/python3.10/site-packages/dacite/types.py:174: in is_subclass
    if is_generic_collection(sub_type):
/usr/local/lib/python3.10/site-packages/dacite/types.py:147: in is_generic_collection
    origin = extract_origin_collection(type_)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

collection = str | None

    def extract_origin_collection(collection: Type) -> Type:
        try:
            return collection.__extra__
        except AttributeError:
>           return collection.__origin__
E           AttributeError: 'types.UnionType' object has no attribute '__origin__'

Root cause is caching of dacite.types.is_generic_collection and dacite.types.is_generic - caching mechanism interprets both Optional[str] and str | None as the same value, so it returns wrong value when we first call it with Optional[str], and then call it with str | None. This means first call of from_dict with OptionalDataclass pollutes the cache and breaks the behaviour for from_dict with UnionDataclass. Note that running only dacite.from_dict(UnionDataclass, d2, config=c) (without preceding call) works as expected.

To prove this, below are even simpler reproduction steps that throw exception:

if __name__ == '__main__':
    is_generic(typing.Optional[str])
    is_generic_collection(str | None)

Below works just fine:

if __name__ == '__main__':
    is_generic_collection(str | None)

thus cache pollutes scope of dacite

Removing @cache from is_generic_collection, is_generic and extract_origin_collection fixes the issue in both cases

Expected behavior Dacite transforms the data without error

Environment

Workaround Downgrade is an option - dacite@1.7.0 works correctly

TomaszBorczyk commented 1 year ago

I've opened the PR to fix the issue.

If I might, I would like to start a conversation about caching - as there seems to have been (and are, and might be more) some issues that appeared because of caching, have you considered reverting those changes completely and dropping caching altogether? Would the performance hit be significant enough that you would rather keep the caching?

Asking as I do not know the historical context (maybe there was a need for performance boost)

@mciszczon @konradhalas