konradhalas / dacite

Simple creation of data classes from dictionaries.
MIT License
1.76k stars 106 forks source link

Exception thrown by custom code inside a Union are caught by mistake #218

Open jsangmeister opened 1 year ago

jsangmeister commented 1 year ago

Describe the bug When an exception is raised i.e. in the __post_init__ method of a nested dataclass inside a union, the from_dict method will swallow that error and code execution will resume, although the user code threw an exception and should therefore stop.

To Reproduce The minimum example I could come up with:

@dataclass
class A:
    a: str

    def __post_init__(self):
        raise Exception("test error")

@dataclass
class B:
    union: Union[A, str]

from_dict(
    B,
    {"union": {"a": "a"}},
    Config(check_types=False),
)

Result: no exception is thrown and code execution resumes with the parsed object.

Expected behavior The exception should not be caught by dacite and instead be passed up to the caller.

Environment

Additional context It seems like the problem was introcude with version 1.8.0, in 1.7.0 this works without problems. The line swallowing the error is the catch-all here: https://github.com/konradhalas/dacite/blob/c076b8c3b3de3631f28db01f3ae41e4d69aa6c50/dacite/core.py#L120 but that was not changed in long time...

Also, without check_types=False the error does not happen, but I did not take the time to figure out why.

sunmy2019 commented 1 year ago

This change is caused by the fix of #94.

In previous version, it guess which one should be matched and then construct once. In currect version, it construct several times until no exception is thrown.

__post_init__ is a feature of dataclass, specifically, __post_init__ was added into the __init__ of the decorated class.

If __post_init__ raises something, it just means the __init__ of this decorated dataclass is raising something (you cannot really tell from outside).

Thus, I think It shouldn't be a match, NOT A BUG.

jsangmeister commented 1 year ago

But the classes match. Do you have a suggestion on how it should be done instead? We use the __post_init__ method for automatic validation in our dataclass, so the error is important and must be thrown. Generally, I think a catch-all is not a good idea, especially on potentially important or user-thrown errors. Wouldnvt it be better to correctly define a list of error types which should be caught if an object is wrongly initalized?

sunmy2019 commented 1 year ago

I understand your concern, but I don't think the old "guess and construct" logic is reasonable.

Consider the following case,

@dataclass
class A:
    a: str

    def __post_init__(self):
        raise Exception("test error")

class C:
    def __init__(self, a: str):
        ...     # <-- may raise anything

@dataclass
class B:
    union: Union[A, C]

@dataclass
class D:
    union: Union[C, A]

from_dict(
    B,
    {"union": {"a": "a"}},
    Config(check_types=True),
)

Shouldn't we give the latter class a try, if the first class raises an exception?

Also, C.__init__(**{"a": "a"}) may raise anything here, so if we try matching C, we must except anything C.__init__ may raise, which is a broad expection.

sunmy2019 commented 1 year ago

Technically, we do have a way to hack dataclass with __post_init__. But I don't know whether it should be implemented.

The most elegant way is to check whether this Exception is raised from a dataclass and its __post_init__. By modifying

https://github.com/konradhalas/dacite/blob/c076b8c3b3de3631f28db01f3ae41e4d69aa6c50/dacite/core.py#L118-L121

to

if cache(is_dataclass)(inner_type):  # and inner_type in ...

    class inner_type_escape_post_init(inner_type):
        def __post_init__(self, *args, **kwargs):
            pass

else:
    inner_type_escape_post_init = inner_type

# noinspection PyBroadException
try:
    value = _build_value(type_=inner_type_escape_post_init, data=data, config=config)
except Exception:  # pylint: disable=broad-except
    continue
if inner_type_escape_post_init != inner_type:
    value = _build_value(type_=inner_type, data=data, config=config)
jsangmeister commented 1 year ago

Thank you for the extended answer, I understand the problem now. The hack seems indeed very hacky and I'm not sure about implementing this either, but it would certainly help. Another help would be to attach all thrown errors to the thrown UnionMatchError if check_types is False, something like:

union_match_errors = []
for inner_type in types:
    try:
        # noinspection PyBroadException
        try:
            value = _build_value(type_=inner_type, data=data, config=config)
        except Exception as e:  # pylint: disable=broad-except
            union_match_errors.append(e)
            continue
        pass
# ...
raise UnionMatchError(field_type=union, value=data, exceptions=union_match_errors)

This way, one would still have access to the errors and may be able to decide better what went wrong than currently with the UnionMatchError.

Alternatively, one could also use the traceback module to check if the error was thrown in a __post_init__ method:

import sys, traceback

# ...
except Exception:  # pylint: disable=broad-except
    tb = sys.exc_info()[-1]
    if traceback.extract_tb(tb)[-1].name == "__post_init__":
        raise
    else:
        continue

Maybe this is a bit less hacky... What do you think?

sunmy2019 commented 1 year ago
import sys, traceback

# ...
except Exception:  # pylint: disable=broad-except
    tb = sys.exc_info()[-1]
    if traceback.extract_tb(tb)[-1].name == "__post_init__":
        raise
    else:
        continue

I had thought about this way first, but it cannot handle cases when calling function from __post_init__

def __post_init__(self,...):
    calling_some_function()