marcosschroh / dataclasses-avroschema

Generate avro schemas from python classes. Code generation from avro schemas. Serialize/Deserialize python instances with avro schemas
https://marcosschroh.github.io/dataclasses-avroschema/
MIT License

Incorrect Serialization and Deserialization of Union Types #584

Open othmane099 opened 5 months ago

othmane099 commented 5 months ago

I encountered an issue while using the dataclasses-avroschema package for Avro serialization in Python. When I serialize and then deserialize a dataclass that has a union-typed field, the deserialized object does not have the expected type.

from dataclasses_avroschema import AvroModel
from dataclasses import dataclass
import typing

@dataclass
class MessageTypeTwo(AvroModel):
    val: typing.Union[None, str]
    class Meta:
        namespace = "Messages.type.two"

@dataclass
class MessageTypeOne(AvroModel):
    class Meta:
        namespace = "Messages.type.one"

@dataclass
class CoreMessage(AvroModel):
    messageBody: typing.Union[
        MessageTypeOne,
        MessageTypeTwo,
    ]

Serialize and deserialize an instance of CoreMessage with an instance of MessageTypeTwo:

mt2 = MessageTypeTwo(val="val")
core_message = CoreMessage(messageBody=mt2)
serialized = core_message.serialize()
deserialized = CoreMessage.deserialize(serialized)
print(deserialized.messageBody)

Expected Result: The print statement should output MessageTypeTwo(val='val').

Actual Result: The print statement outputs MessageTypeOne().

marcosschroh commented 5 months ago

@othmane099 This is the expected behavior. You have to set a different dacite config.
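
For readers wondering why MessageTypeOne() comes back, here is a minimal standard-library sketch of first-match union resolution. It is a simplified model, not dacite's or dataclasses-avroschema's actual code: build_lenient and first_match are hypothetical helpers, and the `= None` default on val is an assumption standing in for a missing optional field being filled with None.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MessageTypeOne:
    pass


@dataclass
class MessageTypeTwo:
    val: Optional[str] = None  # assumption: missing optional value becomes None


def build_lenient(cls, data):
    """Hypothetical helper: build cls from a dict, silently dropping unknown keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


def first_match(candidates, data):
    """Return an instance of the first candidate that can be built from data."""
    for cls in candidates:
        try:
            return build_lenient(cls, data)
        except TypeError:
            continue
    raise ValueError("no union member matched")


# The decoded record carries field values only, not the class name.
payload = {"val": "val"}
result = first_match([MessageTypeOne, MessageTypeTwo], payload)
print(result)  # MessageTypeOne()
```

Because MessageTypeOne declares no fields and unknown keys are ignored, it accepts any record, and it is listed first in the union. In this model, swapping the order of the union members changes the result.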

othmane099 commented 5 months ago

By adding the following dacite_config settings to CoreMessage:

@dataclass
class CoreMessage(AvroModel):
    messageBody: typing.Union[
        MessageTypeOne,
        MessageTypeTwo,
    ]

    class Meta:
        dacite_config = {
            "strict_unions_match": True,
            "strict": True,
        }

The previous example (MessageTypeTwo) works as expected. However, in the case of MessageTypeOne:

mt1 = MessageTypeOne()
core_message = CoreMessage(messageBody=mt1)
serialized = core_message.serialize()
deserialized = CoreMessage.deserialize(serialized)
print(deserialized.messageBody)

an exception is raised: dacite.exceptions.StrictUnionMatchError: can not choose between possible Union matches for field "messageBody": MessageTypeOne, MessageTypeTwo

Could the issue be attributed to the absence of fields within the MessageTypeOne class?

marcosschroh commented 5 months ago

Not really. In your case, the following config should work:

class Meta:
    dacite_config = {
        "strict": True,
    }

In any case, it is sometimes not possible to cover all cases, so you will have to experiment with different dacite configs.
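
The effect of the two settings discussed so far can be modelled with a small standard-library sketch. It is an approximation under stated assumptions, not dacite's implementation: build_strict and all_matches are hypothetical helpers, "strict" is modelled as rejecting keys the class does not declare, and the `= None` default stands in for a missing optional field being filled with None.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MessageTypeOne:
    pass


@dataclass
class MessageTypeTwo:
    val: Optional[str] = None  # assumption: missing optional value becomes None


def build_strict(cls, data):
    """Hypothetical helper: build cls from a dict, rejecting undeclared keys."""
    known = {f.name for f in fields(cls)}
    extra = set(data) - known
    if extra:
        raise ValueError(f"unexpected keys for {cls.__name__}: {sorted(extra)}")
    return cls(**data)


def all_matches(candidates, data):
    """Collect an instance of every candidate that accepts data in strict mode."""
    matches = []
    for cls in candidates:
        try:
            matches.append(build_strict(cls, data))
        except (TypeError, ValueError):
            continue
    return matches


union = [MessageTypeOne, MessageTypeTwo]
with_val = all_matches(union, {"val": "val"})
empty = all_matches(union, {})
print(with_val)  # [MessageTypeTwo(val='val')]
print(empty)     # [MessageTypeOne(), MessageTypeTwo(val=None)]
```

With a "val" key present, only MessageTypeTwo accepts the record, so the earlier example works. An empty record is accepted by both classes: taking the first match yields MessageTypeOne(), while requiring a unique match cannot choose, which is consistent with the StrictUnionMatchError reported above.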

othmane099 commented 5 months ago

Thanks for your answer. I encountered another case, where the types have the same attribute name. For example:

from dataclasses_avroschema import AvroModel
from dataclasses import dataclass
import typing

@dataclass
class MessageTypeTwo(AvroModel):
    val: str

    class Meta:
        namespace = "Messages.type.two"

@dataclass
class MessageTypeOne(AvroModel):
    val: str

    class Meta:
        namespace = "Messages.type.one"

@dataclass
class CoreMessage(AvroModel):
    messageBody: typing.Union[
        MessageTypeOne,
        MessageTypeTwo,
    ]

    class Meta:
        dacite_config = {
            "strict": True,
        }

mt2 = MessageTypeTwo("Hello World")
core_message = CoreMessage(messageBody=mt2)
serialized = core_message.serialize()
deserialized = CoreMessage.deserialize(serialized)
print(deserialized.messageBody)

Expected: MessageTypeTwo(val='Hello World')

Actual: MessageTypeOne(val='Hello World')

marcosschroh commented 5 months ago

Yes, that makes sense. Under the hood you get the JSON {"messageBody": {"val": "Hello World"}}, and nothing in it indicates which class should be created. You must define a different strategy or include extra data to distinguish between the objects.
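
One way to "include extra data" is a discriminator stored next to the field values. The sketch below uses only the standard library and plain dicts to show the idea. It is not part of the dataclasses-avroschema API: the "kind" key, REGISTRY, to_payload and from_payload are hypothetical names chosen for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class MessageTypeOne:
    val: str


@dataclass
class MessageTypeTwo:
    val: str


# Hypothetical registry mapping a discriminator value to the class it denotes.
REGISTRY = {cls.__name__: cls for cls in (MessageTypeOne, MessageTypeTwo)}


def to_payload(message):
    """Store the class name alongside the field values."""
    return {"kind": type(message).__name__, **asdict(message)}


def from_payload(payload):
    """Pick the class from the discriminator, then build it from the rest."""
    data = dict(payload)  # copy so the caller's dict is left untouched
    cls = REGISTRY[data.pop("kind")]
    return cls(**data)


payload = to_payload(MessageTypeTwo("Hello World"))
restored = from_payload(payload)
print(payload)   # {'kind': 'MessageTypeTwo', 'val': 'Hello World'}
print(restored)  # MessageTypeTwo(val='Hello World')
```

The two classes still share the same field layout, but the record now states which one it is, so decoding no longer depends on the order of the union members.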