marcosschroh / dataclasses-avroschema

Generate avro schemas from python dataclasses, Pydantic models and Faust Records. Code generation from avro schemas. Serialize/Deserialize python instances with avro schemas.
https://marcosschroh.github.io/dataclasses-avroschema/
MIT License
219 stars, 67 forks

Serialization bug when using schema inheritance #800

Open cristianmatache opened 2 days ago

cristianmatache commented 2 days ago

Describe the bug

https://github.com/marcosschroh/dataclasses-avroschema/blob/ed935bb431ab4e900b5d0f10be672f0a73996e39/dataclasses_avroschema/utils.py#L126-L127

The linked code misuses __annotations__. It accesses __annotations__ on the instance, which behaves the same as accessing it on the class (type(instance)). Either way, __annotations__ contains only the fields declared on the current class, not those inherited from its parent classes.

To Reproduce

from dataclasses import dataclass

from dataclasses_avroschema import AvroModel

@dataclass
class A(AvroModel):
    a: int

@dataclass
class Parent(AvroModel):
    p: A

@dataclass
class Child(Parent):
    c: int

child = Child(p=A(1), c=1)
child.serialize()  # Errors

Further notes on __annotations__:

from typing import get_type_hints

assert Child.__annotations__ == child.__annotations__ == {'c': int}
assert set(get_type_hints(child)) == {'c'}  # On the instance: own fields only
assert {'p', 'c'}.issubset(set(get_type_hints(Child)))  # On the class: inherited fields included
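The class-level part of this can be checked without the library at all. Below is a minimal sketch using plain dataclasses (no AvroModel base, which is a simplification on my side): Child.__annotations__ lists only what Child declares, while get_type_hints(Child) and dataclasses.fields(Child) both include the inherited field.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class A:
    a: int


@dataclass
class Parent:
    p: A


@dataclass
class Child(Parent):
    c: int


# __annotations__ on the class holds only what Child itself declares.
own = Child.__annotations__

# get_type_hints on the class merges annotations along the MRO,
# so the inherited field "p" shows up as well.
merged = get_type_hints(Child)

# dataclasses.fields also reports inherited fields, base class first.
field_names = [f.name for f in fields(Child)]

print(own)
print(merged)
print(field_names)
```

Either get_type_hints(type(instance)) or dataclasses.fields(instance) would therefore see p, whereas a lookup in __annotations__ would not.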

Expected behavior

Serialization should succeed.

Suggestions

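Resolve the type hints on the class rather than on the instance, and cache the lookup so it stays cheap:

from functools import lru_cache

@lru_cache
def f(model_t, field_name):
    return is_union(get_type_hints(model_t)[field_name])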

%timeit f(type(model), field_name)
160 ns ± 3.13 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
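For reference, here is a self-contained sketch of the cached lookup. The is_union below is a hypothetical stand-in I wrote for illustration, not the library's helper, and it only recognizes typing.Union / Optional forms. The name field_is_union and the example classes are likewise assumptions.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional, Union, get_origin, get_type_hints


def is_union(tp) -> bool:
    # Hypothetical stand-in for the library's helper.
    # Only handles typing.Union / Optional[...] annotations.
    return get_origin(tp) is Union


@lru_cache(maxsize=None)
def field_is_union(model_t: type, field_name: str) -> bool:
    # Resolve hints on the class so inherited fields are included,
    # then cache the answer per (class, field name) pair.
    return is_union(get_type_hints(model_t)[field_name])


@dataclass
class Parent:
    p: Optional[int]


@dataclass
class Child(Parent):
    c: int


child = Child(p=None, c=1)

# "p" is declared on Parent but is still found via the class hints.
p_is_union = field_is_union(type(child), "p")
c_is_union = field_is_union(type(child), "c")

# A repeated call with the same arguments is served from the cache.
field_is_union(type(child), "p")
hits = field_is_union.cache_info().hits
```

Keying the cache on the class rather than the instance means the cost of get_type_hints is paid once per (class, field) pair, which is probably what makes the timing above so low.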

marcosschroh commented 7 hours ago

Thanks for reporting it. I am a bit busy, but I will take a look as soon as I have time.