marcosschroh / dataclasses-avroschema

Generate avro schemas from python dataclasses, Pydantic models and Faust Records. Code generation from avro schemas. Serialize/Deserialize python instances with avro schemas.
https://marcosschroh.github.io/dataclasses-avroschema/
MIT License

Field and type with same name not parsed correctly by Pydantic #769

Closed Masqueey closed 1 month ago

Masqueey commented 1 month ago

Describe the bug

A schema with a field name that is identical to a class name fails to be parsed by Pydantic (and also produces the wrong fake() data) when the class is used inside a typing construct (e.g. as typing.List[ExampleType]).

To Reproduce

  1. Have a datamodel that uses the AvroBaseModel from dataclasses_avroschema.pydantic.
  2. Have a field in the model that has the same name as a class, where the field's type wraps that class in a typing construct (e.g. typing.Optional or typing.List).

As an example:

class MessageHeader(AvroBaseModel):
    version: str
    MessageType: str

class MessageHeader(AvroBaseModel):
    MessageHeader: typing.Optional[typing.List[MessageHeader]]
    MessageBody: str

  3. Call .fake() on the model, for instance. It will generate either None or [None].

To Bypass

  1. Rename the class, for instance by prefixing it with an underscore (_).
  2. Add metadata to the class to keep its old name when a schema is generated using this model.

To bypass the error in the above example:

class _MessageHeader(AvroBaseModel):
    version: str
    MessageType: str

    class Meta:
        schema_name = "MessageHeader"

class MessageHeader(AvroBaseModel):
    MessageHeader: typing.Optional[typing.List[_MessageHeader]]
    MessageBody: str

Expected behavior

The model should be usable with typing constructs such as Optional or List in the same way as without them, without having to rename the class or add metadata to the model.

marcosschroh commented 1 month ago

Both classes have the same name. Should it not be something like:

class MessageHeader(AvroBaseModel):
    version: str
    MessageType: str

class Message(AvroBaseModel):
    MessageHeader: typing.Optional[typing.List[MessageHeader]]
    MessageBody: str

marcosschroh commented 1 month ago

When an avro schema has names that clash, for example a field name that is the same as a record name, the generated model can have problems like the one described in this issue.

There are 3 ways to solve this problem:

  1. Rename the class MessageHeader, add a class Meta and change the typing where the type is used. From an algorithmic point of view this is complicated, as we need to change the class name, change the typing and add the class Meta; it is also more code to maintain.
  2. Change the field name MessageHeader and add an alias, for example: message_header: typing.Optional[typing.List[MessageHeader]] = field(metadata={"aliases": ["MessageHeader"]}). This is simpler, but it will not reflect the original schema.
  3. Just before the name clash, add another definition so that the type is no longer resolved in the class scope, for example: _MessageHeader = MessageHeader. This solution is simpler, implies less code and seems more pythonic. It also seems to be the preferred one among pydantic users.

I will proceed with solution number 3. The output model will be:

from dataclasses_avroschema.pydantic import AvroBaseModel
import pydantic
import typing

class MessageHeader(AvroBaseModel):
    version: str
    MessageType: str

_MessageHeader = MessageHeader

class Message(AvroBaseModel):
    MessageBody: str
    MessageHeader: typing.Optional[typing.List[_MessageHeader]] = None

print(Message.fake(), "\n")

>>> MessageBody='rrzzTDuvXOovfphUmjrY' MessageHeader=[MessageHeader(version='lMVdWZICUIkEFNbyUrck', MessageType='ciaNGFZdHPDwVEEdCpdN')]