Open anneum opened 4 months ago
Hi @anneum

Yes, we should definitely generate proper class attribute names in snake_case. However, we should not use aliases unless the original schema has them. The reason is that we should keep the same schema: if we change the field name and add an alias, then the two schemas (original and generated from the model) will be different. Did you try to use the case functionality?
Following your example, you could do the following (if the field is generated properly):
```python
from dataclasses_avroschema.pydantic import AvroBaseModel
from dataclasses_avroschema import case


class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str


print(SchemaAvroBenchmark.avro_schema(case_type=case.SPINALCASE))
```

```json
{"type": "record", "name": "SchemaAvroBenchmark", "fields": [{"name": "scope-case", "type": "string"}]}
```
@anneum if you update to the latest version (0.59.1) it will now generate valid python variables.
That's great, thanks for that. I see a little problem when we specify the case. It also overwrites field names that do not match the case.
Original Schema:
```json
{
  "type": "record",
  "name": "SchemaAvroBenchmark",
  "fields": [
    {"name": "scope-case", "type": "string"},
    {"name": "scope_case2", "type": "string"}
  ]
}
```
After `model_generator.render(schema=schema, model_type=ModelType.AVRODANTIC.value)`:
```python
from dataclasses_avroschema.pydantic import AvroBaseModel
from dataclasses_avroschema import case


class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str
    scope_case2: str


print(SchemaAvroBenchmark.avro_schema(case_type=case.SPINALCASE))
```
After the conversion back into an Avro schema:
```json
{
  "type": "record",
  "name": "SchemaAvroBenchmark",
  "fields": [
    {"name": "scope-case", "type": "string"},
    {"name": "scope-case-2", "type": "string"}
  ]
}
```
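The root of the round-trip problem is that the blanket case conversion is applied to every field name, which is not invertible. A minimal stand-in for the SPINALCASE conversion (an assumption for illustration; the library's exact rules may differ) shows how `scope_case2` picks up an extra separator:

```python
import re


def spinalcase(name: str) -> str:
    # Simplified stand-in for the library's SPINALCASE conversion
    # (assumption: a separator is also inserted before digits,
    # matching the output observed above).
    return re.sub(r"(\d+)", r"-\1", name.replace("_", "-")).lower()


print(spinalcase("scope_case"))   # scope-case   (matches the original name)
print(spinalcase("scope_case2"))  # scope-case-2 (differs from scope_case2)
```

Since `scope-case-2` is not the original name, the generated schema no longer matches the source schema.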
Therefore, I suggest that we use the pydantic field with the `serialization_alias` parameter and, instead of converting it to an alias as part of the schema, use it to store the original name. The `avro_schema()` method should then use the `serialization_alias` (if available) as the field name instead of the attribute name when creating the schema.
```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel


class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(..., serialization_alias="scope-case")
    scope_case2: str


print(SchemaAvroBenchmark.avro_schema())
```
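The lookup itself is straightforward in pydantic v2: each entry in `model_fields` is a `FieldInfo` exposing `serialization_alias`. A sketch of the proposed name resolution (a hypothetical helper, not library code):

```python
from pydantic import BaseModel, Field


class SchemaAvroBenchmark(BaseModel):
    scope_case: str = Field(..., serialization_alias="scope-case")
    scope_case2: str


def avro_field_name(model_cls: type[BaseModel], attr: str) -> str:
    """Prefer the serialization_alias (the original Avro name) over
    the python attribute name, as proposed above."""
    info = model_cls.model_fields[attr]
    return info.serialization_alias or attr


print(avro_field_name(SchemaAvroBenchmark, "scope_case"))   # scope-case
print(avro_field_name(SchemaAvroBenchmark, "scope_case2"))  # scope_case2
```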
As an addition, I think the `serialization_alias` parameter is meant for just such cases: https://docs.pydantic.dev/latest/concepts/fields/#field-aliases
Ok, we can add the `alias` to the generated model. We need to take into account that if the field already has `aliases` defined then it will be a bit weird, since `serialization_alias` is a `string`, not a `list`. Example:
```json
{
  "type": "record",
  "name": "SchemaAvroBenchmark",
  "fields": [
    {
      "aliases": ["my-scope-case", "renamed-scope-case"],
      "name": "scope-case",
      "type": "string"
    }
  ]
}
```
Maybe we should use the `metadata` to add the aliases instead of `serialization_alias`? In this case the generated model should be:
```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel


class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(metadata={"aliases": ["scope-case", ...]})
```
Then when generating the schema from the model, we will have the extra alias "scope-case". Does it work for you? If it does, then it is quite easy to implement.
The metadata field sounds like a good option, but I would use it as a string instead of a list, because per field there can only be one alias whose original name is not usable as an attribute name. So if the field name in the schema is not valid (according to Python rules), we add the original field name as an alias in the metadata and change the field name for the variable label in the pydantic model to a valid name.
The reason I would have liked to use the `serialization_alias` is the ability built into pydantic to dump the object with `.model_dump(by_alias=True)`. This allows me to bring objects directly into the shape needed for the schema. This kills two birds with one stone.
```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel


class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(..., serialization_alias="scope-case")
    scope_case2: str


SchemaAvroBenchmark(scope_case="foo", scope_case2="bar").model_dump(by_alias=True)
# {'scope-case': 'foo', 'scope_case2': 'bar'}
```
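The `by_alias` behavior can be verified with plain pydantic, independent of the avro layer:

```python
from pydantic import BaseModel, Field


class SchemaAvroBenchmark(BaseModel):
    scope_case: str = Field(..., serialization_alias="scope-case")
    scope_case2: str


dumped = SchemaAvroBenchmark(scope_case="foo", scope_case2="bar").model_dump(by_alias=True)
print(dumped)  # {'scope-case': 'foo', 'scope_case2': 'bar'}
```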
Question: are you using `model_dump(by_alias=True)` to generate the payload, then encoding it and sending it to kafka? I am asking because the way to do it is using `serialize`.
```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel


class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(serialization_alias="scope-case")


# serialize to avro-json to send to kafka just to see the fields
# (the same will happen with avro-binary)
benchmark = SchemaAvroBenchmark.fake()
print(benchmark, "\n\n")
# >>> scope_case='FbeDbPMeawuTwxUbhSaY'

# This can be sent to kafka (it is bytes)
ser = benchmark.serialize(serialization_type="avro-json")
# It will produce an event with the field `scope_case` and not with the alias.
print(ser)
# >>> b'{"scope_case": "FbeDbPMeawuTwxUbhSaY"}'

# This is with the alias, but it is not bytes. Do not send to kafka.
print(benchmark.model_dump(by_alias=True))
# >>> {'scope-case': 'FbeDbPMeawuTwxUbhSaY'}
```
I will work on a PR to add the `alias`, working as: the original field name is stored in `serialization_alias`; if the schema field also defines `aliases`, then the python class must contain both `serialization_alias` and `metadata={"aliases": [...]}`.
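What the generated model might then look like, sketched with plain pydantic (using `json_schema_extra` as one possible place to stash the extra aliases; the actual PR may store them differently):

```python
from pydantic import BaseModel, Field


class SchemaAvroBenchmark(BaseModel):
    # Assumption: the original Avro name becomes serialization_alias and
    # the schema's extra aliases are stored as field-level metadata.
    scope_case: str = Field(
        serialization_alias="scope-case",
        json_schema_extra={"aliases": ["my-scope-case", "renamed-scope-case"]},
    )


info = SchemaAvroBenchmark.model_fields["scope_case"]
print(info.serialization_alias)           # scope-case
print(info.json_schema_extra["aliases"])  # ['my-scope-case', 'renamed-scope-case']
```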
**Is your feature request related to a problem? Please describe.**
When I convert an Avro schema whose fields do not match the rules for Python variables to a Pydantic (or Avrodantic) model, I get a non-valid model.
For example, if I convert the following schema, I get an invalid model:
Invalid value due to `-` in the variable name. In my opinion, the `serialization_alias` from `Field` should be used for this. When converting the valid model back to an Avro schema, the `serialization_alias` should be used as the field name.

To summarize: converting from a schema to a model and then back to a schema should result in the same schema.
I am already working on the implementation and getting the right result for such a simple schema (except for importing fields), but would appreciate support. In particular, I am having a lot of trouble with union cases.
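For the renaming step itself, a minimal sketch of turning an Avro field name into a valid Python identifier (a hypothetical helper, not the library's actual implementation):

```python
import keyword


def to_python_name(avro_name: str) -> tuple[str, bool]:
    """Return a valid python attribute name plus a flag saying whether it
    was changed (in which case the original name must be kept as an alias)."""
    candidate = avro_name.replace("-", "_")
    if not candidate.isidentifier():
        # fall back: replace anything else that is invalid and prefix,
        # which also covers names starting with a digit
        candidate = "field_" + "".join(c if c.isalnum() or c == "_" else "_" for c in candidate)
    if keyword.iskeyword(candidate):
        candidate += "_"
    return candidate, candidate != avro_name


print(to_python_name("scope-case"))   # ('scope_case', True)  -> needs an alias
print(to_python_name("scope_case2"))  # ('scope_case2', False) -> keep as is
```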