Closed dada-engineer closed 8 months ago
Found a solution without importing pydantic at runtime, as long as you are fine with `# type: ignore`. Which I figured is okay, as there have already been some.
An alternative would be to allow setting `orjson.dumps`'s `default`. This will make the JSON serialisation less performant though, according to orjson's docs (no problem for our use case, but it might be for others).
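For context, orjson's `default` parameter accepts a callable that is invoked for any value orjson cannot serialize natively, which is where the documented overhead comes from. A minimal sketch of the idea using the stdlib `json` module, whose `default` keyword works the same way (`Inner` and `to_json_safe` below are stand-ins, not py-avro-schema or pydantic code):

```python
import json


class Inner:
    """Stand-in for a pydantic model (real code would use pydantic.BaseModel)."""

    def __init__(self, name: str = "default"):
        self.name = name

    def model_dump(self) -> dict:
        return {"name": self.name}


def to_json_safe(obj):
    # Fallback serializer: orjson.dumps(..., default=to_json_safe) takes the
    # same shape of callable, invoked only for otherwise-unserializable values
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    raise TypeError(f"Cannot serialize {type(obj)!r}")


schema = {"name": "field", "default": Inner()}
print(json.dumps(schema, default=to_json_safe))
# → {"name": "field", "default": {"name": "default"}}
```

Because the callable only runs for unsupported types, plain fields pay no cost; the overhead is per pydantic-model default encountered.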
Personally I'd prefer to handle this within the `PydanticSchema` class, but I'd like to have your opinion on this.
Downside: this cannot handle a dataclass that has an attribute that is a pydantic class with a default. Not sure how common this use case is, or whether I should add handling for this to `DataclassSchema` as well. 😅
ack
@faph any news on this?
Hi @dada-engineer
I am just trying to get my head around this and what the expected behavior should be. Reading the spec here: https://avro.apache.org/docs/1.11.1/specification/#schema-record makes me think that we should be following the Avro spec exactly on how field defaults should be serialized. There is only 1 correct serialization for a given type, so that would suggest we should NOT make the serialization configurable.
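For illustration, the spec prescribes that the default for a record-typed field is a JSON object mapping field names to their (recursively encoded) defaults. A hand-written sketch of such a schema, built as a Python dict (not output from the library):

```python
# Avro-spec-compliant record field whose type is itself a record: the field's
# default is a JSON object keyed by the inner record's field names
schema = {
    "type": "record",
    "name": "Outer",
    "fields": [
        {
            "name": "inner",
            "type": {
                "type": "record",
                "name": "Inner",
                "fields": [{"name": "name", "type": "string", "default": "hello"}],
            },
            # Per the spec, there is exactly one correct encoding of this default
            "default": {"name": "hello"},
        }
    ],
}
```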
Instead, we might need to consider using the `make_default` methods consistently and recursively when generating defaults for a record type?
I have to say, I have not thought about using an entire Pydantic object as a default. I guess the Avro spec supports it, so py-avro-schema should support it too.
So the problem here is that `PydanticSchema` itself does not implement `make_default`. It probably should, and should do this recursively by using each field's schema's `make_default`.
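The recursive idea could be sketched like this (hypothetical class names loosely modelled on the library's schema classes, not its actual API):

```python
class StringSchema:
    def make_default(self, py_default):
        # Leaf schema: the Python value is already an Avro-compliant default
        return str(py_default)


class RecordSchema:
    def __init__(self, field_schemas):
        # Mapping of field name -> schema object for that field
        self.field_schemas = field_schemas

    def make_default(self, py_default):
        # Recurse: delegate each attribute to that field's own schema,
        # so nested records are converted all the way down
        return {
            name: schema.make_default(getattr(py_default, name))
            for name, schema in self.field_schemas.items()
        }


class Inner:
    def __init__(self, name="hi"):
        self.name = name


inner_schema = RecordSchema({"name": StringSchema()})
print(inner_schema.make_default(Inner()))
# → {'name': 'hi'}
```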
Thanks a lot for your feedback. This makes sense yes.
I am not sure if I understand how you want this to be implemented though.
Let me push a concept PR that would demonstrate this.
I think I have some idea. I'll close the merge request regarding the orjson `default`, and update the other one. Maybe you can tell me if there are other more optimal fields to use 👍🏻
It's not unusual for complex record fields like in your case to be "nullable" instead of defaulting to a complex/record object. It's up to you whether such a default should be present in the Avro schema itself.
For the Python class, it's not unusual for the class/function signature to default to `None`, but then in the body of `__init__()` such a field is set with a given object. Certainly, for plain Python classes, where mutable defaults are problematic, you would use this approach.
You want to tell me we should rethink our defaults? 🤔
No, no, if that's the right design for you that's fine. Avro and Pydantic support it. Just saying that plain Python classes don't.
So, this PR: #67 actually passes your test, but it would not be the correct solution. Although it's arguably better than the current implementation, so we can think about whether to merge it like this in the meantime...
Updated the merge request. Feel free to have a look.
That looks good, seems simpler than I expected, which is good!
Released as 3.5.0.
Many thanks!
For anyone interested, I managed to backport this to v2 (which I'm stuck on) with this diff on `src/py_avro_schema/_schemas.py`:
```diff
@@ -522,6 +522,12 @@
             "items": self.items_schema.data(names=names),
         }

+    def make_default(self, py_default: collections.abc.Sequence) -> JSONArray:
+        """Return an Avro schema compliant default value for a given Python sequence
+        :param py_default: The Python sequence to generate a default value for.
+        """
+        return [self.items_schema.make_default(item) for item in py_default]
+

 class DictSchema(Schema):
     """An Avro map schema for a given Python mapping"""

@@ -841,6 +847,22 @@
         )
         return field_obj

+    def make_default(self, py_default: pydantic.BaseModel) -> JSONObj:
+        """Return an Avro schema compliant default value for a given Python value"""
+        return {key: _schema_obj(self._annotation(key)).make_default(value) for key, value in py_default}
+
+    def _annotation(self, field_name: str) -> Type:
+        """
+        Fetch the raw annotation for a given field name
+
+        Pydantic "unpacks" annotated and forward ref types in their FieldInfo API. We need access to the
+        full, raw annotated type hints instead.
+        """
+        for class_ in self.py_type.mro():
+            if class_.__annotations__.get(field_name):
+                return class_.__annotations__[field_name]
+        raise ValueError(f"{field_name} is not a field of {self.py_type}")  # Should never happen
+
     @staticmethod
     def _py_type(py_field: pydantic.fields.ModelField) -> Any:
         """Return a Python type annotation for a given Pydantic field"""
```
Hi everyone,
I have a pydantic model that has an attribute of another pydantic model type that should be generated by default. This fails, as the schema dict produced by py_avro_schema sets the pydantic class object as the default, which is not JSON-serializable via orjson by default.
example.py
This raises the following error:
Suggestion

On a `PydanticSchema`, for each record field, check if the `py_type` is a pydantic `BaseModel` subclass and, if so, call `default.model_dump(mode="json")` on it. This would then produce this schema:
which looks alright.
The downside is that pydantic would be needed at runtime, so it must either be caught as an `ImportError` or imported at class/method level.
Let me know what you think about this please. Thanks a lot.