jpmorganchase / py-avro-schema

Generate Apache Avro schemas for Python types including standard library data-classes and Pydantic data models.
https://py-avro-schema.readthedocs.io/
Apache License 2.0

list class not allowed in pydantic nested schema default #68

Closed dada-engineer closed 8 months ago

dada-engineer commented 8 months ago

I have a pydantic.BaseModel class with a nested child pydantic.BaseModel that has a list attribute.

Schema generation currently fails for this model:

import py_avro_schema as pas

import pydantic

class Bar(pydantic.BaseModel):
    baz: list[str] = pydantic.Field(default_factory=list)

class Foo(pydantic.BaseModel):
    bar: Bar = pydantic.Field(default_factory=Bar)

print(pas.generate(Foo))

Traceback:

Traceback (most recent call last):
  File "/Users/dada_engineer/workspace/private/py-avro-schema/example.py", line 14, in <module>
    print(pas.generate(Foo))
  File "/Users/dada_engineer/workspace/private/py-avro-schema/.venv/lib/python3.9/site-packages/memoization/caching/plain_cache.py", line 42, in wrapper
    result = user_function(*args, **kwargs)
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/__init__.py", line 64, in generate
    schema_dict = schema(py_type, namespace=namespace, options=options)
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 139, in schema
    schema_data = schema_obj.data(names=names)
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 695, in data
    return self.data_before_deduplication(names=names)
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 766, in data_before_deduplication
    "fields": [field.data(names=names) for field in self.record_fields],
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 766, in <listcomp>
    "fields": [field.data(names=names) for field in self.record_fields],
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 826, in data
    field_data["default"] = self.schema.make_default(self.default)
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 908, in make_default
    return {key: _schema_obj(value.__class__).make_default(value) for key, value in py_default}
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 908, in <dictcomp>
    return {key: _schema_obj(value.__class__).make_default(value) for key, value in py_default}
  File "/Users/dada_engineer/workspace/private/py-avro-schema/src/py_avro_schema/_schemas.py", line 162, in _schema_obj
    raise TypeNotSupportedError(f"Cannot generate Avro schema for Python type {py_type}")
py_avro_schema._schemas.TypeNotSupportedError: Cannot generate Avro schema for Python type <class 'list'>

Relates to the handling of pydantic defaults in #64.

dada-engineer commented 8 months ago

@faph this is a problem when a default contains an empty list. Even if we could detect the type correctly, we could not instantiate the sequence schema from the value, because its constructor requires at least one element in the list.

For now, I'd suggest calling _schema_obj(value.__class__) only if the value's type is supported by some schema.

EDIT: Otherwise we'd use the value as is.
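To illustrate why the runtime class of the default is not enough, here is a minimal sketch. It uses a standard library dataclass as a stand-in for the pydantic model (an assumption made to keep the example dependency-free) and does not touch py-avro-schema internals.

```python
import dataclasses
from typing import get_args, get_origin, get_type_hints


@dataclasses.dataclass
class Bar:
    # Same shape as the pydantic model in this issue: an empty list by default.
    baz: list[str] = dataclasses.field(default_factory=list)


default = Bar()

# The runtime class of the default value is plain `list`; the element type is gone,
# and an empty list has no first element to inspect either.
runtime_type = default.baz.__class__

# The field annotation still records that this is a list of strings.
annotated_type = get_type_hints(Bar)["baz"]
origin = get_origin(annotated_type)
item_types = get_args(annotated_type)

print(runtime_type, origin, item_types)
```

The same information is lost for any parametrized container type once only the value's class is consulted.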

dada-engineer commented 8 months ago

Sorry, correction: we should just use the pydantic schema annotations instead of value.__class__. Then it works.
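A rough sketch of that idea, converting a default by walking the annotations rather than the value's class. The helper name make_default_from_annotation is hypothetical, and dataclasses stand in for pydantic models here; this is not the actual change made in py-avro-schema.

```python
import dataclasses
from typing import Any, get_args, get_origin, get_type_hints


def make_default_from_annotation(annotation: Any, value: Any) -> Any:
    """Hypothetical helper: build a JSON-style default guided by the annotation."""
    if dataclasses.is_dataclass(annotation):
        # Record-like type: recurse into each field using the field's annotation.
        hints = get_type_hints(annotation)
        return {
            field.name: make_default_from_annotation(hints[field.name], getattr(value, field.name))
            for field in dataclasses.fields(annotation)
        }
    if get_origin(annotation) is list:
        # Sequence type: the item type comes from the annotation, so an empty list is fine.
        (item_type,) = get_args(annotation)
        return [make_default_from_annotation(item_type, item) for item in value]
    # Anything else is passed through unchanged in this sketch.
    return value


@dataclasses.dataclass
class Bar:
    baz: list[str] = dataclasses.field(default_factory=list)


@dataclasses.dataclass
class Foo:
    bar: Bar = dataclasses.field(default_factory=Bar)


result = make_default_from_annotation(Foo, Foo())
print(result)
```

Because the recursion is driven by the declared types, the empty list never needs to be inspected to find its element type.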

dada-engineer commented 8 months ago

@faph it would be great to have this fix quite soon 😊 Sorry for the hurry / pushy behaviour 🙈

faph commented 8 months ago

That's ok @dada-engineer. Left a few minor comments