jpmorganchase / py-avro-schema

Generate Apache Avro schemas for Python types including standard library data-classes and Pydantic data models.
https://py-avro-schema.readthedocs.io/
Apache License 2.0

Should we support un-annotated decimal types? #63

Closed faph closed 1 year ago

faph commented 1 year ago

Currently, we require decimal.Decimal types to be annotated with a py_avro_schema.DecimalMeta object defining "precision" and optionally "scale".

Should we support plain decimal.Decimal types and default the precision parameter to be something sensible?

If so, is there any precedent for a default precision value?

Since we are generating Avro bytes schemas for decimals rather than fixed schemas, does it actually matter whether we default precision to something huge? The size of the serialized number would simply depend on the digits actually used, not on the schema's maximum precision.
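To make the size argument concrete, here is a minimal sketch of how a decimal maps to a bytes payload, assuming the Avro specification's encoding of the unscaled integer as big-endian two's-complement. The helper name `encode_avro_decimal_bytes` is hypothetical and is not part of py-avro-schema or any Avro library; it also reserves one sign byte conservatively, so it is not guaranteed to be the shortest possible encoding.

```python
from decimal import Decimal


def encode_avro_decimal_bytes(value: Decimal, scale: int) -> bytes:
    """Hypothetical helper: encode a finite Decimal as the big-endian
    two's-complement bytes of its unscaled integer value."""
    sign, digits, exponent = value.as_tuple()
    shift = exponent + scale
    if shift < 0:
        # The value has more fractional digits than the schema's scale allows.
        raise ValueError(f"{value} does not fit scale {scale}")
    unscaled = int("".join(map(str, digits))) * 10**shift
    if sign:
        unscaled = -unscaled
    # Enough bytes to hold the signed integer (may be one more than strictly needed).
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


small = encode_avro_decimal_bytes(Decimal("1.23"), scale=2)
large = encode_avro_decimal_bytes(Decimal("12345678901234567890.12"), scale=2)

# The payload length follows the digits in the value, not the declared precision.
print(len(small), len(large))
```

Nothing in the helper takes a precision argument, which is the point: with a bytes schema, precision only acts as an upper bound that readers and writers agree on.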

The reason for the above question is to align this with how we treat, say, dates and times. There we default to the maximum precision Avro supports: nanoseconds.

faph commented 1 year ago

@dada-engineer Wouldn't mind your opinion on this.

dada-engineer commented 1 year ago

What about setting decimal.getcontext().prec as the default precision? It's the only sensible thing I can think of, although it might still be too weak.

Edit: would love a default behaviour. Maybe 38 is also a common-sense precision, as it is the max precision Spark allows, and Avro and Spark are often used together.
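For comparison, a rough sketch of what the two candidate defaults would mean in the worst case. The helper `max_bytes_for_precision` is hypothetical; it just computes how many two's-complement bytes the largest unscaled value with a given number of digits would need.

```python
import decimal

# Precision of the current thread's decimal context (28 unless it has been changed).
default_prec = decimal.getcontext().prec


def max_bytes_for_precision(precision: int) -> int:
    """Hypothetical helper: bytes needed for the largest unscaled value
    with `precision` decimal digits, including room for the sign bit."""
    largest_unscaled = 10**precision - 1
    return (largest_unscaled.bit_length() + 8) // 8


for precision in (default_prec, 38):
    print(precision, max_bytes_for_precision(precision))
```

Either way the upper bound stays small, and with a bytes schema the actual payload is usually shorter than that bound.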

faph commented 1 year ago

Thanks for the suggestion!

So reading through the docs, while Python itself uses a default precision of 28, this makes sense only because Python does not use a fixed scale. Avro does. And Avro's default scale is zero. That effectively means that unannotated decimals would end up being rounded to the nearest integer (if the value serializes at all).

Scale zero is probably rarely the correct value in any case. So not annotating the decimal at all might make people forget that Avro decimals are not like Python decimals...
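A small illustration of the scale-zero concern, using only the standard library. The schema dictionary is a hand-written example of a decimal schema without a "scale" attribute (which the Avro specification treats as scale 0), not output generated by py-avro-schema. Whether a particular serializer rounds like this or rejects the value outright depends on the library, so the rounding below is an assumption for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Hand-written example schema: no "scale" attribute, so the scale is 0.
schema_without_scale = {"type": "bytes", "logicalType": "decimal", "precision": 28}
implied_scale = schema_without_scale.get("scale", 0)

value = Decimal("3.14")

# Forcing the value to scale 0 drops the fractional digits.
as_scale_zero = value.quantize(Decimal(1).scaleb(-implied_scale), rounding=ROUND_HALF_EVEN)

# With an explicit scale of 2 the value survives unchanged.
as_scale_two = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(as_scale_zero, as_scale_two)
```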