This change simplifies the interactions between Avromatic models and lower-level Avro serialization/deserialization, resulting in:
Simplified control flow making the code easier to understand
An Avro serialization/deserialization layer that will be easier to swap out for a native implementation
Union serialization/deserialization optimizations on non-record union member types that previously only applied to record union member types
Improved serialization performance for deeply nested schemas - benchmarks show 35% less memory usage and a 1.6x throughput increase
Extensions to Avro::IO::DatumReader that duplicate less code from the base classes, reducing long-term maintenance
Now some more details on the changes...
Serialization
Previously, nested models were serialized by Avromatic as a Hash with a pointer to the Avromatic model instance (Avromatic::IO::ENCODING_PROVIDER) and, if that model appeared in a union field, a union member index (Avromatic::IO::UNION_MEMBER_INDEX). This let Avromatic::IO::DatumWriter avoid recomputing the union member index, since Avromatic already knew it. Unfortunately, Avromatic::IO::DatumWriter called back to the model's avro_raw_value method to recursively serialize nested models, which resulted in extra StringIO and Hash allocations and extra calls to Avro::SchemaValidator.validate!, adding significant overhead for highly nested schemas.
The serialization change is twofold:
Recursively convert the Avromatic model into an attributes hash in Avromatic::Model::Types::RecordType#serialize before calling the Avromatic::IO::DatumWriter.
In Avromatic::Model::Types::UnionType#serialize, wrap union datums in an Avromatic::IO::UnionDatum that includes the member index, so that Avromatic::IO::DatumWriter does not need to recompute it.
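The two steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this PR: `UnionDatum`, `Model`, `UNION_MEMBERS`, `serialize_record`, `serialize_union` and `write_union` are hypothetical stand-ins, and the union "schema" is reduced to an ordered list of Ruby classes.

```ruby
# Hypothetical stand-ins; these are not the real Avromatic classes.
UnionDatum = Struct.new(:member_index, :datum)
Model = Struct.new(:attributes)

# Simplified union "schema": an ordered list of Ruby classes (an assumption).
UNION_MEMBERS = [NilClass, String, Hash].freeze

# Step 1: recursively flatten nested models into plain hashes up front,
# so the writer never has to call back into a model.
def serialize_record(value)
  case value
  when Model then value.attributes.transform_values { |v| serialize_record(v) }
  when Array then value.map { |v| serialize_record(v) }
  else value
  end
end

# Step 2: resolve the union member once and carry its index with the datum.
def serialize_union(value)
  serialized = serialize_record(value)
  index = UNION_MEMBERS.index { |klass| serialized.is_a?(klass) }
  raise ArgumentError, "no union member matches #{value.inspect}" if index.nil?

  UnionDatum.new(index, serialized)
end

# Writer side: reuse the precomputed index; search only for unwrapped datums.
def write_union(datum)
  return [datum.member_index, datum.datum] if datum.is_a?(UnionDatum)

  [UNION_MEMBERS.index { |klass| datum.is_a?(klass) }, datum]
end

inner = Model.new({ "name" => "leaf" })
outer = Model.new({ "child" => inner, "tags" => ["a", "b"] })
wrapped = serialize_union(outer)
```

In this sketch the writer receives only plain hashes, arrays and primitives, and a wrapped datum skips the member search entirely.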
Deserialization
Previously, the Avromatic::IO::DatumReader included a union member's index when deserializing records in a union so Avromatic could optimize the creation of the corresponding Avromatic model instance. With this change, the Avromatic::IO::DatumReader now wraps the union value in an Avromatic::IO::UnionDatum that can be leveraged by Avromatic::Model::Types::UnionType#serialize. This also makes it easier for Avromatic::IO::DatumReader#read_data to delegate to the superclass for everything except the union case, which we need to handle specially.
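The delegation pattern described here might look roughly like the following. It is a minimal sketch, not the actual reader: `BaseReader` and `UnionAwareReader` are hypothetical classes, schemas are plain hashes, and the input is assumed to be already decoded, with each union value given as an `[index, value]` pair.

```ruby
# Hypothetical stand-in for Avromatic::IO::UnionDatum.
UnionDatum = Struct.new(:member_index, :datum)

# Hypothetical base reader. Schemas are plain hashes and the input is
# assumed to be pre-decoded, with union values given as [index, value].
class BaseReader
  def read_data(schema, input)
    case schema[:type]
    when :union
      index, value = input
      read_data(schema[:members][index], value)
    when :record
      schema[:fields].to_h do |name, field_schema|
        [name, read_data(field_schema, input[name])]
      end
    else
      input
    end
  end
end

# Only the union case is overridden; everything else goes to the superclass.
class UnionAwareReader < BaseReader
  def read_data(schema, input)
    return super unless schema[:type] == :union

    index, value = input
    UnionDatum.new(index, read_data(schema[:members][index], value))
  end
end

schema = {
  type: :record,
  fields: {
    "id" => { type: :int },
    "value" => { type: :union, members: [{ type: :null }, { type: :string }] }
  }
}
input = { "id" => 1, "value" => [1, "abc"] }
result = UnionAwareReader.new.read_data(schema, input)
```

Because the base class recurses through `read_data`, a union nested inside a record still reaches the override, so the subclass carries no record-handling code of its own.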
@will89 - you're prime /cc @tjwp @askreet