salsify / avromatic

Generate Ruby models from Avro schemas
MIT License

Simplify/optimize nested model encoding/decoding #129

Closed by jturkel 3 years ago

jturkel commented 3 years ago

This change simplifies the interactions between Avromatic models and lower-level Avro serialization/deserialization, resulting in:

Now some more details on the changes...

Serialization

Previously, Avromatic serialized each nested model as a Hash containing a reference to the Avromatic model instance (Avromatic::IO::ENCODING_PROVIDER) and, if the model appeared in a union field, a union member index (Avromatic::IO::UNION_MEMBER_INDEX). This let Avromatic::IO::DatumWriter avoid recomputing the union member index, since Avromatic already knew it. Unfortunately, Avromatic::IO::DatumWriter then called back into the model's avro_raw_value method to recursively serialize nested models, which resulted in extra StringIO and Hash allocations and extra calls to Avro::SchemaValidator.validate!, adding a lot of overhead for highly nested schemas.

The serialization change is twofold:

  1. Recursively convert the Avromatic model into an attributes hash in Avromatic::Model::Types::RecordType#serialize before calling Avromatic::IO::DatumWriter.
  2. In Avromatic::Model::Types::UnionType#serialize, wrap union datums in an Avromatic::IO::UnionDatum that carries the member index, so that Avromatic::IO::DatumWriter does not have to recompute it.
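
To make the two steps concrete, here is a minimal, self-contained sketch. It does not use Avromatic itself: `UnionDatum`, `PrimitiveType`, `RecordType`, `UnionType`, and the `fields` class method are all hypothetical stand-ins that only borrow names from the classes mentioned above, and the real implementations may look quite different.

```ruby
# Hypothetical stand-in for Avromatic::IO::UnionDatum: a datum plus the index
# of the union member it was matched against.
UnionDatum = Struct.new(:member_index, :datum)

# Hypothetical type for simple values such as strings.
class PrimitiveType
  def initialize(klass)
    @klass = klass
  end

  def matches?(value)
    value.is_a?(@klass)
  end

  def serialize(value)
    value
  end
end

# Hypothetical record type: converts a model into a plain attributes Hash up
# front, recursing through its fields, so a writer never calls back into the model.
class RecordType
  def initialize(model_class)
    @model_class = model_class
  end

  def matches?(value)
    value.is_a?(@model_class)
  end

  def serialize(model)
    @model_class.fields.each_with_object({}) do |(name, type), hash|
      hash[name.to_s] = type.serialize(model.public_send(name))
    end
  end
end

# Hypothetical union type: resolves the member once and records its index
# next to the serialized datum.
class UnionType
  def initialize(*member_types)
    @member_types = member_types
  end

  def serialize(value)
    index = @member_types.index { |type| type.matches?(value) }
    raise ArgumentError, "no union member matches #{value.inspect}" if index.nil?

    UnionDatum.new(index, @member_types[index].serialize(value))
  end
end

# Two toy models; `fields` is an assumed helper, not an Avromatic API.
class Address
  attr_reader :city

  def initialize(city:)
    @city = city
  end

  def self.fields
    { city: PrimitiveType.new(String) }
  end
end

class Person
  attr_reader :name, :contact

  def initialize(name:, contact:)
    @name = name
    @contact = contact
  end

  def self.fields
    {
      name: PrimitiveType.new(String),
      contact: UnionType.new(PrimitiveType.new(String), RecordType.new(Address))
    }
  end
end

person = Person.new(name: 'Ada', contact: Address.new(city: 'London'))
attributes = RecordType.new(Person).serialize(person)
# attributes['contact'] is a UnionDatum whose member_index is 1 and whose
# datum is the nested attributes Hash for the Address.
```

With this shape, a writer that receives a `UnionDatum` can read `member_index` directly instead of testing the datum against each member schema again.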

Deserialization

Previously, Avromatic::IO::DatumReader included the union member's index when deserializing records in a union so that Avromatic could optimize the creation of the corresponding Avromatic model instance. With this change, Avromatic::IO::DatumReader instead wraps the union value in an Avromatic::IO::UnionDatum that can be leveraged by Avromatic::Model::Types::UnionType#serialize. This also makes it easier for Avromatic::IO::DatumReader#read_data to delegate to the superclass for everything except the union case, which needs special handling.
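
The delegation pattern described here can be sketched roughly as follows. This is a toy that walks already-decoded Ruby data rather than Avro binary, and `BaseReader`, `UnionAwareReader`, `build_union_member`, `PostalAddress`, and the Hash-based schema format are all hypothetical names invented for illustration. The `[index, value]` pair stands in for the way Avro's binary encoding stores a union as the member index followed by the value.

```ruby
# Hypothetical stand-in for Avromatic::IO::UnionDatum.
UnionDatum = Struct.new(:member_index, :datum)

# Hypothetical base reader playing the role of the superclass. It knows
# nothing about UnionDatum and handles every non-union case.
class BaseReader
  def read_data(schema, raw)
    case schema[:type]
    when :record
      schema[:fields].each_with_object({}) do |(name, field_schema), hash|
        # Dispatches through self, so subclass overrides apply to nested fields.
        hash[name] = read_data(field_schema, raw.fetch(name))
      end
    else
      raw
    end
  end
end

# Overrides only the union case and delegates everything else to super.
class UnionAwareReader < BaseReader
  def read_data(schema, raw)
    return super unless schema[:type] == :union

    index, value = raw
    UnionDatum.new(index, read_data(schema[:members][index], value))
  end
end

# Hypothetical consumer: picks the member builder by index instead of probing
# each union member to find one that accepts the datum.
def build_union_member(union_datum, builders)
  builders.fetch(union_datum.member_index).call(union_datum.datum)
end

PostalAddress = Struct.new(:city)

schema = {
  type: :record,
  fields: {
    'name' => { type: :string },
    'contact' => {
      type: :union,
      members: [
        { type: :string },
        { type: :record, fields: { 'city' => { type: :string } } }
      ]
    }
  }
}
raw = { 'name' => 'Ada', 'contact' => [1, { 'city' => 'London' }] }

decoded = UnionAwareReader.new.read_data(schema, raw)
builders = [
  ->(value) { value },
  ->(attrs) { PostalAddress.new(attrs['city']) }
]
contact = build_union_member(decoded['contact'], builders)
# contact is a PostalAddress built directly from member index 1.
```

Keeping the override down to a single guarded branch is what lets the subclass stay small: any case the base reader already handles never needs to be reimplemented.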

@will89 - you're prime /cc @tjwp @askreet