microsoft / yardl

Tooling for streaming instrument data
https://microsoft.github.io/yardl/
MIT License

Model Evolution #121

Closed naegelejd closed 4 months ago

naegelejd commented 6 months ago

The current approach in yardl for supporting model evolution involves:

  1. Comparing the current model to each previous version
  2. Annotating the current model with each detected change
  3. Processing the annotations to emit user warnings/errors
  4. In codegen, inspecting annotations to:
    1. Serialize previous versions of TypeDefinitions and Protocols
    2. Convert between types where necessary
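
As a rough illustration of steps 1 to 3, the sketch below diffs two versions of a record and turns each detected change into a user-facing message. The real comparison lives in the Go tooling (tooling/pkg/dsl/evolution.go); C++ is used here only to match the generated code discussed in this PR. `Field`, `Record`, `Change`, `CompareRecords`, `Report`, and the severity rule are all hypothetical names and assumptions, not yardl's API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of a record field (not yardl's DSL types).
struct Field {
  std::string type;
  bool optional;
};
using Record = std::map<std::string, Field>;  // field name -> field

enum class ChangeKind { kAdded, kRemoved, kTypeChanged };

// One annotation attached to the current model (step 2).
struct Change {
  ChangeKind kind;
  std::string field;
  bool optional;  // whether the affected field is optional
};

// Step 1: compare the current record against one previous version.
std::vector<Change> CompareRecords(const Record& previous, const Record& current) {
  std::vector<Change> changes;
  for (const auto& entry : current) {
    auto it = previous.find(entry.first);
    if (it == previous.end()) {
      changes.push_back({ChangeKind::kAdded, entry.first, entry.second.optional});
    } else if (it->second.type != entry.second.type) {
      changes.push_back({ChangeKind::kTypeChanged, entry.first, entry.second.optional});
    }
  }
  for (const auto& entry : previous) {
    if (current.find(entry.first) == current.end()) {
      changes.push_back({ChangeKind::kRemoved, entry.first, entry.second.optional});
    }
  }
  return changes;
}

// Step 3: process the annotations into messages for the user.
// Assumed rule: adding or removing a non-optional field produces a warning,
// a type change produces a note, optional additions/removals are silent.
std::vector<std::string> Report(const std::vector<Change>& changes) {
  std::vector<std::string> messages;
  for (const auto& change : changes) {
    switch (change.kind) {
      case ChangeKind::kAdded:
        if (!change.optional) {
          messages.push_back("warning: non-optional field '" + change.field + "' was added");
        }
        break;
      case ChangeKind::kRemoved:
        if (!change.optional) {
          messages.push_back("warning: non-optional field '" + change.field + "' was removed");
        }
        break;
      case ChangeKind::kTypeChanged:
        messages.push_back("note: field '" + change.field + "' changed type; a conversion is required");
        break;
    }
  }
  return messages;
}
```

Calling `Report(CompareRecords(v1, current))` once per previous version gives the full set of messages. The same list of `Change` values is what codegen would inspect in step 4 to decide where conversions are needed.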

Details on this rough draft PR:

  1. The relevant changes are in:

    1. tooling/pkg/dsl/evolution.go
    2. tooling/internal/cpp/include/detail/binary/header.h
    3. tooling/internal/cpp/include/detail/binary/reader_writer.h
    4. tooling/internal/cpp/protocols/protocols.go
    5. tooling/internal/cpp/binary/binary.go
  2. There is still much to be done in evolution.go but I have a good handle on that. Examples:

    • Consider NOT using Annotations to capture schema changes (it works fine, but it's verbose and error prone)
    • Correct conversions for scalar <-> Union type changes
    • Handle Union <-> Union type changes (e.g. adding/removing a Type)
    • Capture TypeDefinition changes, e.g. to warn about added/removed non-Optional Record fields
    • Detect changes to TypeArguments
    • Other TODOs in code
  3. The changes to the included C++ binary headers distinguish between schema_ and previous_schemas_ only to avoid breaking the NDJson and HDF5 code. This would be cleaned up, probably by using just a single vector of schemas.

  4. Codegen is not using the version label specified in the package file. Once the schema is known by the Protocol Reader/Writer, it just uses the schema index to determine which serializers to call.

  5. Need to determine the best way for a User to instantiate a Protocol Writer for an older version of a Protocol. Currently, the User must first instantiate a Protocol Reader r using an older schema and then write MyProtocolWriter w(stream, r.GetSchema()).

    • We could generate unique constructors for each version, thereby utilizing the version label specified in the package file.
  6. Binary codegen needs a bit more cleanup to remove duplicate code for type conversions. Thoughts on the switch(schema_index_) {...} approach?

  7. The example models and C++ code (within evolution/) are just a starting point for integration tests
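
Items 3 to 6 can be pictured together in one small sketch: a single vector of schemas, a constructor that takes a version label instead of a schema string, and a `switch` on the schema index that picks the serializer and converts where needed. Everything here is hypothetical. `SampleWriter`, `Version`, `Schemas`, `FindSchemaIndex`, the schema strings, and the `uint32` to `uint64` field change are made up for illustration, and the fixed-width little-endian encoding is a simplification rather than yardl's binary format.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical version labels, as they might be named in the package file.
enum class Version { kV1 = 0, kCurrent = 1 };

// A single vector of schemas, oldest first; the position is the schema index.
inline const std::vector<std::string>& Schemas() {
  static const std::vector<std::string> schemas = {
      R"({"protocol":"Sample","id":"uint32"})",  // v1
      R"({"protocol":"Sample","id":"uint64"})",  // current
  };
  return schemas;
}

// Returns the index of `schema` in Schemas(), or throws if it is unknown.
inline std::size_t FindSchemaIndex(const std::string& schema) {
  const std::vector<std::string>& all = Schemas();
  for (std::size_t i = 0; i < all.size(); ++i) {
    if (all[i] == schema) return i;
  }
  throw std::runtime_error("unknown schema");
}

class SampleWriter {
 public:
  // Approach described in item 5: pass the schema obtained from a reader.
  SampleWriter(std::vector<std::uint8_t>& out, const std::string& schema)
      : out_(out), schema_index_(FindSchemaIndex(schema)) {}

  // Alternative: select the schema directly by version label.
  SampleWriter(std::vector<std::uint8_t>& out, Version version)
      : out_(out), schema_index_(static_cast<std::size_t>(version)) {}

  const std::string& GetSchema() const { return Schemas()[schema_index_]; }

  // The in-memory model always uses the current type (uint64).
  void WriteId(std::uint64_t id) {
    switch (schema_index_) {
      case 0:  // v1 stored the id as uint32: convert, then serialize
        if (id > UINT32_MAX) throw std::range_error("id does not fit in v1");
        WriteLittleEndian(id, 4);
        break;
      case 1:  // current schema: no conversion needed
        WriteLittleEndian(id, 8);
        break;
      default:
        throw std::logic_error("unhandled schema index");
    }
  }

 private:
  // Appends the low `bytes` bytes of `value`, least significant first.
  void WriteLittleEndian(std::uint64_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
      out_.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }
  }

  std::vector<std::uint8_t>& out_;
  std::size_t schema_index_;
};
```

With this shape, `SampleWriter w(out, Version::kV1)` and `SampleWriter w(out, r.GetSchema())` end up at the same schema index. One trade-off of the `switch` approach is that every evolved field repeats the dispatch; a per-version table of serializer functions would be one way to reduce the duplication mentioned in item 6.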