microsoft / yardl

Tooling for streaming instrument data
https://microsoft.github.io/yardl/
MIT License
29 stars 5 forks source link

Support schema evolution across versions #130

Closed naegelejd closed 4 months ago

naegelejd commented 4 months ago

Introduces support for schema evolution across versions, including automated schema change detection and backward-compatible serialization (currently only C++ binary).

johnstairs commented 4 months ago

If you run yardl generate --watch and change the current model file, any warnings produced briefly appear and then are cleared.

johnstairs commented 4 months ago

If I run yardl generate --watch and change a previous model file, the change is ignored.

johnstairs commented 4 months ago

I think we should warn if a protocol is renamed/removed

johnstairs commented 4 months ago

If I remove a field from a record, I get a compilation error saying that the variable that we create for writing the value is uninitialized. Also, the message "Removing non-Optional field 'xyz' may result in undefined behavior with previous versions" is not so great. I think we should have defined behavior in all cases.

johnstairs commented 4 months ago

Changing a type from

[int, float]

to

[int, float]

Seems to be allowed but the serialization to the previous version is broken.

Changing from int to int*, however, is validation error.

Changing from

[int, bool]

to

[int, float]

Results in C++ compilation errors.

johnstairs commented 4 months ago

When changing from int32 to int64 and encountering an overflow, I think we should raise and error.

johnstairs commented 4 months ago

Seems like this could be considered compatible:

image
johnstairs commented 4 months ago

Also this could be accepted (with warning of possible overflow):

image
johnstairs commented 4 months ago

Same here:

image
johnstairs commented 4 months ago

If you change the numerical values of an enum, we don't seem to do anything about it in the serializers.

johnstairs commented 4 months ago
image

Results in compilation error:

protocols.cc:70:11: error: 'a' may be used uninitialized [-Werror=maybe-uninitialized]
   70 |     value = std::move(a);
      |     ~~~~~~^~~~~~~~~~~~~~
johnstairs commented 4 months ago

I think this should be accepted:

image
johnstairs commented 4 months ago

Swapping the union cases and writing to the previous format seems to write the same binary data:

image
johnstairs commented 4 months ago

I'm wondering why it's ok to add a stream to a protocol but not to remove one.

johnstairs commented 4 months ago

I think it's getting confused with this:

image

When writing in the old format, it writes out the same as in the new format, effectively swapping A and B types

naegelejd commented 4 months ago

I have addressed all of the feedback above except for the following items, which we plan to support in the future:

Missing support for: Effort Explanation
Scalar <-> Vector/Array conversions Low This is a reasonable schema change yardl should support
Removing Protocol steps Medium This is a reasonable schema change yardl should support
Changing generic Record type arguments High Requires making type conversion dynamic (e.g. template functions) so that a generic Record serializer can convert fields based on type arguments at runtime
Testing runtime conversion errors Medium Requires 2+ schema versions, codegen, compilation, and GTest for each scenario under test

I plan to open an issue for each of these and address them after I finish implementing support for Matlab.

johnstairs commented 4 months ago

I think we should allow complexfloat32 <-> complexfloat64

johnstairs commented 4 months ago

If I change from a float32 to an int64, the runtime checks are not what I would expect. When writing to the previous version, we check if the integer value falls within the range of a float32, which it always will. Then when reading the older version, we simply do a cast, when that is what might overflow.

naegelejd commented 4 months ago

If I change from a float32 to an int64, the runtime checks are not what I would expect. When writing to the previous version, we check if the integer value falls within the range of a float32, which it always will. Then when reading the older version, we simply do a cast, when that is what might overflow.

I will get this right. Too bad this isn't in the standard library: https://www.boost.org/doc/libs/1_84_0/libs/numeric/conversion/doc/html/boost_numericconversion/improved_numeric_cast__.html

naegelejd commented 4 months ago

Added support for conversions between complex types, fixed the overflow checks, and updated the docs.