microsoft / yardl

Tooling for streaming instrument data
https://microsoft.github.io/yardl/
MIT License

Adding NDJSON serialization support #56

Closed: johnstairs closed this PR 1 year ago

johnstairs commented 1 year ago

Adding the ability to serialize to an NDJSON format in addition to the existing HDF5 and compact binary formats.

This format is being added to make debugging easier. It is not meant for scenarios where performance is important.

For a model like the following:

MyRecord: !record
  fields:
    x: int
    y: int

HelloNDJson: !protocol
  sequence:
    anIntStream: !stream
      items: int
    aBoolean: bool
    aString: string
    aComplex: complexdouble
    aDate: date
    aTime: time
    aDateTime: datetime

    aRecord: MyRecord

    anOptionalIntThatIsNotSet: int?
    anOptionalIntThatIsSet: int?

    aVector: int*
    aDynamicArray: int[]
    aFixedArray: int[2,3]

    aMapWithAStringKey: string->int
    aMapWithAnIntKey: int->int

    aUnionWithSimpleRepresentation: [int, bool]
    aUnionRequiringTag: [int, double]

The NDJSON would look like the following:

{"yardl":{"version":1,"schema":{"protocol":{"name":"HelloNDJson","sequence":[{"name":"anIntStream","type":{"stream":{"items":"int32"}}},{"name":"aBoolean","type":"bool"},{"name":"aString","type":"string"},{"name":"aComplex","type":"complexfloat64"},{"name":"aDate","type":"date"},{"name":"aTime","type":"time"},{"name":"aDateTime","type":"datetime"},{"name":"aRecord","type":"Sandbox.MyRecord"},{"name":"anOptionalIntThatIsNotSet","type":[null,"int32"]},{"name":"anOptionalIntThatIsSet","type":[null,"int32"]},{"name":"aVector","type":{"vector":{"items":"int32"}}},{"name":"aDynamicArray","type":{"array":{"items":"int32"}}},{"name":"aFixedArray","type":{"array":{"items":"int32","dimensions":[{"length":2},{"length":3}]}}},{"name":"aMapWithAStringKey","type":{"map":{"keys":"string","values":"int32"}}},{"name":"aMapWithAnIntKey","type":{"map":{"keys":"int32","values":"int32"}}},{"name":"aUnionWithSimpleRepresentation","type":[{"label":"int32","type":"int32"},{"label":"bool","type":"bool"}]},{"name":"aUnionRequiringTag","type":[{"label":"int32","type":"int32"},{"label":"float64","type":"float64"}]}]},"types":[{"name":"MyRecord","fields":[{"name":"x","type":"int32"},{"name":"y","type":"int32"}]}]}}}
{"anIntStream":1}
{"anIntStream":2}
{"anIntStream":3}
{"aBoolean":true}
{"aString":"hello"}
{"aComplex":[1.0,2.0]}
{"aDate":"2023-05-23"}
{"aTime":"10:50:25.777888999"}
{"aDateTime":"2023-05-23T13:22:51.179178520Z"}
{"aRecord":{"x":3,"y":4}}
{"anOptionalIntThatIsNotSet":null}
{"anOptionalIntThatIsSet":1}
{"aVector":[1,2,3]}
{"aDynamicArray":{"shape":[2,3],"data":[1,2,3,4,5,6]}}
{"aFixedArray":[1,2,3,4,5,6]}
{"aMapWithAStringKey":{"one":1}}
{"aMapWithAnIntKey":[[1,1]]}
{"aUnionWithSimpleRepresentation":false}
{"aUnionRequiringTag":{"float64":44.4}}
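Because every line is a standalone JSON document, the output above can be read back with nothing more than a JSON parser. The following is a minimal Python sketch that assumes only the layout shown in the example: one header line carrying the schema, then one single-key object per protocol step, with each stream item on its own line. `read_ndjson` and `reshape` are hypothetical helpers, not part of yardl's generated code. The sketch also assumes that a dynamic array's flat `data` is in row-major order. Reading `[[key, value], ...]` pairs for non-string map keys and `{"label": value}` for tagged unions follows directly from the example lines.

```python
import json
from typing import Any, Iterable


def reshape(data: list, shape: list[int]) -> list:
    """Split a flat list into nested lists (assumes row-major / C order)."""
    if len(shape) <= 1:
        return list(data)
    if shape[0] == 0:
        return []
    step = len(data) // shape[0]
    return [reshape(data[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def read_ndjson(lines: Iterable[str]) -> tuple[dict, dict[str, Any]]:
    """Read a protocol written as NDJSON: a header line, then one {step: value} object per line.

    Items of stream steps are collected into a list; other steps map to their single value.
    """
    it = iter(lines)
    schema = json.loads(next(it))["yardl"]["schema"]  # first line is the header
    stream_steps = {
        step["name"]
        for step in schema["protocol"]["sequence"]
        if isinstance(step["type"], dict) and "stream" in step["type"]
    }
    values: dict[str, Any] = {name: [] for name in stream_steps}
    for line in it:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        ((name, value),) = json.loads(line).items()  # each line has exactly one key
        if name in stream_steps:
            values[name].append(value)
        else:
            values[name] = value
    return schema, values


# Usage on a trimmed-down version of the example above.
header = json.dumps({"yardl": {"version": 1, "schema": {"protocol": {"name": "HelloNDJson", "sequence": [
    {"name": "anIntStream", "type": {"stream": {"items": "int32"}}},
    {"name": "aDynamicArray", "type": {"array": {"items": "int32"}}},
    {"name": "aMapWithAnIntKey", "type": {"map": {"keys": "int32", "values": "int32"}}},
    {"name": "aUnionRequiringTag",
     "type": [{"label": "int32", "type": "int32"}, {"label": "float64", "type": "float64"}]},
]}, "types": []}}})
lines = [
    header,
    '{"anIntStream":1}', '{"anIntStream":2}', '{"anIntStream":3}',
    '{"aDynamicArray":{"shape":[2,3],"data":[1,2,3,4,5,6]}}',
    '{"aMapWithAnIntKey":[[1,1]]}',
    '{"aUnionRequiringTag":{"float64":44.4}}',
]
schema, values = read_ndjson(lines)
arr = values["aDynamicArray"]
nested = reshape(arr["data"], arr["shape"])
int_map = dict(values["aMapWithAnIntKey"])  # non-string keys arrive as [key, value] pairs
((tag, payload),) = values["aUnionRequiringTag"].items()  # tag names the union case
```

The reader only consults the schema to tell stream steps apart from single-value steps. A fuller decoder would walk each step's schema type to decide how to interpret its value.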

There are a few other changes in this PR:

We could consider:

I have not yet updated the documentation. Looking for feedback first.

johnstairs commented 1 year ago

Updated to exclude null optional and union values when serializing records, and added NDJSON to the benchmarks. Here is the updated benchmark output:

[Image: updated benchmark output]
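Omitting null fields means a reader must treat a missing key in a record object as "not set". The following is a minimal Python sketch of that convention. It uses a hypothetical dataclass as a stand-in for a generated record and is not yardl's serializer. In the original example, a top-level optional step that is not set is still written as `{"anOptionalIntThatIsNotSet":null}`. This comment only describes record fields.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class HypotheticalRecord:
    """Hypothetical stand-in for a generated record type (not yardl's generated code)."""
    x: int
    z: Optional[int] = None  # an optional field that may be unset


def record_to_json(record: Any) -> str:
    """Serialize a dataclass record as a compact JSON object, dropping fields that are None."""
    obj = {f.name: getattr(record, f.name) for f in fields(record)}
    return json.dumps({k: v for k, v in obj.items() if v is not None}, separators=(",", ":"))


def record_from_json(text: str, record_type: type) -> Any:
    """Inverse of record_to_json: fields absent from the JSON object become None ('not set')."""
    obj = json.loads(text)
    return record_type(**{f.name: obj.get(f.name) for f in fields(record_type)})


unset = record_to_json(HypotheticalRecord(x=3))        # z is omitted entirely
full = record_to_json(HypotheticalRecord(x=3, z=4))    # z is written
roundtrip = record_from_json(unset, HypotheticalRecord)
```

Dropping null fields keeps record objects shorter without losing information. The cost is that "key absent" and "value null" must be read as the same thing.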