microsoft / yardl

Tooling for streaming instrument data
https://microsoft.github.io/yardl/
MIT License

Missing stream item in Python NDJsonProtocolReader when previous stream is empty #143

Closed: naegelejd closed this issue 3 months ago

naegelejd commented 3 months ago

Using the Streams protocol: https://github.com/microsoft/yardl/blob/7a0ab26b0a1050a2e9b2394486ea971d9e8a11d3/models/test/unittests.yml#L174-L183

When the serialized data is read back with the Python NDJSON reader, and one of these streams is empty and followed by a non-empty stream, the first element of the non-empty stream is accidentally discarded.

Example:

import test_model as tm

json = "test.json"

# Write the four streams in protocol order: each empty stream is followed by a non-empty one.
with tm.NDJsonStreamsWriter(json) as w:
    w.write_int_data(range(0))  # empty stream
    w.write_optional_int_data([1, 2, None, 4, 5, None, 7, 8, 9, 10])
    w.write_record_with_optional_vector_data([])  # empty stream
    w.write_fixed_vector(([1, 2, 3] for _ in range(4)))

# Read the streams back in the same order and check how many items each one yields.
with tm.NDJsonStreamsReader(json) as r:
    assert len(list(r.read_int_data())) == 0
    assert len(list(r.read_optional_int_data())) == 10
    assert len(list(r.read_record_with_optional_vector_data())) == 0
    assert (huh := len(list(r.read_fixed_vector()))) == 4, huh

The example fails on the last line: the reader returns only 3 fixed_vector items instead of the expected 4.
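The issue does not describe the root cause. A common source of this symptom in line-oriented readers is lookahead: to learn that a stream has ended, the reader must read the first line belonging to the next stream, and if that line is thrown away instead of kept, the next stream starts one item short. The following is a minimal sketch of a reader that keeps the looked-ahead record in a one-slot pushback buffer. Everything in it (`write_stream`, `LookaheadNDJsonReader`, and the layout of one JSON object per item, keyed by a camelCase stream name) is hypothetical and is not yardl's generated code or the change made in #144.

```python
import io
import json
from typing import Any, Iterable, Iterator, Optional, TextIO


def write_stream(out: TextIO, name: str, items: Iterable[Any]) -> None:
    """Hypothetical writer: one NDJSON line per item, tagged with its stream name.
    An empty stream writes no lines at all."""
    for item in items:
        out.write(json.dumps({name: item}) + "\n")


class LookaheadNDJsonReader:
    """Hypothetical reader for the one-key-per-line layout written above."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream
        # A record read past the end of the previous stream. Keeping it here,
        # instead of discarding it, preserves the first item of the next stream.
        self._pending: Optional[dict] = None

    def _next_record(self) -> Optional[dict]:
        if self._pending is not None:
            record, self._pending = self._pending, None
            return record
        for line in self._stream:
            if line.strip():  # skip blank lines
                return json.loads(line)
        return None  # end of input

    def read_stream(self, name: str) -> Iterator[Any]:
        """Yield the items of stream `name` until a record for another stream
        (or the end of input) is reached."""
        while True:
            record = self._next_record()
            if record is None:
                return
            if name not in record:
                # This stream has ended (or was empty): push the record back so
                # the next read_stream call sees it. Dropping it here would lose
                # the next stream's first item.
                self._pending = record
                return
            yield record[name]


def roundtrip_lengths() -> list:
    """Mirror the repro: two empty streams, each followed by a non-empty one."""
    buf = io.StringIO()
    write_stream(buf, "intData", range(0))
    write_stream(buf, "optionalIntData", [1, 2, None, 4, 5, None, 7, 8, 9, 10])
    write_stream(buf, "recordWithOptionalVectorData", [])
    write_stream(buf, "fixedVector", ([1, 2, 3] for _ in range(4)))
    buf.seek(0)

    reader = LookaheadNDJsonReader(buf)
    names = ["intData", "optionalIntData", "recordWithOptionalVectorData", "fixedVector"]
    # Each stream is fully consumed before the next read_stream call starts.
    return [len(list(reader.read_stream(n))) for n in names]


print(roundtrip_lengths())  # [0, 10, 0, 4]
```

In this sketch an empty stream writes no lines, so `read_stream` can only detect its end by reading the next stream's first record; the `_pending` slot is what lets that record be returned again by the following call.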

This was exposed by the same test described in #142 (mixing empty and non-empty streams in the roundtrip test).

naegelejd commented 3 months ago

Resolved in #144