Closed by dominiklohmann 3 months ago
We've been pondering the name of the wire format and settled on bitz, for "binary tenzir".
Basic idea for the format itself:
We reuse the RecordBatch and RecordBatchStream provided by Arrow, but prefix every batch with a header consisting of the following bytes:
IIIIVHHHHHHHHSSSSSSSS
Where
IIII = A fixed 4-byte identifier ("TNZR")
V = A 1-byte header version number, starting at 0
HHHHHHHH = An 8-byte hash of the schema
SSSSSSSS = An 8-byte length of the payload RecordBatch
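To make the 21-byte header layout concrete, here is a minimal Python sketch of writing and reading framed batches. It rests on several assumptions the issue does not settle: integers are little-endian, the schema hash is a hypothetical BLAKE2b digest truncated to 8 bytes over some serialized form of the schema, and the payload is treated as opaque, already-serialized RecordBatch bytes. The names `encode_frame`, `read_frames`, and `schema_hash` are illustrative only and are not part of any Tenzir API.

```python
import hashlib
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Header layout: IIII V HHHHHHHH SSSSSSSS -> 4 + 1 + 8 + 8 = 21 bytes.
# Assumption: multi-byte integers are little-endian ('<' also disables padding).
MAGIC = b"TNZR"
HEADER = struct.Struct("<4sBQQ")
VERSION = 0


def schema_hash(schema_bytes: bytes) -> int:
    """Hypothetical 8-byte schema hash; the issue does not name an algorithm."""
    digest = hashlib.blake2b(schema_bytes, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def encode_frame(schema_bytes: bytes, payload: bytes) -> bytes:
    """Prefix an already-serialized RecordBatch with the bitz header."""
    header = HEADER.pack(MAGIC, VERSION, schema_hash(schema_bytes), len(payload))
    return header + payload


def read_frames(stream: BinaryIO) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (version, schema_hash, payload) for each frame until EOF."""
    while True:
        raw = stream.read(HEADER.size)
        if not raw:
            return  # clean end of stream
        if len(raw) < HEADER.size:
            raise ValueError("truncated header")
        magic, version, hash_, size = HEADER.unpack(raw)
        if magic != MAGIC:
            raise ValueError(f"bad magic: {magic!r}")
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated payload")
        yield version, hash_, payload


# Usage: two batches with different (placeholder) schemas in one stream.
buf = io.BytesIO(encode_frame(b"schema-a", b"batch-1") + encode_frame(b"schema-b", b"batch-2"))
for version, h, payload in read_frames(buf):
    print(version, payload)
```

Because each frame carries its own schema hash and payload length, a reader can, presumably, skip or dispatch batches without parsing them and notice when the schema changes mid-stream, which is what heterogeneous output requires.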
All of our existing binary formats either lose type information or do not support heterogeneous output. To enable better communication between Tenzir nodes over our connectors, we want to expose our internal wire format.