tenzir / public-roadmap

The public roadmap of Tenzir
https://docs.tenzir.com/roadmap

Wire Format ("Bitz") #128

Closed dominiklohmann closed 3 months ago

dominiklohmann commented 7 months ago

All of our binary formats either lose typing information or do not support a heterogeneous output. To enable better communication over our connectors between Tenzir nodes, we want to expose our internal wire format.

### Definition of Done
- [ ] https://github.com/tenzir/issues/issues/1085

mavam commented 7 months ago

We've been pondering the name of the wire format and settled on "bitz", short for "binary tenzir".

tobim commented 7 months ago

Basic Idea for the format itself:

We re-use the RecordBatch and RecordBatchStream as provided by Arrow, but prefix every batch with a header containing the following bytes:

IIIIVHHHHHHHHSSSSSSSS

Where

IIII = A fixed 4-byte identifier ("TNZR")
V = A 1-byte header version number, starting at 0
HHHHHHHH = An 8-byte hash of the schema
SSSSSSSS = The length of the payload RecordBatch, as an 8-byte integer
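
To make the layout concrete, here is a minimal Python sketch of packing and parsing such a header. The field widths add up to 4 + 1 + 8 + 8 = 21 bytes. The thread does not specify byte order, signedness, or the hash function, so the sketch assumes big-endian unsigned integers and treats the schema hash as an opaque 64-bit value. The names `pack_header`, `unpack_header`, and `frame` are hypothetical and are not part of Tenzir's API.

```python
import struct

# Layout from the proposal: IIII V HHHHHHHH SSSSSSSS (21 bytes in total).
# Assumption: big-endian, unsigned fields. The thread does not specify this.
HEADER = struct.Struct(">4sBQQ")
MAGIC = b"TNZR"
VERSION = 0


def pack_header(schema_hash: int, payload_len: int, version: int = VERSION) -> bytes:
    """Build the header that precedes one serialized RecordBatch."""
    return HEADER.pack(MAGIC, version, schema_hash, payload_len)


def unpack_header(buf: bytes) -> tuple:
    """Parse a header and return (version, schema_hash, payload_len)."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, schema_hash, payload_len = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError(f"bad identifier: {magic!r}")
    return version, schema_hash, payload_len


def frame(schema_hash: int, payload: bytes) -> bytes:
    """Prefix an already serialized batch (opaque bytes here) with its header."""
    return pack_header(schema_hash, len(payload)) + payload


if __name__ == "__main__":
    # The payload is a stand-in; in practice it would be an Arrow RecordBatch.
    message = frame(0x0123456789ABCDEF, b"batch")
    version, schema_hash, length = unpack_header(message)
    print(version, hex(schema_hash), length)
```

Because the payload length is part of the header, a reader can consume exactly `HEADER.size` bytes, learn how many bytes the batch occupies, and then read the next header, which is what allows batches with different schemas to follow each other in one stream.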