lspgn opened this issue 6 months ago
Thanks for opening this, @lspgn! I was unaware of the use of varint framing with protobuf messages (I'd only seen length-delimited). I think we'd be happy to see a PR introducing this. I'd suggest modeling it as a framing option, since I think that would be the most consistent with the existing codec model.
Thank you, @jszwedko! I'm not at all familiar with Rust, but I assume this would go in https://github.com/vectordotdev/vector/blob/master/lib/codecs/src/decoding/framing/mod.rs
Yeah, that's right. It could be modeled after the length delimited framer: https://github.com/vectordotdev/vector/blob/master/lib/codecs/src/decoding/framing/length_delimited.rs
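As a rough illustration of what such a framer has to do, here is a std-only Rust sketch of the core splitting logic. It does not plug into Vector's actual framing traits, and the names read_varint, decode_frame, decode_all and the max_len limit are hypothetical. It reads a varint length, reports "incomplete" when the buffer ends mid-prefix or mid-message, splits several batched messages out of one buffer, and returns a zero-length message as an empty frame instead of dropping it.

```rust
/// Result of trying to split one varint-prefixed frame off the front of a buffer.
#[derive(Debug, PartialEq)]
enum Frame<'a> {
    /// A complete frame: the message bytes and the total bytes consumed (prefix + message).
    Complete(&'a [u8], usize),
    /// Not enough bytes yet; the caller should wait for more input.
    Incomplete,
}

/// Decodes an unsigned LEB128 varint (the protobuf integer encoding) from the start of `buf`.
/// Returns the value and its encoded size, `Ok(None)` if the buffer ends mid-varint,
/// or an error if the varint runs past the 10-byte maximum for a u64.
fn read_varint(buf: &[u8]) -> Result<Option<(u64, usize)>, &'static str> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if i >= 10 {
            return Err("varint too long");
        }
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok(Some((value, i + 1)));
        }
    }
    Ok(None)
}

/// Tries to split one frame off the front of `buf`, rejecting lengths above `max_len`.
fn decode_frame(buf: &[u8], max_len: usize) -> Result<Frame<'_>, &'static str> {
    let Some((len, prefix_len)) = read_varint(buf)? else {
        return Ok(Frame::Incomplete);
    };
    let len = usize::try_from(len).map_err(|_| "length does not fit in usize")?;
    if len > max_len {
        return Err("frame exceeds max_len");
    }
    let end = prefix_len.checked_add(len).ok_or("frame length overflow")?;
    if buf.len() < end {
        return Ok(Frame::Incomplete);
    }
    Ok(Frame::Complete(&buf[prefix_len..end], end))
}

/// Splits every complete frame out of `buf` and returns them with the number of bytes
/// consumed; a trailing partial frame is left in place for the next read.
fn decode_all(buf: &[u8], max_len: usize) -> Result<(Vec<&[u8]>, usize), &'static str> {
    let mut frames = Vec::new();
    let mut offset = 0;
    while offset < buf.len() {
        match decode_frame(&buf[offset..], max_len)? {
            Frame::Complete(msg, consumed) => {
                frames.push(msg);
                offset += consumed;
            }
            Frame::Incomplete => break,
        }
    }
    Ok((frames, offset))
}
```

For example, the buffer 03 61 62 63 00 02 78 yields the frames "abc" and "" (the zero-length message) with 5 bytes consumed; the trailing 02 78 stays buffered until the second byte of that message arrives. In a real framer the input would typically be a BytesMut advanced by the consumed count between reads; that part is left out here.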
Use Cases
Thank you for this software!
When sources or sinks use protobuf encoding/decoding, the ability to decode protowire (varint length-prefixed) messages is missing. When serializing protobuf, the official Go library suggests prefixing each message with a varint length, treating the message like another nested message (without the tag, though). Some tools, like ClickHouse, use such length-prefixed messages (e.g. when consuming from Kafka).
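To make the framing concrete, here is a minimal std-only Rust sketch (helper names are hypothetical) of the bytes a size-delimited writer emits, as we understand Go's protodelim package (google.golang.org/protobuf/encoding/protodelim) to produce them: a varint length followed by the serialized message. For comparison, it also encodes the same payload as field 1 of an enclosing message, which adds a field key byte in front; that key is the "tag" the delimited format leaves out.

```rust
/// Protobuf wire type 2 ("length-delimited"): bytes, strings and embedded messages.
const WIRE_TYPE_LEN: u64 = 2;

/// Appends `value` as an unsigned LEB128 varint, the protobuf integer encoding
/// (7 bits per byte, high bit set on every byte except the last).
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value as u8) & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Size-delimited framing: a varint length, then the serialized message, with no field key.
fn write_delimited(msg: &[u8], out: &mut Vec<u8>) {
    write_varint(msg.len() as u64, out);
    out.extend_from_slice(msg);
}

/// The same message embedded as field `field_number` of an enclosing message:
/// a field key (field_number << 3 | wire type 2) precedes the varint length.
fn write_nested_field(field_number: u64, msg: &[u8], out: &mut Vec<u8>) {
    write_varint((field_number << 3) | WIRE_TYPE_LEN, out);
    write_delimited(msg, out);
}
```

A 3-byte message "abc" becomes 03 61 62 63 as a delimited frame and 0A 03 61 62 63 as field 1 of an enclosing message; a 300-byte message gets the two-byte prefix AC 02, and an empty message is just 00.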
I would like to suggest adding such a framing option.
Attempted Solutions
Currently, Vector offers two ways of decoding protobuf with framing: bytes or length_delimited.
In certain cases where a source uses bytes framing (e.g. the buffer in a socket source, or file sources), there is a risk that a protobuf message is "cut" or skipped: if two messages are batched together, only the first one is decoded and the rest is discarded. Furthermore, a default (zero-length) protobuf message would be missed.
The length_delimited setting is not necessarily standard for protobuf and is not backward-compatible with varint length prefixes.
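A small std-only Rust sketch of why the two formats do not interoperate, assuming a length_delimited framer configured with a 4-byte big-endian length header (our assumption about the typical setup; read_fixed_be_len and VARINT_FRAMED are hypothetical names): fed a varint-framed stream, it reads the one-byte varint plus the first message bytes as a single large length.

```rust
/// Reads a fixed 4-byte big-endian length header, as a length-delimited framer
/// with a 4-byte length field would (assumed configuration, for illustration).
/// Returns None if fewer than 4 bytes are available.
fn read_fixed_be_len(buf: &[u8]) -> Option<u32> {
    if buf.len() < 4 {
        return None;
    }
    Some(u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]))
}

/// "abc" framed the protobuf way: a one-byte varint length (3), then the message.
const VARINT_FRAMED: [u8; 4] = [0x03, b'a', b'b', b'c'];
```

Here the frame 03 61 62 63 is read as a length of 0x03616263, so the framer would wait for tens of megabytes that never arrive, or reject the frame if a maximum frame length is enforced.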
Unfortunately, it's not possible to create a "wrapper" protobuf message, since the field tag (1 in the example below) would also have to be encoded, as a varint, in front of each length.
Proposal
My suggestion for sources and sinks would be one of the following.
Either have the protobuf decoder assume it will first read a varint and treat it as the message length. That said, I'm not sure whether this could work as a one-to-many way of decoding messages (plus waiting for the rest of the bytes).
Or have a proper varint option in framing.
Thank you!
Version
vector 0.36.1 (2857180 2024-03-11 14:32:52.417737479)