vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

GELF codec needs to support chunking #13292

Open neuronull opened 2 years ago

neuronull commented 2 years ago

I notice that GELF also has a chunking part in its protocol, for multi-part messages sent over UDP:

Prepend the following structure to your GELF message to make it chunked:

Chunked GELF magic bytes - 2 bytes: 0x1e 0x0f
Message ID - 8 bytes: Must be the same for every chunk of this message. Identifies the whole message and is used to reassemble the chunks later. It can be generated, for example, from a millisecond timestamp plus the hostname.
Sequence number - 1 byte: The sequence number of this chunk. Starts at 0 and is always less than the sequence count.
Sequence count - 1 byte: Total number of chunks this message has.
All chunks MUST arrive within 5 seconds or the server will discard all chunks that have already arrived or are still arriving. A message MUST NOT consist of more than 128 chunks.
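To make the header layout above concrete, here is a minimal Rust sketch of parsing one chunked datagram into its header fields and payload. It is not Vector's implementation. `parse_chunk`, `ChunkHeader`, and `ChunkError` are hypothetical names, and it only enforces the rules quoted above: the 0x1e 0x0f magic bytes, a 12-byte header, a sequence number below the sequence count, and at most 128 chunks.

```rust
/// Magic bytes that mark a chunked GELF datagram.
const CHUNK_MAGIC: [u8; 2] = [0x1e, 0x0f];
/// 2 magic bytes + 8-byte message ID + 1-byte sequence number + 1-byte sequence count.
const HEADER_LEN: usize = 12;
/// The quoted spec forbids more than 128 chunks per message.
const MAX_CHUNKS: u8 = 128;

#[derive(Debug, PartialEq)]
pub struct ChunkHeader {
    pub message_id: [u8; 8],
    pub sequence_number: u8,
    pub sequence_count: u8,
}

#[derive(Debug, PartialEq)]
pub enum ChunkError {
    NotChunked,      // datagram does not start with the magic bytes
    TooShort,        // fewer bytes than a full header
    InvalidSequence, // sequence number not below the sequence count
    TooManyChunks,   // sequence count above 128
}

/// Splits a datagram into its chunk header and the payload bytes that follow it.
pub fn parse_chunk(datagram: &[u8]) -> Result<(ChunkHeader, &[u8]), ChunkError> {
    if !datagram.starts_with(&CHUNK_MAGIC) {
        return Err(ChunkError::NotChunked);
    }
    if datagram.len() < HEADER_LEN {
        return Err(ChunkError::TooShort);
    }
    let mut message_id = [0u8; 8];
    message_id.copy_from_slice(&datagram[2..10]);
    let sequence_number = datagram[10];
    let sequence_count = datagram[11];
    if sequence_count > MAX_CHUNKS {
        return Err(ChunkError::TooManyChunks);
    }
    // Also rejects a sequence count of 0, since no sequence number can be below it.
    if sequence_number >= sequence_count {
        return Err(ChunkError::InvalidSequence);
    }
    let header = ChunkHeader { message_id, sequence_number, sequence_count };
    Ok((header, &datagram[HEADER_LEN..]))
}
```

A datagram that does not start with the magic bytes is reported as `NotChunked`, so a caller could fall back to treating it as a complete, unchunked GELF message.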

I think we can split this off into a follow-up issue, though, and just start with basic decoding support for incoming messages.

Originally posted by @jszwedko in https://github.com/vectordotdev/vector/pull/13288#issuecomment-1163483729

neuronull commented 2 years ago

This should be attempted after #4868 is closed.

paluchnuggets commented 2 years ago

Any update on this matter?

neuronull commented 2 years ago

Any update on this matter?

:wave: I've added it to our backlog. It currently doesn't have a high priority, mainly because we don't know how important it is to the community.

Please vote on this issue by adding a +1 reaction to the original comment; votes feed into our prioritization. Thanks!

arivra commented 1 year ago

@neuronull

I think we can split this off into a follow-up issue, though, and just start with basic decoding support for incoming messages.

Does Vector support chunking for output messages now? Or do you mean starting with reassembly first?

Thank you

neuronull commented 1 year ago

Hi @angelrib, that comment actually came from @jszwedko (https://github.com/vectordotdev/vector/pull/13288#issuecomment-1163483729).

So this issue (#13292) is the follow-up issue referred to there, which tracks chunking support.

I believe what Jesse meant is that we can probably start by implementing chunking on the decoder side, and add the encoding side in a separate PR.
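For the encoding side, a sender would split the serialized message into pieces and prepend the chunk header from the quoted spec to each piece. The following is a minimal Rust sketch of that step, not Vector's implementation: `split_into_chunks` and its `max_payload_per_chunk` parameter are hypothetical, the message ID is assumed to be generated elsewhere (for example from a timestamp and hostname, as the spec suggests), and choosing a datagram size that fits the network is left to the caller.

```rust
const CHUNK_MAGIC: [u8; 2] = [0x1e, 0x0f];
const HEADER_LEN: usize = 12;
const MAX_CHUNKS: usize = 128;

/// Splits an already-serialized GELF payload into chunked datagrams.
/// `max_payload_per_chunk` is the payload budget per datagram, excluding the
/// 12-byte header (hypothetical parameter; the caller picks it).
pub fn split_into_chunks(
    message_id: [u8; 8],
    payload: &[u8],
    max_payload_per_chunk: usize,
) -> Result<Vec<Vec<u8>>, String> {
    assert!(max_payload_per_chunk > 0, "chunk size must be non-zero");
    let parts: Vec<&[u8]> = payload.chunks(max_payload_per_chunk).collect();
    if parts.len() > MAX_CHUNKS {
        return Err(format!(
            "message needs {} chunks; GELF allows at most {}",
            parts.len(),
            MAX_CHUNKS
        ));
    }
    // Fits in a u8 because of the MAX_CHUNKS check above.
    let count = parts.len() as u8;
    let datagrams = parts
        .iter()
        .enumerate()
        .map(|(seq, part)| {
            let mut datagram = Vec::with_capacity(HEADER_LEN + part.len());
            datagram.extend_from_slice(&CHUNK_MAGIC); // magic bytes
            datagram.extend_from_slice(&message_id); // same ID for every chunk
            datagram.push(seq as u8); // sequence number, starting at 0
            datagram.push(count); // sequence count
            datagram.extend_from_slice(part);
            datagram
        })
        .collect();
    Ok(datagrams)
}
```

A message that already fits in a single datagram would normally be sent unchunked, so a caller would probably only use this when the payload exceeds its size budget.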

arivra commented 1 year ago

From reviewing the code it looked that way; I just wanted to confirm. Thank you very much!

gernoteger commented 1 year ago

IMHO, chunking is mandatory for GELF over UDP to be usable, since otherwise log messages would be limited to 8192 bytes and would fail depending on their content. Currently GELF is implemented as an encoding for various transports, but chunked GELF over UDP has aspects of a transport protocol, such as dependencies between packets and a defined timeout for a sequence. Maybe this would need dedicated source and sink components.
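The transport-like state described here (remembering which chunks of each message have arrived and discarding incomplete messages after the 5-second window from the quoted spec) can be sketched as a small reassembly buffer keyed by message ID. This is a minimal sketch under stated assumptions, not Vector's design: `Reassembler` is a hypothetical type, the timeout is measured from the first chunk seen, duplicate chunks are ignored, and the caller passes the current `Instant` so expiry can be tested deterministically.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// All chunks of one message must arrive within this window (per the quoted spec).
const CHUNK_TIMEOUT: Duration = Duration::from_secs(5);
const MAX_CHUNKS: u8 = 128;

struct Pending {
    first_seen: Instant,
    parts: Vec<Option<Vec<u8>>>, // one slot per sequence number
    received: usize,
}

/// Hypothetical buffer that reassembles chunked messages by their 8-byte ID.
#[derive(Default)]
pub struct Reassembler {
    pending: HashMap<[u8; 8], Pending>,
}

impl Reassembler {
    /// Stores one chunk; returns the full payload once every chunk has arrived.
    pub fn insert(
        &mut self,
        now: Instant,
        message_id: [u8; 8],
        sequence_number: u8,
        sequence_count: u8,
        payload: &[u8],
    ) -> Option<Vec<u8>> {
        if sequence_number >= sequence_count || sequence_count > MAX_CHUNKS {
            return None; // invalid header; drop the chunk
        }
        self.expire(now);
        let entry = self.pending.entry(message_id).or_insert_with(|| Pending {
            first_seen: now,
            parts: vec![None; sequence_count as usize],
            received: 0,
        });
        // Chunks outside the first chunk's sequence count are ignored.
        let slot = entry.parts.get_mut(sequence_number as usize)?;
        if slot.is_none() {
            *slot = Some(payload.to_vec());
            entry.received += 1;
        }
        if entry.received < entry.parts.len() {
            return None;
        }
        // Every slot is filled: concatenate the parts in sequence order.
        let done = self.pending.remove(&message_id)?;
        Some(done.parts.into_iter().flatten().flatten().collect())
    }

    /// Drops every incomplete message whose first chunk is older than the timeout.
    pub fn expire(&mut self, now: Instant) {
        self.pending
            .retain(|_, p| now.saturating_duration_since(p.first_seen) <= CHUNK_TIMEOUT);
    }

    /// Number of messages still waiting for chunks.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

In a real source, something like `expire` would also need to run periodically (not only when a new chunk arrives) so that abandoned messages do not hold memory.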

jorgehermo9 commented 4 months ago

Opened #20769 to address the decoding part. I'm currently working on it and will have it implemented soon; just some details are left.

jorgehermo9 commented 3 weeks ago

For anyone tracking this: support for uncompressed chunked GELF has been merged in https://github.com/vectordotdev/vector/pull/20859.

I'll be addressing the decompression part; it can be tracked in #21153.
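Before a reassembled payload can be decompressed, the decoder has to work out which format it uses. The sketch below assumes, based on the Graylog GELF documentation, that a payload is either uncompressed JSON or GZIP- or ZLIB-compressed, and classifies it by its leading bytes: GZIP data starts with 0x1f 0x8b, and a ZLIB header declares compression method 8 with a two-byte header value divisible by 31. `detect_compression` and `GelfCompression` are hypothetical names, not Vector's API, and the actual inflation would be delegated to a decompression library.

```rust
/// Hypothetical classification of a reassembled GELF payload.
#[derive(Debug, PartialEq)]
pub enum GelfCompression {
    Gzip,
    Zlib,
    Uncompressed,
}

/// Guesses the compression of a payload from its first two bytes.
pub fn detect_compression(payload: &[u8]) -> GelfCompression {
    match payload {
        // GZIP members start with the magic bytes 0x1f 0x8b.
        [0x1f, 0x8b, ..] => GelfCompression::Gzip,
        // ZLIB: low nibble of CMF is 8 (deflate), and CMF * 256 + FLG
        // is a multiple of 31 (the header check bits).
        [cmf, flg, ..]
            if *cmf & 0x0f == 8 && ((u16::from(*cmf) << 8) | u16::from(*flg)) % 31 == 0 =>
        {
            GelfCompression::Zlib
        }
        // Anything else, such as a JSON object starting with '{'.
        _ => GelfCompression::Uncompressed,
    }
}
```

A `{` byte (0x7b) does not have 8 in its low nibble, so plain JSON payloads are never misclassified as ZLIB by this check.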