jszwedko opened this issue 3 years ago
Could `expire_after_max_bytes` be implemented via a VRL function for size? I.e., a more efficient version of `ends_when = "length(encode_json(.)) > 1000000"`.
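For context, a minimal sketch of that `ends_when` workaround as a reduce config (transform, input, and `group_by` names are hypothetical):

```toml
[transforms.my_reduce]
type = "reduce"
inputs = ["my_source"]
group_by = ["host"]
# Flush the group once the incoming event's JSON encoding exceeds ~1 MB.
# This re-serializes every event, which is the inefficiency noted above.
ends_when = "length(encode_json(.)) > 1000000"
```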
Just an observer of this ticket. Would the VRL code be run inside the reduce transform? If so, that would probably work.
Yeah, that's what I was thinking.
In the absence of this feature, any suggestions on how we can implement it? `ends_when = "length(encode_json(.)) > threshold"` only checks the size of the most recent event, not the accumulated batch, and the `expire_after_ms` and `flush_period_ms` checks both seem to reset when a new event arrives. Is there even a way to see how many events have been aggregated so far in an `ends_when` check?
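For reference, a sketch of the three knobs being discussed (the option names are real reduce transform options; values are illustrative, and the comments restate the behavior observed above):

```toml
[transforms.my_reduce]
type = "reduce"
inputs = ["my_source"]
# Evaluated against each incoming event only, not the merged batch.
ends_when = "length(encode_json(.)) > 1000000"
# Stale-group timeout; per the observation above, it effectively resets
# whenever the group receives a new event.
expire_after_ms = 30000
# How often stale groups are checked for and flushed.
flush_period_ms = 1000
```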
A workaround that I used for a while was to add a remap transform on the output of any reduce transform, which split up any batches that were over a maximum size. That pipeline was feeding into a kafka sink, which rejects messages that are too large, so the absence of this feature led to some data loss.
Depending on how your reduce is merging fields, a splitting remap transform may or may not be practical. Ours was appending event data into a single array field, so splitting it was straightforward (a sketch follows below). Still, the unbounded time/expiration-based grouping is less than ideal for this use case. We ended up running a custom build of vector with this patch applied: https://github.com/vectordotdev/vector/pull/14817
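A sketch of that splitting step, assuming the reduce step appended events into a single array field named `items` (the field, transform, input names, and chunk size are all hypothetical; it relies on remap's behavior of emitting one event per element when the root is set to an array of objects):

```toml
[transforms.split_batches]
type = "remap"
inputs = ["my_reduce"]
source = '''
  max_items = 500
  batches = []
  current = []
  # Walk the accumulated array and cut it into fixed-size chunks.
  # Note this sketch drops any other fields on the reduced event.
  for_each(array!(.items)) -> |_index, item| {
    current = push(current, item)
    if length(current) >= max_items {
      batches = push(batches, {"items": current})
      current = []
    }
  }
  if length(current) > 0 {
    batches = push(batches, {"items": current})
  }
  # Setting the root to an array of objects emits one event per element.
  . = batches
'''
```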
A user in Slack noted that they would like to limit the `reduce` transform further by providing absolute limits to cap a set of messages that do not end up matching the multiline conditions. In their case, they saw Vector try to flush a 2 GB event during shutdown.

Ideas:

- An `expire_after_max_bytes` option to limit the overall size of an event
- An `expire_after_max_ms` option that functions similarly to `expire_after_ms` but acts from the first event seen, rather than the last, to automatically flush any events that have been aggregating for the specified duration

Either, or both, of these would help limit the `reduce` transform.

Another idea is to have these events go to an "errors" output stream, rather than flow through the standard output stream for the transform, following #3939.
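For illustration, a hypothetical sketch of a reduce config if both proposed options existed (neither option is implemented; names and semantics come from the ideas above, and the transform/input names are made up):

```toml
[transforms.my_reduce]
type = "reduce"
inputs = ["my_source"]
expire_after_ms = 30000           # existing: stale timeout, reset by each new event
expire_after_max_ms = 300000      # proposed: measured from the FIRST event in the group
expire_after_max_bytes = 1000000  # proposed: hard cap on the merged event's size
```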