ElementTech opened 2 weeks ago
Hi @ElementTech! I think this is the case, but do I take it to mean that you tried increasing `batch.max_bytes` and `batch.max_events` but saw no difference?

Note that `batch.max_bytes` may not always map to object size, because the way event sizes are calculated can differ from the serialized size of the batch (see https://github.com/vectordotdev/vector/issues/10020).
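For reference, the settings in question look roughly like this on an `aws_s3` sink (the component names, bucket, region, and values below are only illustrative, not your config):

```toml
# Illustrative aws_s3 sink showing the batch limits under discussion.
# Component names, bucket, region, and values are placeholders.
[sinks.s3_archive]
type           = "aws_s3"
inputs         = ["sqs_in"]
bucket         = "my-example-bucket"
region         = "us-east-1"
encoding.codec = "json"

# A batch is flushed when any one of these limits is hit.
batch.max_bytes    = 104857600   # computed from estimated event sizes, not the serialized object size
batch.max_events   = 100000
batch.timeout_secs = 300
```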
Hey @jszwedko, yes, I've played around with those values both up and down. Just for the sake of testing, as you can see, I've set all the numbers extremely high, but there's still no difference in behavior.
I should also note that each of those `.json` files has around 3,500 events (single-level dictionaries), but not exactly 3,500; it can deviate by a few hundred. I can assume that whatever it is that decides to save the file stops at a certain size rather than at a certain event count. Also, of course, not all events are evenly sized.
I might be wrong, but I'm also using `disk` instead of `memory` when collecting the events, and even if it were `memory`, should the resulting batch size still be this comparatively small?
Thanks!
Apologies for the delayed response!
> Hey @jszwedko, yes, I've played around with those values both up and down. Just for the sake of testing, as you can see, I've set all the numbers extremely high, but there's still no difference in behavior.
Gotcha, that is interesting.
> I should also note that each of those `.json` files has around 3,500 events (single-level dictionaries), but not exactly 3,500; it can deviate by a few hundred. I can assume that whatever it is that decides to save the file stops at a certain size rather than at a certain event count. Also, of course, not all events are evenly sized.
One shot in the dark: can you try setting `filename_time_format` to `""`? I believe that suffix is only added when writing the batch, but I could be wrong and it may actually be involved in the partitioning of events, such that each object represents roughly one second's worth of events.
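Something along these lines (sink name is a placeholder; other options omitted):

```toml
# Illustrative: inside the aws_s3 sink definition, suppress the
# time-based suffix that is normally appended to object keys.
[sinks.s3_archive]
type = "aws_s3"
# ...other sink options...
filename_time_format = ""
```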
> I might be wrong, but I'm also using `disk` instead of `memory` when collecting the events, and even if it were `memory`, should the resulting batch size still be this comparatively small?
In Vector's architecture, buffers sit in front of sinks, so from the sink's perspective it makes no difference whether the fronting buffer is `memory` or `disk`; it is transparent to the sink.
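For completeness, the buffer is configured separately from the batch settings, along these lines (sink name and values are placeholders, not a recommendation):

```toml
# Illustrative: the buffer fronts the sink and is independent of batching.
[sinks.s3_archive]
type = "aws_s3"
# ...other sink options, including the batch.* settings...

[sinks.s3_archive.buffer]
type      = "disk"
max_size  = 1073741824   # 1 GiB on-disk buffer
when_full = "block"
```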
Problem
I have Vector installed in Kubernetes on AWS. I am using SQS as a source and S3 as a sink. No matter how high I set the batching and buffer parameters, at peak event ingestion my S3 bucket receives objects of exactly 2.4 MB. When an event spike ends, the rest of the events are released in smaller files until the backlog is finished.
Configuration
Version
0.42.0-distroless-libc
Debug Output
No response
Example Data
No response
Additional Context
I have two environments. The only difference between them is the `batch.timeout_secs` parameter. In my dev environment it is set to 60, and in production to 1800. The exact same issue (2.4 MB files) happens in both.
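Expressed as config, the only delta between the environments is this one value (sink name is a placeholder; the rest of the config is identical and omitted here):

```toml
# Dev environment (sink name is a placeholder)
[sinks.s3_archive]
batch.timeout_secs = 60    # production uses 1800
```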
References
No response