Closed: StephanDollberg closed this issue 1 year ago.
Switching to fragmented_vector or chunked_fifo is not a 100% replacement, as we use vector::erase here, which neither of them currently supports. Maybe just use std::deque.
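A minimal sketch of the std::deque option, assuming (hypothetically) that pending flush ops are queued in increasing offset order, so completed ops always form a prefix. The flush_op struct, its committed_offset field and the complete_flushes helper are illustrative stand-ins, not Redpanda's actual code. Erasing a prefix of a std::deque only destroys the erased elements and never moves the remaining ones into a new contiguous buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <iterator>

// Hypothetical stand-in for segment_appender's flush_op; the real struct is
// not shown in this issue.
struct flush_op {
    uint64_t committed_offset; // assumed: op completes once data up to here is flushed
};

// Remove every pending op covered by a flush up to `flushed_up_to`.
// Assumes ops are ordered by offset, so the satisfied ones form a prefix.
// Returns how many ops were completed (a real implementation would also
// resolve each op's promise before erasing it).
inline std::size_t complete_flushes(std::deque<flush_op>& ops, uint64_t flushed_up_to) {
    auto it = ops.begin();
    while (it != ops.end() && it->committed_offset <= flushed_up_to) {
        ++it;
    }
    auto n = static_cast<std::size_t>(std::distance(ops.begin(), it));
    // Erasing at the front of a deque is linear only in the number erased.
    ops.erase(ops.begin(), it);
    return n;
}
```

The catch, as the next paragraph notes, is the per-container block size of std::deque.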
One further problem is that each partition replica has its own segment_appender, so with large partition counts we don't want to preallocate 8k for every one of them.
Hence, std::deque is out of the question (libc++ allocates it in 4k chunks by default). So we will probably need to use fragmented_vector with an erase workaround.
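To illustrate what "fragmented_vector with an erase workaround" could look like, here is a sketch built on a hypothetical chunked_vec type. It is NOT seastar's or Redpanda's fragmented_vector, and erase_if_by_rebuild is an assumed workaround rather than the fix that was actually merged. Chunks are allocated only when needed, so an empty container owns no element storage, and the workaround copies the surviving elements into a fresh container and swaps, so it only ever allocates chunk-sized blocks. The sketch assumes T is default-constructible, because each chunk is value-initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal chunked vector: elements live in fixed-size chunks
// that are allocated lazily, so there is no up-front preallocation and no
// single large contiguous block for the elements themselves.
template <typename T, std::size_t ElemsPerChunk = 64>
class chunked_vec {
public:
    void push_back(T v) {
        // Allocate a new chunk only when all existing chunks are full.
        if (size_ == chunks_.size() * ElemsPerChunk) {
            chunks_.push_back(std::make_unique<T[]>(ElemsPerChunk));
        }
        chunks_[size_ / ElemsPerChunk][size_ % ElemsPerChunk] = std::move(v);
        ++size_;
    }
    T& operator[](std::size_t i) {
        assert(i < size_);
        return chunks_[i / ElemsPerChunk][i % ElemsPerChunk];
    }
    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return chunks_[i / ElemsPerChunk][i % ElemsPerChunk];
    }
    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }
    void swap(chunked_vec& other) noexcept {
        chunks_.swap(other.chunks_);
        std::swap(size_, other.size_);
    }

private:
    std::vector<std::unique_ptr<T[]>> chunks_; // one pointer per chunk
    std::size_t size_ = 0;
};

// Hypothetical erase workaround: move the survivors into a fresh container
// and swap it in. O(n) per call, like vector::erase in the worst case.
// Returns the number of removed elements.
template <typename T, std::size_t N, typename Pred>
std::size_t erase_if_by_rebuild(chunked_vec<T, N>& v, Pred pred) {
    chunked_vec<T, N> kept;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (!pred(v[i])) {
            kept.push_back(std::move(v[i]));
        }
    }
    std::size_t removed = v.size() - kept.size();
    v.swap(kept);
    return removed;
}
```

Rebuilding trades extra copying for the guarantee that no allocation is larger than one chunk, which is the property that matters for a fragmented heap.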
There is also the question of whether a different bug is actually causing 100k+ pending flush_ops.
I looked at the memory sampler dumps from the crash, which is a fragmentation OOM.
There are a few interesting allocation sites, though, all of which come from the replicate_batcher to segment_appender path. Most importantly, all of them have kafka::group::handle_offset_commit in their path, so this is likely related to an offset commit storm or something similar.
Version & Environment
Redpanda version: 23.2.3
segment_appender::flush_ops_ is a std::vector. It OOMed under load. There must have been hundreds of thousands of flush_ops, as that struct is fairly small.
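Why a vector of small structs can still cause a fragmentation OOM: every reallocation needs one contiguous block for the whole new capacity, and the old buffer stays alive until the elements are moved. The sketch below measures this with the standard library actually in use (the growth factor is implementation-defined, so it is not hard-coded). The 32-byte fake_flush_op is an assumed size, not the real sizeof(flush_op).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for flush_op with an assumed size of 32 bytes.
struct fake_flush_op {
    unsigned char bytes[32];
};

struct growth_stats {
    std::size_t reallocations = 0;
    std::size_t largest_block_bytes = 0; // biggest single contiguous allocation
    std::size_t peak_during_realloc = 0; // old + new buffers alive at once
};

// Push n elements one at a time and record every buffer reallocation.
inline growth_stats measure_growth(std::size_t n) {
    std::vector<fake_flush_op> v;
    growth_stats s;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(fake_flush_op{});
        if (v.capacity() != cap) {
            ++s.reallocations;
            std::size_t new_bytes = v.capacity() * sizeof(fake_flush_op);
            std::size_t old_bytes = cap * sizeof(fake_flush_op);
            if (new_bytes > s.largest_block_bytes) s.largest_block_bytes = new_bytes;
            if (old_bytes + new_bytes > s.peak_during_realloc) {
                s.peak_during_realloc = old_bytes + new_bytes;
            }
            cap = v.capacity();
        }
    }
    return s;
}
```

With 200,000 such ops the final buffer alone needs at least 6.4 MB (200,000 × 32 bytes) in one piece, plus the previous buffer during the copy, which is exactly the kind of request a fragmented heap may fail to satisfy even when plenty of memory is free in total.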