Closed andijcr closed 7 months ago
LZ4 can allocate a maximum of 4MiB + 4 bytes buffer while decompressing, as mentioned in the ticket details.
This allocation is outside of redpanda control, but registering a custom allocator with LZ4F_CustomMem
can allow us to control how the allocation happens, and allows us to make use of a reusable buffer and avoid this dynamic allocation for compressed records.
Implementation would require a per shard preallocated buffer which is used by the custom allocator.
Does lz4 support fragmented memory at all or streaming operations? If so, we could probably write an adapter of some sort like we've done for zstd streaming. Otherwise, we already have the infrastructure to allocate large contiguous scratch spaces on bootup for this purpose. I think since we've adjusted zstd to do streaming that we don't use it anymore, but it's there and could probalby be used here for lz4.
Does lz4 support fragmented memory at all or streaming operations?
I think probably "no" to the first and "yes" to the second.
Streaming helps avoiding needing to allocate large output buffers, since we can accept the output in chunks (and it helps avoid stalls). However, I think this 4MB is an internal allocation during decompression that it will make regardless of streaming or not, for a block which it needs to treat as one unit. I don't think it can use a fragmented allocation for this block.
So yes, it's the pre-allocate route. Probably we don't need many buffers per shard since it's a CPU bound activity which doesn't seem useful to do in parallel (unless maybe we are doing big decompressions in a streaming way while reading from disk: then it does seem like we'd want to overlap them). @ballard26 mentioned that he had an object-pooling thing he wrote for zstd which might be useful here?
It looks like per call to LZ4F_decompress
there are two potential mallocs which can happen. These are guaranteed to happen together, one for the input scratch space and one for the output scratch space. Both will require roughly similar sizes depending on the input. The maximum sizes for the two are:
The sizes will depend on the input frame header as defined in https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md (Block Maximum Size section).
So before each call to frame_compressor::uncompress
we should have enough space for the above two buffers.
object-pooling
Yes, it looks like it can be used here, the pool can used be for preallocated buffers.
As an optimization to avoid having all decompress operations waiting for a semaphore even when they require small amounts of space, we can use LZ4F_getFrameInfo
, which is already called before starting decompression (when enough data is available).
If it succeeds it gives us information about the frame which contains the block size id. We only need to wait for a large contiguous chunk when the block size id corresponds to 4MiB, in other cases we can fallback to a fast path of normal allocation without waiting for resources.
The use of a custom memory manager requires static linking for lz4, from the relevant header lz4frame.h
/* These declarations are not stable and may change in the future.
* They are therefore only safe to depend on
* when the caller is statically linked against the library.
* To access their declarations, define LZ4F_STATIC_LINKING_ONLY.
*
* By default, these symbols aren't published into shared/dynamic libraries.
* You can override this behavior and force them to be published
* by defining LZ4F_PUBLISH_STATIC_FUNCTIONS.
* Use at your own risk.
*/
LZ4F_decompress internally can allocate a +4MiB buffer and that lead to this crash
this happened in the segment::append path and it's a known problem https://github.com/redpanda-data/redpanda/blob/7d8d220aff01b9a2b1eb6e2c8a6ca039857d179c/src/v/storage/segment.cc#L478-L486
https://github.com/redpanda-data/redpanda/blob/7d8d220aff01b9a2b1eb6e2c8a6ca039857d179c/src/v/compression/internal/lz4_frame_compressor.cc#L212-L218
https://github.com/lz4/lz4/blob/4cf83dd1952898e2a8c5fcd689ce459c53f22ff0/lib/lz4frame.c#L1683 https://github.com/lz4/lz4/blob/4cf83dd1952898e2a8c5fcd689ce459c53f22ff0/lib/lz4frame.c#L1425
JIRA Link: CORE-1722