opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/

Tuning of chunkSize for LZ4 Compression Algorithm to gain performance in Indexing #7475

Open · sarthakaggarwal97 opened this issue 1 year ago

sarthakaggarwal97 commented 1 year ago

Is your feature request related to a problem? Please describe. Currently, there are two compression algorithms in place for OpenSearch, i.e. LZ4 and zlib; LZ4 is the default. Within this implementation, the default chunkSize used for LZ4 is 8kB. The idea is to increase this chunkSize to 16kB to improve indexing throughput.

Initial benchmarking of the 8kB vs. 16kB chunkSize shows 2.5-3% gains in P50 throughput with the http_logs dataset.

With the chunkSize increased to 16kB, the gains we are observing come primarily from fewer flushes being triggered during indexing.

Describe the solution you'd like A filter compressor can be introduced on top of the current LZ4 implementation. This filter compressor will essentially use the underlying LZ4 algorithm but with a configurable chunkSize, as sketched below. Additionally, this chunkSize can also be introduced as an index setting.
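To make this concrete, here is a minimal sketch (not the linked POC) of such a filter codec, assuming Lucene 9.x and the `Lucene90CompressingStoredFieldsFormat` constructor that accepts chunkSize, maxDocsPerChunk, and blockShift; the class names and parameter values are illustrative only.

```java
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressionMode;
import org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat;

/**
 * Sketch of a filter codec that keeps the delegate codec's behavior but swaps in
 * an LZ4 stored-fields format with a configurable chunk size.
 */
public class ConfigurableLz4Codec extends FilterCodec {

    private final StoredFieldsFormat storedFieldsFormat;

    public ConfigurableLz4Codec(Codec delegate, int chunkSize, int maxDocsPerChunk, int blockShift) {
        super("ConfigurableLz4Codec", delegate);
        // CompressionMode.FAST is the LZ4-based mode; chunkSize, maxDocsPerChunk,
        // and blockShift are passed through instead of being hard-coded.
        this.storedFieldsFormat = new Lucene90CompressingStoredFieldsFormat(
                "ConfigurableLz4StoredFields", CompressionMode.FAST,
                chunkSize, maxDocsPerChunk, blockShift);
    }

    @Override
    public StoredFieldsFormat storedFieldsFormat() {
        return storedFieldsFormat;
    }
}
```

A 16kB variant would then be instantiated as, for example, `new ConfigurableLz4Codec(Codec.getDefault(), 16 * 1024, 1024, 10)`; the maxDocsPerChunk and blockShift values here are placeholders, not tuned recommendations. A real implementation would also need to be wired into the engine's codec selection.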

cc: @mgodwan @backslasht @shwetathareja

dblock commented 1 year ago

Good stuff! Any downsides? I am assuming varying the chunk size produces the same output, so this could potentially even be dynamic? Are there other options in the compression codecs that we should expose at the same time?

peternied commented 1 year ago

This is a well-written issue with a promising improvement to OpenSearch, so I'm marking it as triaged. It would be great to see a pull request and more details.

sarthakaggarwal97 commented 1 year ago

@dblock One possible downside is that we will be loading a larger chunk into memory at a time. Apart from that, the search latencies and merge performance are comparable for both chunk sizes in the initial runs. We are working on experiments and monitoring to quantify any impact.

Yes, the chunk size can be dynamic.

Other than chunk size, we have the opportunity to expose blockShift (the base-2 logarithm of the number of chunks to store in an index block) and maxDocsPerChunk (the maximum number of documents in a single chunk). We can try out the behavior by tweaking these parameters as well.

dblock commented 1 year ago

I think we should expose both.

When I said "downside" I didn't mean "tradeoff". I think it's absolutely great that we offer users a way to trade off speed (latency) for space (memory usage) - there are no downsides to that!

dblock commented 1 year ago

This is related to #3354.

dblock commented 1 year ago

I propose we add a static index setting override for chunk size.
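For illustration, such a static index setting could look roughly like the sketch below; the setting name, bounds, and package locations are assumptions (they vary across OpenSearch versions), not a final design.

```java
import org.opensearch.common.settings.Setting;
import org.opensearch.common.unit.ByteSizeUnit;
import org.opensearch.common.unit.ByteSizeValue;

public final class Lz4ChunkSizeSettings {

    /**
     * Hypothetical static index setting for the LZ4 chunk size. The 8kB default
     * mirrors the current behavior; IndexScope + Final make it a per-index,
     * non-updatable (i.e. static) setting.
     */
    public static final Setting<ByteSizeValue> INDEX_LZ4_CHUNK_SIZE_SETTING =
            Setting.byteSizeSetting(
                    "index.codec.lz4.chunk_size",           // illustrative name
                    new ByteSizeValue(8, ByteSizeUnit.KB),  // default
                    new ByteSizeValue(4, ByteSizeUnit.KB),  // min
                    new ByteSizeValue(1, ByteSizeUnit.MB),  // max
                    Setting.Property.IndexScope,
                    Setting.Property.Final);

    private Lz4ChunkSizeSettings() {}
}
```

The codec implementation could then read this setting when the shard's engine is created and pass the value through to the stored-fields format.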

sarthakaggarwal97 commented 1 year ago

I've performed some benchmarks with 8K and 16K block sizes with the LZ4 (default) compression codec on the HTTP Logs dataset.

I'm observing around a 3% improvement in average throughput with the HTTP Logs dataset, and around a 4% reduction in storage size.


During the benchmarks, I enabled some instrumentation specifically around compression.


Cumulative latency for compression improved with the 16K block size. I'm sharing the POC implementation for enabling configurable block sizes and the instrumentation around compression over here.

With instrumentation in my local setup, I also observed good improvements in total merge size and merge latency for the 16K block size.