opensearch-project / custom-codecs

OpenSearch custom Lucene codecs for providing different on-disk index encoding (e.g., compression).
Apache License 2.0

[FEATURE] Support for more compression levels for Zstandard codecs #115

Open sarthakaggarwal97 opened 4 months ago

sarthakaggarwal97 commented 4 months ago

Is your feature request related to a problem?

Currently, custom-codecs supports 6 compression levels with Zstandard compression codecs. ZSTD library supports compression levels from 1 to 22.

What solution would you like?

We should look into widening the range of compression levels we currently support.

mgodwan commented 4 months ago

ZSTD library supports compression levels from 1 to 22.

The ZSTD algorithm should support levels from -7 to 22. Does the library not expose negative levels?

sarthakaggarwal97 commented 4 months ago

@mgodwan we should be able to set negative levels as well. In my previous experiments, the compression ratio at negative levels was close to that of LZ4, and so was the performance. Also, in Zstd the compression parameters for the negative levels are exactly the same, so it would be worth noting the exact difference.

We can run fresh experiments and expand the levels as much as possible.
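A rough sketch of what such an experiment harness could look like follows. It uses `zlib` from the Python standard library purely as a stand-in so that it runs without extra dependencies; it is not Zstandard, its levels (1 to 9) do not map onto Zstandard levels, and the numbers it prints say nothing about custom-codecs. The sample payload and the names `make_sample` and `benchmark_levels` are made up for illustration.

```python
import json
import time
import zlib


def make_sample(n_docs: int = 2000) -> bytes:
    """Build a repetitive JSON-lines payload (hypothetical test data)."""
    docs = [
        {
            "id": i,
            "status": "ok" if i % 7 else "error",
            "message": f"request {i} handled",
            "host": f"node-{i % 5}",
        }
        for i in range(n_docs)
    ]
    return "\n".join(json.dumps(d) for d in docs).encode("utf-8")


def benchmark_levels(data: bytes, levels) -> list:
    """Compress `data` once per level and record ratio and throughput."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)  # stand-in for a zstd call
        elapsed = time.perf_counter() - start
        # Round-trip check so a broken configuration is not silently measured.
        assert zlib.decompress(compressed) == data
        speed = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
        results.append(
            {
                "level": level,
                "ratio": len(data) / len(compressed),
                "compress_mb_per_s": speed,
            }
        )
    return results


if __name__ == "__main__":
    for row in benchmark_levels(make_sample(), range(1, 10)):
        print(
            f"level {row['level']}: x{row['ratio']:.2f}, "
            f"{row['compress_mb_per_s']:.1f} MB/s"
        )
```

A real run would swap in the Zstandard bindings used by the codec, use representative index data, and repeat each measurement several times, since a single timing per level is noisy.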

mgodwan commented 4 months ago

One thing to note is that higher levels can achieve better compression, but at a significant cost in throughput and memory. For example, zstd treats levels above 19 as "ultra" because of their memory overhead. While providing knobs is generally a good idea, I think we need to be careful about how much we expose for this kind of system. For example, increasing the level to 22 will yield a better ratio, but speed drops a lot, which may not suit a use case like OpenSearch, given that compute has recently cost significantly more than storage.

I'd say it would be good to hear some feedback before adding support for more levels, so that operators don't fall into the trap of over-optimizing this setting (1 to 6 seems to be a good exposed range, imho).

$ zstd -e22 -b1 -S test_zstd.json
 1#test_zstd.json    :  72953608 ->   2275485 (x32.06), 1427.0 MB/s, 3129.1 MB/s
 2#test_zstd.json    :  72953608 ->   2297113 (x31.76), 1467.0 MB/s, 3026.4 MB/s
 3#test_zstd.json    :  72953608 ->   2324678 (x31.38), 1246.2 MB/s, 3067.8 MB/s
 4#test_zstd.json    :  72953608 ->   2322891 (x31.41), 1233.0 MB/s, 3084.5 MB/s
 5#test_zstd.json    :  72953608 ->   1881209 (x38.78),  319.3 MB/s, 3435.6 MB/s
 6#test_zstd.json    :  72953608 ->   1705027 (x42.79),  218.5 MB/s, 4170.6 MB/s
 7#test_zstd.json    :  72953608 ->   1620936 (x45.01),  198.6 MB/s, 4391.5 MB/s
 8#test_zstd.json    :  72953608 ->   1549335 (x47.09),  166.4 MB/s, 4451.8 MB/s
 9#test_zstd.json    :  72953608 ->   1542736 (x47.29),  154.0 MB/s, 4368.0 MB/s
10#test_zstd.json    :  72953608 ->   1475187 (x49.45),  114.2 MB/s, 4524.6 MB/s
11#test_zstd.json    :  72953608 ->   1426412 (x51.14),   76.3 MB/s, 4654.2 MB/s
12#test_zstd.json    :  72953608 ->   1426178 (x51.15),   71.7 MB/s, 4666.8 MB/s
13#test_zstd.json    :  72953608 ->   1368801 (x53.30),   57.9 MB/s, 4812.5 MB/s
14#test_zstd.json    :  72953608 ->   1271762 (x57.36),   40.2 MB/s, 4997.8 MB/s
15#test_zstd.json    :  72953608 ->   1222523 (x59.67),   32.2 MB/s, 4993.7 MB/s
16#test_zstd.json    :  72953608 ->   1410705 (x51.71),   5.61 MB/s, 4532.0 MB/s
17#test_zstd.json    :  72953608 ->   1370144 (x53.25),   5.29 MB/s, 4379.3 MB/s
18#test_zstd.json    :  72953608 ->   1432867 (x50.91),   5.14 MB/s, 4350.4 MB/s
19#test_zstd.json    :  72953608 ->   1242815 (x58.70),   2.29 MB/s, 4416.5 MB/s
20#test_zstd.json    :  72953608 ->   1226806 (x59.47),   2.24 MB/s, 3776.9 MB/s
21#test_zstd.json    :  72953608 ->   1131111 (x64.50),   0.98 MB/s, 4314.5 MB/s
22#test_zstd.json    :  72953608 ->   1121897 (x65.03),   0.71 MB/s, 4252.4 MB/s
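
To make the trade-off in this output easier to read, here is a small sketch that parses benchmark lines of this shape and reports, for each level, the size saved and the compression slowdown relative to a chosen baseline level. The sample rows are copied from the run above (levels 2, 6, 15, 19 and 22); the helper names `BenchRow`, `parse_bench` and `tradeoff` are hypothetical, and the regular expression only assumes the layout seen in this output.

```python
import re
from typing import NamedTuple


class BenchRow(NamedTuple):
    level: int
    original_bytes: int
    compressed_bytes: int
    compress_mbps: float
    decompress_mbps: float


# Matches lines such as:
#  6#test_zstd.json    :  72953608 ->   1705027 (x42.79),  218.5 MB/s, 4170.6 MB/s
LINE_RE = re.compile(
    r"^\s*(-?\d+)#\S+\s*:\s*(\d+)\s*->\s*(\d+)\s*\(x[\d.]+\),"
    r"\s*([\d.]+) MB/s\s*,?\s*([\d.]+) MB/s"
)

SAMPLE = """\
 2#test_zstd.json    :  72953608 ->   2297113 (x31.76), 1467.0 MB/s, 3026.4 MB/s
 6#test_zstd.json    :  72953608 ->   1705027 (x42.79),  218.5 MB/s, 4170.6 MB/s
15#test_zstd.json    :  72953608 ->   1222523 (x59.67),   32.2 MB/s, 4993.7 MB/s
19#test_zstd.json    :  72953608 ->   1242815 (x58.70),   2.29 MB/s, 4416.5 MB/s
22#test_zstd.json    :  72953608 ->   1121897 (x65.03),   0.71 MB/s, 4252.4 MB/s
"""


def parse_bench(text: str) -> list:
    """Turn benchmark output into BenchRow records, skipping other lines."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append(
                BenchRow(int(m[1]), int(m[2]), int(m[3]), float(m[4]), float(m[5]))
            )
    return rows


def tradeoff(rows, baseline_level: int) -> dict:
    """Map level -> (fraction of bytes saved, compression slowdown factor),
    both measured against the baseline level."""
    base = next(r for r in rows if r.level == baseline_level)
    return {
        r.level: (
            1 - r.compressed_bytes / base.compressed_bytes,
            base.compress_mbps / r.compress_mbps,
        )
        for r in rows
    }


if __name__ == "__main__":
    for level, (saved, slowdown) in tradeoff(parse_bench(SAMPLE), 2).items():
        print(f"level {level:>2}: {saved:6.1%} smaller, {slowdown:8.1f}x slower")
```

Read this way, the sampled rows show level 22 roughly halving the output of level 2 while compressing about two thousand times slower, which is the kind of gap the comment above is warning about. These figures come from a single file and a single run, so they illustrate the shape of the curve rather than a general result.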

dblock commented 2 weeks ago

Catch All Triage - 1 2 3 4 5