manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

FSST compression method #1632

Open AbstractiveNord opened 11 months ago

AbstractiveNord commented 11 months ago

Is your feature request related to a problem? Please describe. The main data type in ManticoreSearch is the string. Compression improves hardware utilization by making more efficient use of memory and disk resources. It would be a good idea to check whether the FSST technique can be used to improve ManticoreSearch.

Describe the solution you'd like Test, measure and implement the FSST string compression method.

Additional context Less data to read may provide better performance, so it's worth testing FSST on both row-wise storage and columnar storage.

FSST Repository.

tomatolog commented 11 months ago

Stored strings use the docstore, which already uses the lz4 compression library, and you can already use a high compression level for it, as described in the manual under docstore_compression.

AbstractiveNord commented 11 months ago

Stored strings use the docstore, which already uses the lz4 compression library, and you can already use a high compression level for it, as described in the manual under docstore_compression.

The authors of FSST say that their method provides a better compression ratio and better compression speed.

sanikolaev commented 11 months ago

Some notes after today's dev call:

tomatolog commented 11 months ago

The authors of FSST say that their method provides a better compression ratio and better compression speed.

From their README.md, I see the speed is similar:

When compared to e.g. LZ4 (which is block-based), FSST further achieves similar decompression speed and compression speed, and better compression ratio.

The advantage is that it can decompress individual strings without touching the whole block, and equality comparisons can be performed without decompressing.
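
To illustrate those two properties, here is a minimal Python sketch of a static symbol-table compressor in the spirit of FSST. It is not the real FSST algorithm or its API: the symbol list is hand-picked rather than learned from the data, and the names `SYMBOLS`, `compress`, `decompress` and `find_equal` are hypothetical. It only assumes what the FSST authors describe: short symbols are replaced by one-byte codes, with one code reserved as an escape for literal bytes.

```python
# Toy static symbol table (assumption: hand-picked; real FSST learns it from a sample).
ESCAPE = 255  # reserved code: the next byte is a literal
SYMBOLS = [b"http://", b"www.", b".com", b"mantic", b"ore", b"search"]
TABLE = {sym: code for code, sym in enumerate(SYMBOLS)}
MAX_LEN = max(len(sym) for sym in SYMBOLS)


def compress(data: bytes, table=TABLE) -> bytes:
    """Greedy longest-match encoding of one string into one-byte codes."""
    out = bytearray()
    i = 0
    while i < len(data):
        for n in range(min(MAX_LEN, len(data) - i), 0, -1):
            code = table.get(data[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:
            # No symbol matches here: emit the escape code plus the raw byte.
            out.append(ESCAPE)
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress(blob: bytes, symbols=SYMBOLS) -> bytes:
    """Decode one string using only the shared symbol table."""
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += symbols[blob[i]]
            i += 1
    return bytes(out)


def find_equal(compressed_rows, needle: bytes, table=TABLE):
    """Equality filter: compress the constant once, then compare raw bytes."""
    target = compress(needle, table)
    return [i for i, blob in enumerate(compressed_rows) if blob == target]


rows = [b"http://www.manticoresearch.com", b"http://www.example.com", b"manticore"]
compressed = [compress(r) for r in rows]

# Random access: row 1 is decoded without touching rows 0 or 2.
assert decompress(compressed[1]) == rows[1]
# Equality check on compressed bytes only.
assert find_equal(compressed, b"manticore") == [2]
```

Because every string is encoded independently against the same table, any single value can be decoded on its own, whereas a block compressor such as LZ4 has to decompress the block that contains it. The equality check works because encoding is deterministic and reversible, so two values are equal exactly when their compressed bytes are equal.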