powturbo / TurboBench

Compression Benchmark

Support ZSTD's dictionary compression #42

Closed psyduffy closed 1 year ago

psyduffy commented 1 year ago

Does TurboBench support benchmarking ZSTD dictionary compression? I want to compare the performance of ZSTD compression using a dictionary against other algorithms on small data.

powturbo commented 1 year ago

Actually, the external dictionary "mydic" must be in the current directory.

You can also benchmark multiple small files using multiblock mode in turbobench:

1 - Store your small files in a multiblock file using option "-M":

./turbobench -Mmymultiblock files

(mymultiblock output format: length1,file1,length2,file2,...,lengthN,fileN, where each length is the 4-byte file/block length)

2 - Benchmark using option "-m":

./turbobench -ezstd,22Dmydic mymultiblock -m
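To make the multiblock layout described above concrete, here is a minimal Python sketch that writes and reads a file of the form length1,file1,...,lengthN,fileN. It is not TurboBench's own code. The thread only says each length is 4 bytes, so the unsigned little-endian encoding is an assumption to verify against the TurboBench source, and the function names `write_multiblock` and `read_multiblock` are made up for illustration.

```python
import struct
from pathlib import Path

# ASSUMPTION: each length is an unsigned 32-bit little-endian integer.
# The thread only states "4 bytes"; check the TurboBench source before
# relying on this byte order.
LENGTH_FORMAT = "<I"


def write_multiblock(paths, out_path):
    """Write length1,file1,...,lengthN,fileN to out_path; return the block count."""
    count = 0
    with open(out_path, "wb") as out:
        for p in paths:
            data = Path(p).read_bytes()
            out.write(struct.pack(LENGTH_FORMAT, len(data)))  # 4-byte length
            out.write(data)                                   # raw file contents
            count += 1
    return count


def read_multiblock(path):
    """Yield the bytes of each block stored in the layout above."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated length header")
            (length,) = struct.unpack(LENGTH_FORMAT, header)
            data = f.read(length)
            if len(data) != length:
                raise ValueError("truncated block")
            yield data
```

Because the writer takes a Python list of paths rather than a command line, it could be fed from a directory listing of any size, provided the byte-order assumption actually matches what turbobench expects.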

psyduffy commented 1 year ago

Thanks for your answer. The test can be done using multiblock, but my sample data has a lot of entries, more than 50,000. When I try to test the entire folder,

turbobench -Mmymultiblock /home/testdir/*

it reports an error:

zsh: argument list too long: /home/TurboBench/turbobench

The benchmark mode provided by the zstd CLI is very convenient. It can benchmark an entire folder with a simple command like:

zstd -b3 -D dictfile -r sampledir

Unfortunately, it does not support comparing multiple algorithms.

powturbo commented 1 year ago

The format of a block is :

You can create small multiblock files (e.g. with 10,000 files each) and binary-concatenate these files into one single big file.
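The batching suggestion can be sketched in Python as follows. This is an illustration under stated assumptions, not a tested TurboBench workflow: the helper names, the `.partN` naming, and the default `./turbobench` path are hypothetical, and it assumes that `-M<name>` writes its output to exactly that path. Passing the arguments as a list bypasses shell globbing, but the operating system's argument-size limit still applies to each call, so the batch size may need lowering for very long paths.

```python
import os
import shutil
import subprocess


def list_files(directory):
    """Return the sorted paths of regular files directly inside directory."""
    with os.scandir(directory) as entries:
        return sorted(e.path for e in entries if e.is_file())


def batches(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def concat_files(part_paths, out_path):
    """Binary-concatenate the part files, in order, into out_path."""
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def build_multiblock(directory, out_path, turbobench="./turbobench", batch_size=10000):
    """Run `turbobench -M<part> files...` once per batch, then join the parts."""
    parts = []
    for n, batch in enumerate(batches(list_files(directory), batch_size)):
        part = f"{out_path}.part{n}"  # hypothetical naming scheme
        # ASSUMPTION: -M<name> writes the multiblock output to exactly <name>.
        subprocess.run([turbobench, f"-M{part}", *batch], check=True)
        parts.append(part)
    concat_files(parts, out_path)
    for part in parts:
        os.remove(part)
    return len(parts)
```

Under these assumptions, something like `build_multiblock("/home/testdir", "mymultiblock")` would produce a single file that can then be benchmarked with the `-m` option as described earlier in the thread.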