Using lzbench to get a single summary compression number for an entire dataset.
Use case: I have a directory with hundreds, thousands, or millions of files (a dataset) and would like to see which compression algorithm works best on that dataset. I don't care about the compressibility of individual files, only about the compressibility of the dataset as a whole.
Current issue: lzbench runs through every single file in the dataset and reports compressibility along with compression/decompression throughput for each one. At some point I may care about throughput, but for now I only care about the overall compressibility of the entire dataset.
The summary would be reported per algorithm.
Speed is also a factor, since the tool runs through every file individually. I'm willing to wait a while for results, but would need some kind of progress indicator.
Example of potential output, where the current directory contains 1000 files plus a few subdirectories with files of their own:
lzbench -ezstd -r .
Compressor name   Compress.   Decompress.   Compr. size   Ratio    Filename
memcpy            1348 MB/s   2687 MB/s     1698448384    100.00   /dir/data/set/is/in/
zstd 1.5.0 -1      177 MB/s   1000 MB/s     1094580176     64.45   /dir/data/set/is/in/
zstd 1.5.0 -2       61 MB/s    658 MB/s     1065403069     62.73   /dir/data/set/is/in/
zstd 1.5.0 -3      175 MB/s   1063 MB/s     1085968586     63.94   /dir/data/set/is/in/
zstd 1.5.0 -4       58 MB/s    656 MB/s     1057966516     62.29   /dir/data/set/is/in/
zstd 1.5.0 -5      208 MB/s   1208 MB/s     1085740326     63.93   /dir/data/set/is/in/
zstd 1.5.0 -6      210 MB/s   1199 MB/s     1083948608     63.82   /dir/data/set/is/in/
zstd 1.5.0 -7      197 MB/s    661 MB/s     1082068109     63.71   /dir/data/set/is/in/
zstd 1.5.0 -8      151 MB/s   1063 MB/s     1078084969     63.47   /dir/data/set/is/in/
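To make the requested behavior concrete, here is a rough sketch of the aggregation I have in mind, written outside of lzbench. It is not lzbench code and does not use any lzbench API. It assumes Python's standard-library zlib as a stand-in compressor (not zstd), and the names `dataset_ratio` and `print_progress` are made up for illustration. It walks a directory tree, compresses each file on its own, sums the original and compressed sizes, and reports one ratio for the whole dataset, calling a progress callback after each file.

```python
import os
import sys
import zlib


def dataset_ratio(root, level=6, progress=None):
    """Return (original_bytes, compressed_bytes, ratio_percent) for all files under root.

    Each file is compressed independently with zlib (a stand-in for the real
    codec); only the summed sizes are kept, not per-file results.
    """
    # Collect regular files first so the progress callback knows the total.
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                paths.append(path)

    original = 0
    compressed = 0
    for done, path in enumerate(paths, start=1):
        with open(path, "rb") as fh:
            data = fh.read()  # whole file in memory; fine for a sketch
        original += len(data)
        compressed += len(zlib.compress(data, level))
        if progress is not None:
            progress(done, len(paths))

    # Ratio as compressed size in percent of original size.
    ratio = 100.0 * compressed / original if original else 0.0
    return original, compressed, ratio


def print_progress(done, total):
    """Hypothetical progress indicator: rewrite one status line on stderr."""
    sys.stderr.write(f"\r{done}/{total} files")
    sys.stderr.flush()
```

Calling `dataset_ratio(".", progress=print_progress)` returns the summed sizes and a single ratio, which is the one line per algorithm and level shown in the example output above. The ratio there is compressed size as a percentage of original size, e.g. 1094580176 / 1698448384 ≈ 64.45 for the first zstd row.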