inikep / lzbench

lzbench is an in-memory benchmark of open-source LZ77/LZSS/LZMA compressors

Confusing "out of memory" behavior when specifying directory input #27

Open travisdowns opened 7 years ago

travisdowns commented 7 years ago

I specified a directory as input like:

lzbench corpus/silesia/

where corpus/silesia/ is a directory. I got the following error:

lzbench 1.5 (64-bit Linux)   Assembled by P.Skibinski
Not enough memory, please use -m option!done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)

This error didn't make much sense, since the default for -m is apparently "unlimited", so it can't exactly be increased. I did try a few values, such as -m1000, -m1, and -m6000, and these all resulted in a different error output:

lzbench 1.5 (64-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress.  Orig. size  Compr. size  Ratio Filename
memcpy                   0.00 MB/s      ERROR           0            0   -nan     corpus/silesia

Note that the silesia directory only contains ~203 MB of files in total and that I have ~12 GB of available physical memory on my box.

Using the -r option does do what is expected, but it isn't clear to me that it is required for directories (the documentation certainly doesn't say so). If -r must be specified for directories, why not remove the option and make that the behavior whenever a directory is given?
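
To illustrate the kind of up-front check that would avoid the misleading "Not enough memory" message, here is a minimal sketch in Python. It is not lzbench's code (lzbench is written in C/C++); the function name `classify_input` and the error wording are hypothetical, chosen only for this example.

```python
import os

def classify_input(path, recursive=False):
    """Return "file" or "directory", or raise ValueError with a clear message.

    Hypothetical helper for illustration; not part of lzbench.
    """
    if os.path.isdir(path):
        if not recursive:
            # Fail early with a message that names the real problem,
            # rather than attempting to read the directory as a file.
            raise ValueError(f"{path!r} is a directory; pass -r to process its files")
        return "directory"
    if os.path.isfile(path):
        return "file"
    raise ValueError(f"{path!r} does not exist or is not a regular file")
```

With a check like this, a directory given without -r would produce an error that points at the missing option instead of at memory limits.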

inikep commented 7 years ago

It should be fixed with this commit: https://github.com/inikep/lzbench/commit/735da23eec93040a5bd3c60117756acc54580eef

The -r option must be specified because there is currently no way to process the files in a directory without also processing its subdirectories.
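
For readers unfamiliar with the distinction, the difference between processing only a directory's own files and processing its subdirectories as well can be sketched as follows. This is a generic Python illustration, not lzbench's implementation; `list_files` is a hypothetical helper.

```python
import os

def list_files(directory, recursive):
    """Yield paths of files under `directory`.

    Hypothetical helper for illustration; not part of lzbench.
    """
    if recursive:
        # os.walk descends into every subdirectory and reports
        # the non-directory entries found at each level.
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                yield os.path.join(root, name)
    else:
        # os.scandir lists only the immediate entries; subdirectories are skipped.
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_file():
                    yield entry.path
```

A tool that only implements the recursive branch has to require an explicit flag (or make recursion the default) whenever it is given a directory.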

travisdowns commented 7 years ago

Awesome! I guess it does raise the question of why you need the -r at all: if the user provides a directory, isn't it implied that they'd like to process it? In any case, any fix is a good fix here I think :)