inikep / lzbench

lzbench is an in-memory benchmark of open-source LZ77/LZSS/LZMA compressors

When running against a directory, memcpy only runs once #28

Open travisdowns opened 7 years ago

travisdowns commented 7 years ago

When using

./lzbench -r -elz4 ../corpus/silesia

to run against a directory, memcpy only runs once:

lzbench 1.5 (64-bit Linux)   Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                  11571 MB/s 11643 MB/s    10192446 100.00 ../corpus/silesia/dickens
lz4 1.7.3                 267 MB/s  1830 MB/s     6428742  63.07 ../corpus/silesia/dickens
lz4 1.7.3                 449 MB/s  2207 MB/s     7716839  35.72 ../corpus/silesia/samba
lz4 1.7.3                 290 MB/s  1660 MB/s    20139988  48.58 ../corpus/silesia/webster
lz4 1.7.3                 437 MB/s  1978 MB/s    26435667  51.61 ../corpus/silesia/mozilla
lz4 1.7.3                1001 MB/s  5196 MB/s     8390195  99.01 ../corpus/silesia/x-ray
lz4 1.7.3                 376 MB/s  1783 MB/s     5256666  52.12 ../corpus/silesia/osdb
lz4 1.7.3                 260 MB/s  1817 MB/s     3181387  48.00 ../corpus/silesia/reymont
lz4 1.7.3                 360 MB/s  1685 MB/s     4338918  70.53 ../corpus/silesia/ooffice
lz4 1.7.3                 375 MB/s  2230 MB/s     6790273  93.63 ../corpus/silesia/sao
lz4 1.7.3                 565 MB/s  2078 MB/s     1227495  22.96 ../corpus/silesia/xml
lz4 1.7.3                 413 MB/s  2197 MB/s     5440937  54.57 ../corpus/silesia/mr
lz4 1.7.3                 695 MB/s  2731 MB/s     5533040  16.49 ../corpus/silesia/nci

Presumably the intent is for memcpy to run against all the files, like the other algos.

inikep commented 7 years ago

My intention was to use memcpy only for the first file, to show the CPU/memory speed we are dealing with. On my laptop it goes up to 8719 MB/s with silesia.tar.

travisdowns commented 7 years ago

I see, I guess that makes sense. One caveat is that memcpy speed is very dependent on the size of the underlying buffer: for large buffers (i.e., larger than L3) it will generally converge to something close to the combined DRAM read + write bandwidth, while for smaller files it could be an order of magnitude higher (e.g., 100 GB/s) if the buffer fits in L1, L2, etc.

So you can get weird results: if the first file is big, you might get a RAM-bound memcpy figure like 11 GB/s on my box, but then for smaller files a super-fast compressor (let's say one that just uses memcpy internally to copy the whole buffer at zero compression) could report a much larger value, which doesn't make much sense.

Currently I guess it isn't much of an issue, because most compression algos are CPU-bound at a speed lower than memory bandwidth and perform roughly the same whether or not the working set fits in cache, so the effect isn't too visible...

travisdowns commented 7 years ago

I guess I would kind of expect the memcpy "codec" to just act like any other codec and obey the same parameters and behavior, rather than being special-cased as it is today in the code (this would probably also reduce code complexity).