Very bad benchmark accuracy on small files or few iterations on my hardware.

xcrh commented 8 years ago

lzbench: current version, at commit 59379235c23236c8c42786a46d55071e362771f8
OS: Xubuntu 64-bit, 15.10
Compiler: gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu2)
Flags: defaults from the makefile, only BUILD_SYSTEM=linux uncommented

To reproduce: run the benchmark on a small file, say 100 KB long.

Result: the reported memcpy speed is laughable and nowhere near what the hardware can do, and half of the benchmarks show identical results. It is pretty clear the benchmark is limited by time measurement accuracy rather than anything else.

Extra info: using a higher number of iterations...

Somehow has no major effect on the weird memcpy speed reported; that depends only on file size. Strange.
Could be a PITA, because codecs, and even individual codec modes, can differ drastically in speed. Running lz4fast 20 times is okay, but running high-level brotli compression 20 times can be a great test of patience.
Still gives strange, skewed results, with half of the codecs showing the same speed, likely due to time measurement accuracy rather than anything else.

Idea: "automatic number of passes" (ideally enabled by default). If a run of an algorithm takes less than, say, 1 second, it should be considered inaccurate and repeated until the total run time reaches about 1 second or more. It makes little sense to run many iterations of slow algorithms, since they already give a fairly accurate result even on a single run: if a run takes 10 seconds, jitter is negligible, and wasting yet another 10 seconds would not improve anything. On the other hand, if an algorithm is much faster, it can show odd results that are probably time measurement or rounding errors rather than anything else. It's not like half of the codecs can really have the very same "20Mb/s" performance; they are supposed to perform differently. And they do, if I take a somewhat larger file.
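The "automatic number of passes" idea could look roughly like the sketch below. It is not lzbench code: `run_adaptive`, `bench_fn`, `bench_result` and the `copy_pass` workload are hypothetical names made up for illustration, and the 1-second budget is just the value suggested above, passed in as a parameter.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical callback type: one full pass of the codec under test. */
typedef void (*bench_fn)(void *ctx);

struct bench_result {
    uint64_t passes;   /* number of passes actually run */
    uint64_t total_ns; /* wall time spent over all passes */
};

/* Repeat fn until at least min_total_ns has elapsed. At least one pass
 * always runs, so a slow codec that exceeds the budget on its first
 * pass is measured exactly once. */
static struct bench_result run_adaptive(bench_fn fn, void *ctx,
                                        uint64_t min_total_ns)
{
    struct bench_result r = {0, 0};
    uint64_t start = now_ns();
    do {
        fn(ctx);
        r.passes++;
        r.total_ns = now_ns() - start;
    } while (r.total_ns < min_total_ns);
    return r;
}

/* Average throughput in MB/s (1 MB = 10^6 bytes); 0 if no time elapsed. */
static double throughput_mbs(struct bench_result r, size_t bytes_per_pass)
{
    if (r.total_ns == 0)
        return 0.0;
    return (double)bytes_per_pass * (double)r.passes * 1e3
           / (double)r.total_ns;
}

/* Hypothetical workload for illustration: a plain memcpy of len bytes. */
struct copy_ctx { void *dst; const void *src; size_t len; };

static void copy_pass(void *p)
{
    struct copy_ctx *c = p;
    memcpy(c->dst, c->src, c->len);
}
```

Calling `run_adaptive(copy_pass, &ctx, 1000000000ull)` would repeat the copy until about one second has passed, while a codec that needs 10 seconds per pass would run only once. Reading the clock on every pass adds some overhead for very fast passes; batching several passes between clock reads would be one way to reduce it.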

On a side note, large files tend to crash some strong algorithms like LZMA on low-RAM systems, probably by running out of memory (though I do not see the OOM killer). So I may want to limit myself to files of 2 MB or less on such systems. Yet then I can't get reasonable accuracy, due to timing errors or the like.

inikep commented 8 years ago

bugfix in dev: changed timer resolution from millisec to nanosec
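To illustrate why a millisecond timer would make different codecs report the same speed on a small file, here is a small sketch. It assumes the timer truncates elapsed time down to whole ticks; how lzbench actually rounded is not shown in this thread, and the function names and numbers are made up for illustration.

```c
#include <stdint.h>

/* Elapsed time as a timer with the given tick size would report it,
 * assuming it truncates down to a whole number of ticks. */
static uint64_t quantize_ns(uint64_t true_ns, uint64_t tick_ns)
{
    return (true_ns / tick_ns) * tick_ns;
}

/* Reported speed in MB/s (1 MB = 10^6 bytes).
 * Returns -1.0 when the timer reads zero and no speed can be computed. */
static double reported_mbs(uint64_t bytes, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0)
        return -1.0;
    return (double)bytes * 1e3 / (double)elapsed_ns;
}
```

Under that assumption, two codecs that really take 1.2 ms and 1.9 ms on a 100 KB file both read as 1 ms with a 1 ms tick, so both report 100 MB/s, and anything faster than 1 ms reads as zero elapsed time. With a 1 ns tick the two runs stay distinguishable.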

xcrh commented 8 years ago

From what I remember, actual timer resolution can vary wildly across systems. On Linux, the kernel tries to use the best available clocksource, but not every piece of hardware comes with a high-resolution timer. Old PCs lack hi-res timers, and on newer ones it could be disabled in the BIOS. Other devices can have their own idea of what the timer is and what resolution it has. Sure, the kernel will select something as the clocksource, but it may or may not have good resolution.

And since the benchmark is a rather interesting thing to run on virtually all kinds of devices, I'm not sure how it would perform. I.e. I expect the resolution to be better than 1 ms, and that could possibly be enough. But I would not take it for granted and would have to actually check it. Furthermore, I expect it to be subject to jitter, e.g. because task switching can occasionally "steal" some CPU time in favor of another task.
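One way to actually check the resolution on a given device is sketched below, assuming a POSIX system with `CLOCK_MONOTONIC`. The helper names are hypothetical; only `clock_getres` and `clock_gettime` are real POSIX calls. The sketch compares what the kernel reports with the smallest step that can be observed between successive readings.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
static uint64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Resolution the kernel reports for CLOCK_MONOTONIC, in ns; 0 on error. */
static uint64_t reported_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return 0;
    return (uint64_t)res.tv_sec * 1000000000ull + (uint64_t)res.tv_nsec;
}

/* Smallest nonzero step seen between consecutive readings over
 * `samples` attempts. Returns 0 only if samples <= 0. */
static uint64_t observed_step_ns(int samples)
{
    uint64_t best = 0;
    for (int i = 0; i < samples; i++) {
        uint64_t t0 = mono_ns(), t1;
        /* Spin until the clock value changes. */
        do {
            t1 = mono_ns();
        } while (t1 == t0);
        uint64_t step = t1 - t0;
        if (best == 0 || step < best)
            best = step;
    }
    return best;
}
```

The observed step includes the cost of the `clock_gettime` call itself, so it is an upper bound on the usable granularity rather than the exact hardware tick. Taking the minimum over many samples filters out most of the jitter from task switching, but it says nothing about how much jitter a long benchmark run will see.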

inikep / lzbench

Very bad benchmark accuracy on small files or few iterations on my hardware. #3