Feature request: benchmark modes

ghost commented 4 years ago

These days we have several compression and checksum alternatives in Btrfs. I think a built-in benchmark feature would be very useful, so that it is easier to make an informed decision for a specific system. For example, ARM, AMD and Intel x86 all have different strengths and use cases, which makes it difficult to say generically which combination of options is the best fit.

Examples:

btrfs benchmark --hash all
btrfs benchmark --hash crc32,xxhash
btrfs benchmark --compression all
btrfs benchmark --compression zstd:10,zstd:15,lzo
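As a rough illustration of how the proposed list arguments might be read, here is a minimal C sketch that splits a value such as zstd:10,zstd:15,lzo into name/level pairs. The struct bench_spec type and parse_spec_list() function are hypothetical, not part of btrfs-progs. The keyword all is passed through as an ordinary name for the caller to expand, and checking that a level is valid for a given algorithm is left to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one entry of a --hash or --compression list. */
struct bench_spec {
	char name[16]; /* algorithm name, e.g. "zstd" */
	int level;     /* -1 when no ":level" suffix was given */
};

/*
 * Parse a comma-separated list such as "zstd:10,zstd:15,lzo".
 * Returns the number of entries stored in out[], or -1 if an entry is
 * malformed or out[] is too small. Per-algorithm level ranges are not
 * checked here.
 */
static int parse_spec_list(const char *arg, struct bench_spec *out, int max)
{
	const char *p = arg;
	int n = 0;

	assert(arg != NULL && out != NULL);
	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		const char *colon = memchr(p, ':', len);
		size_t name_len = colon ? (size_t)(colon - p) : len;

		if (n == max || name_len == 0 || name_len >= sizeof(out[n].name))
			return -1;
		memcpy(out[n].name, p, name_len);
		out[n].name[name_len] = '\0';
		out[n].level = -1;

		if (colon) {
			char *stop;
			long lvl = strtol(colon + 1, &stop, 10);

			/* require a number that runs exactly to the end of this entry */
			if (stop == colon + 1 || stop != p + len || lvl < 0 || lvl > INT_MAX)
				return -1;
			out[n].level = (int)lvl;
		}
		n++;
		if (!end)
			break;
		p = end + 1;
	}
	return n;
}
```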

Output should probably be in both MiB/s and hashes/s.
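The two units are tied together by the block size: MiB/s = hashes/s × block size / 2^20. A minimal C sketch of that conversion, assuming one hash call per fixed-size block (for example 4096 bytes); the struct bench_result type and bench_rates() function are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: throughput figures for one benchmark run. */
struct bench_result {
	double mib_per_sec;    /* bytes processed / 2^20 / seconds */
	double hashes_per_sec; /* hashed blocks / seconds */
};

/*
 * blocks:     number of hash invocations, one per block
 * block_size: bytes hashed per invocation, e.g. 4096
 * seconds:    elapsed time of the run
 */
static struct bench_result bench_rates(uint64_t blocks, uint32_t block_size,
				       double seconds)
{
	struct bench_result r;

	assert(seconds > 0.0);
	r.hashes_per_sec = (double)blocks / seconds;
	r.mib_per_sec = (double)blocks * block_size / (1024.0 * 1024.0) / seconds;
	return r;
}
```

For instance, 262144 blocks of 4096 bytes (1 GiB) hashed in 2 seconds gives 512 MiB/s and 131072 hashes/s.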

It is likely that we will see more hashes, compression methods or encryption options added to Btrfs in the future, so having a benchmark will be even more valuable.

kdave commented 4 years ago

There's a benchmark program I used to evaluate the hashes (crypto/hash-speedtest.c in git), but what you suggest makes sense to me, as it would allow evaluation on end-user systems.

> It is likely that we will see more hashes, compression methods or encryption options added to Btrfs in the future, so having a benchmark will be even more valuable.

Well, we might have more eventually, but I hope the currently available hashes and compression algorithms are sufficient. If we add anything new, it should either address a new use case or perform significantly better than what we have now. For that reason, LZ4 is not among the compression algorithms: it is only better than LZO in microbenchmarks. The actual implementation in btrfs adds overhead, plus it would have to be added to all related tools, like bootloaders. But I digress; that's just a side note.

A first-level command, benchmark, is probably justified, as we might want hash and compression as second-level commands and then add more options (like block size, level, iterations, etc.).
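A minimal C sketch of the two-level layout described above: a first-level benchmark command that dispatches to hash and compression handlers through a lookup table. This is not the actual btrfs-progs command infrastructure; the handler names, the struct subcommand type, and options such as --iterations are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handlers; real ones would parse options and run measurements. */
static int cmd_benchmark_hash(int argc, char **argv)
{
	printf("benchmark hash: %d option argument(s)\n", argc - 1);
	(void)argv;
	return 0;
}

static int cmd_benchmark_compression(int argc, char **argv)
{
	printf("benchmark compression: %d option argument(s)\n", argc - 1);
	(void)argv;
	return 0;
}

struct subcommand {
	const char *name;
	int (*fn)(int argc, char **argv);
};

/* Second-level commands under the first-level "benchmark" command. */
static const struct subcommand benchmark_subcommands[] = {
	{ "hash", cmd_benchmark_hash },
	{ "compression", cmd_benchmark_compression },
};

/*
 * argv[0] is "benchmark" and argv[1] names the second-level command; the
 * handler receives the remaining arguments with its own name in argv[0].
 * Returns -1 for a missing or unknown second-level command.
 */
static int cmd_benchmark(int argc, char **argv)
{
	size_t count = sizeof(benchmark_subcommands) / sizeof(benchmark_subcommands[0]);

	if (argc < 2)
		return -1;
	for (size_t i = 0; i < count; i++)
		if (strcmp(argv[1], benchmark_subcommands[i].name) == 0)
			return benchmark_subcommands[i].fn(argc - 1, argv + 1);
	return -1;
}
```

New options (block size, level, iterations) then only touch the relevant handler, and a future second-level command is one more table entry.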

kdave commented 4 years ago

The btrfs-progs package will provide only the fallback reference implementations, so it stays portable and free of external dependencies. It's planned to add optional support for e.g. gcrypt or libsodium, which contain implementations that are reviewed and certified. These libraries also provide the accelerated versions, which I don't want to pull into btrfs-progs.

So the benchmark could tell you the speed at btrfs-related block sizes with a given implementation; however, this still applies only to userspace. The kernel has yet another implementation. When pulling in external crypto libraries, they become a runtime dependency and must also be in the initrd, so there are space constraints to consider.

The benefit of a built-in benchmark is that it can be run directly on the system where it's going to be used, but to be accurate it needs to use the implementations that will actually be used. E.g. unaccelerated SHA256 is worse than BLAKE2, but with CPU instructions it's on par or better.

The drawback is that the benchmark could produce misleading results by using unoptimized implementations.
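To illustrate why the implementation matters as much as the algorithm, here is a minimal C sketch with two implementations of the same CRC32C checksum: a bit-at-a-time reference version and a byte-at-a-time table-driven version. Both return identical checksums, but their measured speed typically differs, so a benchmark that only runs the slower one would understate what the algorithm can do on that machine. Neither version is hardware-accelerated, and the function names are hypothetical, not taken from btrfs-progs or the kernel. clock() measures CPU time with limited resolution, so the iteration count should be large enough for each run to take a noticeable amount of time.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

/* Reference version: processes one bit per inner iteration. */
static uint32_t crc32c_bitwise(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
	}
	return ~crc;
}

static uint32_t crc32c_table[256];

/* Precompute the CRC of every possible byte value; call once before use. */
static void crc32c_init_table(void)
{
	for (uint32_t n = 0; n < 256; n++) {
		uint32_t c = n;

		for (int k = 0; k < 8; k++)
			c = (c >> 1) ^ (CRC32C_POLY & (0u - (c & 1u)));
		crc32c_table[n] = c;
	}
}

/* Table-driven version: one lookup per byte, same result as the reference. */
static uint32_t crc32c_table_driven(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++)
		crc = (crc >> 8) ^ crc32c_table[(crc ^ buf[i]) & 0xFFu];
	return ~crc;
}

/* Checksum one block `iters` times and return MiB/s of CPU time. */
static double bench_mib_per_sec(uint32_t (*fn)(const uint8_t *, size_t),
				const uint8_t *block, size_t len, int iters)
{
	volatile uint32_t sink = 0; /* keeps the compiler from dropping the work */
	clock_t start = clock();

	for (int i = 0; i < iters; i++)
		sink ^= fn(block, len);
	double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

	(void)sink;
	return secs > 0.0 ? (double)len * iters / (1024.0 * 1024.0) / secs : 0.0;
}
```

Running bench_mib_per_sec() for both functions on the same 4096-byte block shows the gap on a given CPU; a real benchmark would also need to include whichever accelerated implementation the filesystem actually uses.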

adam900710 commented 4 years ago

I'm not a fan of the benchmark mode idea.

Checksumming is really just one part of the overhead. Its speed alone doesn't show how much it slows down or speeds up the whole filesystem.

One of the most common misunderstandings is LUKS performance. If your SSD can do I/O at, say, 1024 MiB/s, and the encryption algorithm you choose can also do 1024 MiB/s, users may think the LUKS setup can also do 1024 MiB/s.

But that's not the case: encryption and I/O happen serially, so it can only reach 512 MiB/s at most. Not to mention that small I/Os will be more limited by IOPS bottlenecks and other factors.

The same applies to btrfs: overall speed depends on how fast the tree and data operations are, and the checksum overhead is just part of the whole overhead. So I'm really not inclined to provide a built-in benchmark unless users really know what they are doing.
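The arithmetic in the LUKS example can be written as a simple model: if each byte passes through every stage in turn with no overlap, the per-byte times add, so the combined rate is 1 / (1/r1 + ... + 1/rn). The C sketch below is an illustrative simplification (it ignores IOPS limits, queueing and any pipelining), serial_throughput() is a hypothetical name, and the rates in the usage note are made-up numbers, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Combined throughput (MiB/s) of stages that each byte passes through
 * strictly one after another: total = 1 / (1/r1 + 1/r2 + ... + 1/rn).
 * Simplified model: no pipelining, no IOPS or latency effects.
 */
static double serial_throughput(const double *rates, size_t n)
{
	double secs_per_mib = 0.0;

	for (size_t i = 0; i < n; i++) {
		assert(rates[i] > 0.0);
		secs_per_mib += 1.0 / rates[i];
	}
	return 1.0 / secs_per_mib;
}
```

With two 1024 MiB/s stages this returns 512 MiB/s, matching the LUKS example. With hypothetical stages of 1024 MiB/s (tree and I/O work) and 4096 MiB/s (checksumming), doubling only the checksum speed to 8192 MiB/s raises the model's total from about 819 to about 910 MiB/s, roughly 11%, which is the sense in which a faster hash is only part of the picture.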

And even if we really want to provide such a benchmark mode, we should at least provide it as an alias to openssl or similar tools. The extra hassle really isn't worth it.

Thanks.

kdave commented 3 years ago

I'm reconsidering this. People have asked for performance evaluation of the hashes, and the fact that the tool is only in git makes this less friendly. The reason somebody wants to measure actual performance is that some hashes perform well on x86 but not on ARM. For example, xxhash is fine on Intel and on par with crc32c, but on ARM it's slower. The accelerated versions of sha256 or blake2 could also be quite different on Intel and ARM.

What Qu objects to is evaluating hashing without relevance to actual I/O performance. I don't think that is the point of the benchmark; it's just for measuring raw speed depending on the CPU capabilities.

kdave/btrfs-progs issue #236