openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

New feature to implement SHA3-256 checksum to get support of QAT acceleration #8118

Closed TheTDD closed 4 years ago

TheTDD commented 5 years ago

System information

Type                    Version/Name
Distribution Name       debian
Distribution Version    9.5
Linux Kernel            4.18.6-rt3
Architecture            amd64
ZFS Version             0.7.11-1
SPL Version             0.7.11-1

Describe the problem you're observing

module/zfs/sha256.c mentions sha512 as a possible checksum algorithm. The implementation does not appear to be standard SHA512, so it is not possible to use QAT acceleration with SHA512 checksums. Since SHA512 (together with SHA256) is implemented in QAT for ZFS in TheTDD's repository for the Data Plane API, it would be nice to be able to use it. Can we add a new checksum algorithm, such as sha512standard or sha512classic, alongside the current "proprietary" sha512?

sha256 works fine already:

19 1 0x01 14 3808 5295794556 9611438869804
name                            type data
init_failed                     4    26
sha256_requests                 4    15246
sha256_total_in_bytes           4    1171662336
sha256_total_success_bytes      4    1171662336
sha256_total_out_bytes          4    487872
sha256_fails                    4    0
err_timeout                     4    0
err_status_fail                 4    0
err_status_retry                4    0
err_status_param                4    0
err_status_resource             4    0
err_status_restarting           4    0
err_status_unknown              4    0
throughput_sha256_bps           4    476867047

Include any warning/errors/backtraces from the system logs

checksum=on|off|fletcher2|fletcher4|sha256|noparity|sha512|skein|edonr

but

void
abd_checksum_SHA512_native(abd_t *abd, uint64_t size,
    const void *ctx_template, zio_cksum_t *zcp)
{
    SHA2_CTX    ctx;

    /* Note the mode: SHA512_256, not plain SHA512. */
    SHA2Init(SHA512_256, &ctx);
    (void) abd_iterate_func(abd, 0, size, sha_incremental, &ctx);
    SHA2Final(zcp, &ctx);
}

rincebrain commented 5 years ago

The SHA512 already present here is SHA512/256, which was added to the same standard as SHA512 in 2012; it is not some proprietary truncation or format.

What problem are you looking to solve here? AIUI, SHA512/256 was used because the checksum field is only 256 bits wide, and I don't see any glue to deal with that problem in that branch.

TheTDD commented 5 years ago

@rincebrain thanks, I hadn't found that. Then it will be easy to implement support for QAT's SHA512. Can you please tell me which part of the SHA512 hash is used in ZFS, or whether a truncation routine exists in ZFS? That would save me the time of investigating the source code.

rincebrain commented 5 years ago

@TheTDD That was the first sentence of my post - the name of the relevant standard, and a link to the standard in question.

patrickdk77 commented 5 years ago

There are different flavors of SHA512: the SHA512 you're thinking of, and then specially truncated versions. Plain truncation is not good, so these work slightly differently. I don't know whether they will be QAT compatible; that depends on how flexible QAT is and whether it was designed with the alternative SHA512 options in mind.

SHA512/224 and SHA512/256 are both truncated forms, but the truncation applies only to the output length. They are computed slightly differently, mainly in the initialization, so they don't just truncate the result of a SHA512.

It's all part of the normal FIPS SHA-2 family of standards, linked to by rincebrain.

I forget what exactly happens, but the equivalent bits, if you truncate SHA512, will have far less than 256 bits of usefulness.

TheTDD commented 5 years ago

Thank you for clarifying the issue. It seems there is actually no chance and no need to use QAT SHA512 acceleration in ZFS until the checksum field is extended to 512 bits. QAT doesn't support the SHA512/256 digest algorithm. Perhaps SHA3-256 would be an option? QAT supports CPA_CY_SYM_HASH_SHA3_256.

According to Wikipedia, in hardware implementations SHA-3 is notably faster than all other finalists of the NIST competition, and also faster than SHA-2 and SHA-1.

blind-oracle commented 5 years ago

Is this really important? Even the shitty 1.1GHz Celeron N4200 in one of my routers can do almost 600MB/sec of SHA256 on a single core without any QAT. It does have https://en.wikipedia.org/wiki/Intel_SHA_extensions though, which is probably used by OpenSSL.

# openssl speed sha256
Doing sha256 for 3s on 16 size blocks: 13068941 sha256's in 2.94s
Doing sha256 for 3s on 64 size blocks: 8795024 sha256's in 2.94s
Doing sha256 for 3s on 256 size blocks: 4435549 sha256's in 2.96s
Doing sha256 for 3s on 1024 size blocks: 1486112 sha256's in 2.96s
Doing sha256 for 3s on 8192 size blocks: 207132 sha256's in 2.98s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           71123.49k   191456.30k   383615.05k   514114.42k   569404.48k

TheTDD commented 5 years ago

@blind-oracle hi. Any possibility to offload calculations from the CPU to specialized devices should be welcome. QAT is very good at parallel computing: it can handle 65536 crypto requests at once. That might be interesting for users of huge pools or users of deduplication, where fast and safe checksums are extremely important. Anyway, support for QAT-GZIP and QAT-SHA2/256 is already implemented in ZFS, so an additional fast checksum algorithm, SHA3/256 with QAT behind it, would just make ZFS even more attractive. I could start on the implementation myself, but I'm afraid it would be too complicated because of my lack of experience in Linux C programming.

patrickdk77 commented 5 years ago

@blind-oracle so really you mean to say you can do 71 to 191 MB/sec, as you're likely not using 8KB blocks for the majority of your pool blocks. Hashing metadata blocks is likely going to be the real speed killer, but I have no proof of it.

The speed difference on a CPU between SHA2-256 and SHA3-256 is marginal. A new hash ID would need to be assigned for SHA3-256 to get it supported in ZFS; it cannot reuse an existing type because it will produce different hash values.

blind-oracle commented 5 years ago

@TheTDD Hi :) Yeah, I agree that if it's possible to speed something up, especially more or less cheaply, it should be done. It's just that this is probably not a very high priority compared with other, more useful features.

@patrickdk77 Yes, of course it's the max speed. But metadata blocks probably do not require that much throughput.

rlaager commented 5 years ago

The proposal here is that ZoL should support SHA3/256 plus QAT acceleration of the same, to gain some speed. ZoL already supports SHA2/256 QAT acceleration.

@TheTDD do you have performance comparisons of a QAT card doing SHA2/256 vs SHA3/256 (not in ZFS, just generally the performance of the card)? How does that compare to CPU performance of Skein (which is a dedup-compatible checksum algorithm in ZoL)?

My hunch is that the performance gain from SHA2/256 to SHA3/256 is minimal, and anyone who cares about speed should probably use Skein. But maybe I'm wrong and it's a huge difference. Performance numbers would help.

TheTDD commented 5 years ago

I'd like to run performance tests, but I just found out that my Intel® QuickAssist Adapter 8950 doesn't support SHA3. Intel supports it only from the QuickAssist Adapter 8960/8970 onward. Until I get such an adapter in my hands I can't help with performance measurements. @wli5 seems to work at Intel; perhaps he is able to help with measurements?

I know that SHA2-256 is supported by ZFS and QAT, but on 64-bit platforms SHA2-512 is almost 2x faster than SHA2-256. Since the SHA2-512 implementation in ZFS is actually SHA2-512/256, there is no way to accelerate it using QAT. Hardware implementations of SHA3 should be faster, so I'd like to close the gap between slow SHA2-256 and fast SHA2-512 with SHA3-256.

It is also not correct to suggest that because Skein is faster, other digests are unnecessary. Skein is calculated on the CPU, but if the CPU is already busy with other work, any possibility to offload tasks to specialized hardware should be welcome. I'm doing that now on my home server: with QAT compression and SHA2-256 digests, a 10TB scrub takes only 1 day. Before using QAT it took about a week. In the meantime the CPU is mining roicoins.

rincebrain commented 5 years ago

@wli5 cat.intel.com does not appear to resolve in DNS, and that PDF does not seem to show up anywhere else in quick searching.

wli5 commented 5 years ago

@rincebrain @TheTDD for sha2-256, the 8970 provides ~3x the throughput of the 8950, and sha3-256 has similar perf to sha2-256.

TheTDD commented 5 years ago

@wli5 if the performance of sha3-256 in QAT is not significantly better, then there is no need to implement sha3-256 in ZFS and you may close/delete this issue. SHA2-512 would be a better solution, but it is not possible because of the length limit of the checksum in ZFS.

wli5 commented 5 years ago

@TheTDD it depends on what you are comparing with: sha3-256 on the 8970 is 3x sha2-256 on the 8950.

rlaager commented 4 years ago

Comparing across different models of QAT cards is not relevant. It sounds like there is no significant performance gain here. If there is, we can reopen.