Open beldmit opened 1 year ago
User time is even worse: 27 mins vs 4
With OQS_USE_SHA3_OPENSSL=ON I see 9 minutes for running tests and without - just 2 min. Fedora 37, OpenSSL 3.0.x
Expected behavior More or less equal time.
Why do you expect this? "OQS_USE_SHA3_OPENSSL=OFF" activates a local XKCP version, so entirely different code from OpenSSL's. What SHA3 code base does OpenSSL use? The same? Then indeed I'd understand your surprise.
We would like to narrow down the problem - is it the performance of low-level primitives or, say, digest fetching from openssl, or smth else.
FWIW, the speed differential is clearly visible even in the built-in artificial speed tests -- but not a factor of 6:
$ ./build/tests/speed_common
Configuration info
==================
Target platform: x86_64-Linux-5.19.0-38-generic
Compiler: gcc (11.3.0)
Compile options: [-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version: 0.8.0-dev
Git commit: d61d81c526da8bb62e363f5a75191689572151cb
OpenSSL enabled: Yes (OpenSSL 3.0.2 15 Mar 2022)
AES: NI
SHA-2: OpenSSL
SHA-3: C
OQS build flags: BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Debug
CPU exts active: ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2023-04-03 17:40:31
Operation | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES128_ECB_load+free_sch | 4495481 | 1.000 | 0.222 | 0.416 | 371 | 26
OQS_AES128_ECB_enc_sch | 4961547 | 1.000 | 0.202 | 0.401 | 294 | 24
OQS_AES128_ECB_enc | 2511140 | 1.000 | 0.398 | 0.490 | 706 | 38
OQS_AES256_ECB_load+free_sch | 5163286 | 1.000 | 0.194 | 0.396 | 317 | 32
OQS_AES256_ECB_enc_sch | 3699662 | 1.000 | 0.270 | 0.445 | 400 | 45
OQS_AES256_ECB_enc | 2341165 | 1.000 | 0.427 | 0.495 | 759 | 46
OQS_AES256_CTR_init+iv+free | 4956465 | 1.000 | 0.202 | 0.402 | 332 | 32
OQS_AES256_CTR_inc_stream_iv | 7567083 | 1.000 | 0.132 | 0.339 | 200 | 19
OQS_AES256_CTR_inc_stream_blks | 7917212 | 1.000 | 0.126 | 0.332 | 189 | 20
OQS_SHA2_sha256 | 3303524 | 1.000 | 0.303 | 0.468 | 524 | 165
OQS_SHA2_sha384 | 2454523 | 1.000 | 0.407 | 0.492 | 723 | 43
OQS_SHA2_sha512 | 2401595 | 1.000 | 0.416 | 0.493 | 739 | 40
OQS_SHA3_sha3_256 | 2675547 | 1.000 | 0.374 | 0.492 | 658 | 188
OQS_SHA3_sha3_384 | 2671477 | 1.000 | 0.374 | 0.492 | 659 | 190
OQS_SHA3_sha3_512 | 2688687 | 1.000 | 0.372 | 0.491 | 654 | 187
OQS_SHA3_shake128 | 2577158 | 1.000 | 0.388 | 0.497 | 685 | 213
OQS_SHA3_shake256 | 2568989 | 1.000 | 0.389 | 0.498 | 687 | 213
Ended at 2023-04-03 17:40:50
$ ./build-openssl/tests/speed_common
Configuration info
==================
Target platform: x86_64-Linux-5.19.0-38-generic
Compiler: gcc (11.3.0)
Compile options: [-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version: 0.8.0-dev
Git commit: d61d81c526da8bb62e363f5a75191689572151cb
OpenSSL enabled: Yes (OpenSSL 3.0.2 15 Mar 2022)
AES: NI
SHA-2: OpenSSL
SHA-3: OpenSSL
OQS build flags: BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Debug
CPU exts active: ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2023-04-03 17:40:58
Operation | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES128_ECB_load+free_sch | 4487752 | 1.000 | 0.223 | 0.419 | 372 | 93
OQS_AES128_ECB_enc_sch | 5086516 | 1.000 | 0.197 | 0.400 | 291 | 83
OQS_AES128_ECB_enc | 2507581 | 1.000 | 0.399 | 0.490 | 707 | 33
OQS_AES256_ECB_load+free_sch | 5149337 | 1.000 | 0.194 | 0.396 | 312 | 27
OQS_AES256_ECB_enc_sch | 3762857 | 1.000 | 0.266 | 0.442 | 389 | 35
OQS_AES256_ECB_enc | 2328186 | 1.000 | 0.430 | 0.499 | 763 | 119
OQS_AES256_CTR_init+iv+free | 4915280 | 1.000 | 0.203 | 0.405 | 334 | 86
OQS_AES256_CTR_inc_stream_iv | 7548342 | 1.000 | 0.132 | 0.339 | 200 | 20
OQS_AES256_CTR_inc_stream_blks | 7887930 | 1.000 | 0.127 | 0.333 | 190 | 31
OQS_SHA2_sha256 | 3296287 | 1.000 | 0.303 | 0.471 | 525 | 192
OQS_SHA2_sha384 | 2417810 | 1.000 | 0.414 | 0.497 | 734 | 125
OQS_SHA2_sha512 | 2415321 | 1.000 | 0.414 | 0.497 | 735 | 126
OQS_SHA3_sha3_256 | 1901105 | 1.000 | 0.526 | 0.505 | 946 | 136
OQS_SHA3_sha3_384 | 1924791 | 1.000 | 0.520 | 0.506 | 934 | 142
OQS_SHA3_sha3_512 | 1916460 | 1.000 | 0.522 | 0.504 | 940 | 129
OQS_SHA3_shake128 | 1842362 | 1.000 | 0.543 | 0.499 | 979 | 50
OQS_SHA3_shake256 | 1882932 | 1.000 | 0.531 | 0.499 | 958 | 44
Ended at 2023-04-03 17:41:17
So you have just a better implementation of SHA3, right? Is it yours implementation? (All my OpenSSL-related hats on) Could it be brought to OpenSSL? (All my OpenSSL-related hats off)
I'm aware of the following antipattern - in 3.0. using old methods like EVP_sha3_256() instead of fetching a digest object via EVP_MD_fetch and using a fetched object gives a performance penalty. See https://github.com/openssl/openssl/pull/20354/files#diff-86fbd44f96c9f78b9c4a27d8b5fb9d83aa787b9a7d92c6774d9172c5952b791f for example. If it is the case (and looks like it is), it's worth fixing
So you have just a better implementation of SHA3, right?
Define "better". "Faster": Probably. "More secure": Not sure.
Is it yours implementation?
Nope: This is basically https://github.com/XKCP/XKCP, right @jschanck ?
Sure, I mean "faster". I still think that the antipattern I mentioned is worth fixing, both for SHA3 and SHA2.
OpenSSL doesn't expose a streaming API for XOF output, which several PQ schemes require. The slowdown you're seeing is due to the XXX
in our code here: https://github.com/open-quantum-safe/liboqs/blob/main/src/common/sha3/ossl_sha3.c#L146.
I agree, but it's a different problem.
it's a different problem.
Agreed -- but it's a rather significant one. There are several things, then:
Someone needs to "re-activate" the 2018 OpenSSL issue https://github.com/openssl/openssl/issues/7894 then, right @beldmit ? My hunch is it explains a larger part of the speed differential (could be validated by using OpenSSL111, right?).
For the OpenSSL3 liboqs
integration, just created https://github.com/open-quantum-safe/liboqs/issues/1427.
I'd thus suggest to close this issue then (and work on the two above).
I'm OK to reactivate https://github.com/openssl/openssl/issues/7894, though it shouldn't be a new API (if possible). How do we reactivate it?
How do we reactivate it?
See https://github.com/openssl/openssl/pull/7921#issuecomment-1496956969.
@dstebila Can you remind me about the reason for closing this issue? When doing some spot checks before a possible 0.11.0 I noticed the problem still exists: Activating OpenSSL SHA3 slows down liboqs
substantially (running "non-streaming" openssl v3.0.2
for example). Or is the "resolution" the documentation that liboqs
prioritizes performance over security?
These default choices have been made to optimize the default performance of all algorithms. Changing them implies performance penalties.
@beldmit How are you dealing with this these days?
We use OpenSSL implementation for potential FIPS's sake. But OpenSSL started proposing XOF API since then, I think that may give better performance.
On the other hand I doubt that correctly written digest code may be of any harm and it would be great to bring the faster code to OpenSSL, if possible
After https://github.com/open-quantum-safe/liboqs/pull/1694, we have the XOF API integrated into liboqs, conditional on a sufficiently high OpenSSL version. It would be nice to run some comparisons and see how this impacts OpenSSL SHA3 performance in liboqs. I'll take a look later this week if time allows (of course, don't let that stop anybody else from picking it up in the meantime).
Describe the bug I build liboqs-0.8.0-dev with and without OQS_USE_SHA3_OPENSSL=ON. With OQS_USE_SHA3_OPENSSL=ON I see 9 minutes for running tests and without - just 2 min. Fedora 37, OpenSSL 3.0.x
To Reproduce cmake -GNinja -DBUILD_SHARED_LIBS=ON -DOQS_ALGS_ENABLED=STD -DOQS_USE_SHA3_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug -DOQS_USE_AVX2_INSTRUCTIONS=OFF -DOQS_USE_AVX512_INSTRUCTIONS=OFF
make time make test
Repeat the same without -DOQS_USE_SHA3_OPENSSL=ON
(I do it as a part of RPM build, so pure difference should be in the testing)
Expected behavior More or less equal time.
Screenshots If applicable, add screenshots to help explain your problem.
Environment (please complete the following information):
Additional context We want to avoid multiple implementations of low-level algorithms (AES, SHA2, SHA3) in Fedora and want to heavily rely on OpenSSL implementation. We would like to narrow down the problem - is it the performance of low-level primitives or, say, digest fetching from openssl, or smth else.