open-quantum-safe / liboqs

C library for prototyping and experimenting with quantum-resistant cryptography
https://openquantumsafe.org/
Other
1.77k stars 435 forks source link

OQS_USE_SHA3_OPENSSL=ON makes running tests significantly slower #1426

Open beldmit opened 1 year ago

beldmit commented 1 year ago

Describe the bug I build liboqs-0.8.0-dev with and without OQS_USE_SHA3_OPENSSL=ON. With OQS_USE_SHA3_OPENSSL=ON I see 9 minutes for running tests and without - just 2 min. Fedora 37, OpenSSL 3.0.x

To Reproduce cmake -GNinja -DBUILD_SHARED_LIBS=ON -DOQS_ALGS_ENABLED=STD -DOQS_USE_SHA3_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug -DOQS_USE_AVX2_INSTRUCTIONS=OFF -DOQS_USE_AVX512_INSTRUCTIONS=OFF

make time make test

Repeat the same without -DOQS_USE_SHA3_OPENSSL=ON

(I do it as a part of RPM build, so pure difference should be in the testing)

Expected behavior More or less equal time.

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Additional context We want to avoid multiple implementations of low-level algorithms (AES, SHA2, SHA3) in Fedora and want to heavily rely on OpenSSL implementation. We would like to narrow down the problem - is it the performance of low-level primitives or, say, digest fetching from openssl, or smth else.

beldmit commented 1 year ago

User time is even worse: 27 mins vs 4

baentsch commented 1 year ago

With OQS_USE_SHA3_OPENSSL=ON I see 9 minutes for running tests and without - just 2 min. Fedora 37, OpenSSL 3.0.x

Expected behavior More or less equal time.

Why do you expect this? "OQS_USE_SHA3_OPENSSL=OFF" activates a local XKCP version, so entirely different code from OpenSSL's. What SHA3 code base does OpenSSL use? The same? Then indeed I'd understand your surprise.

We would like to narrow down the problem - is it the performance of low-level primitives or, say, digest fetching from openssl, or smth else.

FWIW, the speed differential is clearly visible even in the built-in artificial speed tests -- but not a factor of 6:

$ ./build/tests/speed_common 
Configuration info
==================
Target platform:  x86_64-Linux-5.19.0-38-generic
Compiler:         gcc (11.3.0)
Compile options:  [-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.8.0-dev
Git commit:       d61d81c526da8bb62e363f5a75191689572151cb
OpenSSL enabled:  Yes (OpenSSL 3.0.2 15 Mar 2022)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Debug 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2023-04-03 17:40:31
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES128_ECB_load+free_sch         |    4495481 |          1.000 |           0.222 |      0.416 |                       371 |         26
OQS_AES128_ECB_enc_sch               |    4961547 |          1.000 |           0.202 |      0.401 |                       294 |         24
OQS_AES128_ECB_enc                   |    2511140 |          1.000 |           0.398 |      0.490 |                       706 |         38
OQS_AES256_ECB_load+free_sch         |    5163286 |          1.000 |           0.194 |      0.396 |                       317 |         32
OQS_AES256_ECB_enc_sch               |    3699662 |          1.000 |           0.270 |      0.445 |                       400 |         45
OQS_AES256_ECB_enc                   |    2341165 |          1.000 |           0.427 |      0.495 |                       759 |         46
OQS_AES256_CTR_init+iv+free          |    4956465 |          1.000 |           0.202 |      0.402 |                       332 |         32
OQS_AES256_CTR_inc_stream_iv         |    7567083 |          1.000 |           0.132 |      0.339 |                       200 |         19
OQS_AES256_CTR_inc_stream_blks       |    7917212 |          1.000 |           0.126 |      0.332 |                       189 |         20
OQS_SHA2_sha256                      |    3303524 |          1.000 |           0.303 |      0.468 |                       524 |        165
OQS_SHA2_sha384                      |    2454523 |          1.000 |           0.407 |      0.492 |                       723 |         43
OQS_SHA2_sha512                      |    2401595 |          1.000 |           0.416 |      0.493 |                       739 |         40
OQS_SHA3_sha3_256                    |    2675547 |          1.000 |           0.374 |      0.492 |                       658 |        188
OQS_SHA3_sha3_384                    |    2671477 |          1.000 |           0.374 |      0.492 |                       659 |        190
OQS_SHA3_sha3_512                    |    2688687 |          1.000 |           0.372 |      0.491 |                       654 |        187
OQS_SHA3_shake128                    |    2577158 |          1.000 |           0.388 |      0.497 |                       685 |        213
OQS_SHA3_shake256                    |    2568989 |          1.000 |           0.389 |      0.498 |                       687 |        213
Ended at 2023-04-03 17:40:50

$ ./build-openssl/tests/speed_common 
Configuration info
==================
Target platform:  x86_64-Linux-5.19.0-38-generic
Compiler:         gcc (11.3.0)
Compile options:  [-Wa,--noexecstack;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version:      0.8.0-dev
Git commit:       d61d81c526da8bb62e363f5a75191689572151cb
OpenSSL enabled:  Yes (OpenSSL 3.0.2 15 Mar 2022)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            OpenSSL
OQS build flags:  BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Debug 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2023-04-03 17:40:58
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES128_ECB_load+free_sch         |    4487752 |          1.000 |           0.223 |      0.419 |                       372 |         93
OQS_AES128_ECB_enc_sch               |    5086516 |          1.000 |           0.197 |      0.400 |                       291 |         83
OQS_AES128_ECB_enc                   |    2507581 |          1.000 |           0.399 |      0.490 |                       707 |         33
OQS_AES256_ECB_load+free_sch         |    5149337 |          1.000 |           0.194 |      0.396 |                       312 |         27
OQS_AES256_ECB_enc_sch               |    3762857 |          1.000 |           0.266 |      0.442 |                       389 |         35
OQS_AES256_ECB_enc                   |    2328186 |          1.000 |           0.430 |      0.499 |                       763 |        119
OQS_AES256_CTR_init+iv+free          |    4915280 |          1.000 |           0.203 |      0.405 |                       334 |         86
OQS_AES256_CTR_inc_stream_iv         |    7548342 |          1.000 |           0.132 |      0.339 |                       200 |         20
OQS_AES256_CTR_inc_stream_blks       |    7887930 |          1.000 |           0.127 |      0.333 |                       190 |         31
OQS_SHA2_sha256                      |    3296287 |          1.000 |           0.303 |      0.471 |                       525 |        192
OQS_SHA2_sha384                      |    2417810 |          1.000 |           0.414 |      0.497 |                       734 |        125
OQS_SHA2_sha512                      |    2415321 |          1.000 |           0.414 |      0.497 |                       735 |        126
OQS_SHA3_sha3_256                    |    1901105 |          1.000 |           0.526 |      0.505 |                       946 |        136
OQS_SHA3_sha3_384                    |    1924791 |          1.000 |           0.520 |      0.506 |                       934 |        142
OQS_SHA3_sha3_512                    |    1916460 |          1.000 |           0.522 |      0.504 |                       940 |        129
OQS_SHA3_shake128                    |    1842362 |          1.000 |           0.543 |      0.499 |                       979 |         50
OQS_SHA3_shake256                    |    1882932 |          1.000 |           0.531 |      0.499 |                       958 |         44
Ended at 2023-04-03 17:41:17
beldmit commented 1 year ago

So you have just a better implementation of SHA3, right? Is it yours implementation? (All my OpenSSL-related hats on) Could it be brought to OpenSSL? (All my OpenSSL-related hats off)

I'm aware of the following antipattern - in 3.0. using old methods like EVP_sha3_256() instead of fetching a digest object via EVP_MD_fetch and using a fetched object gives a performance penalty. See https://github.com/openssl/openssl/pull/20354/files#diff-86fbd44f96c9f78b9c4a27d8b5fb9d83aa787b9a7d92c6774d9172c5952b791f for example. If it is the case (and looks like it is), it's worth fixing

baentsch commented 1 year ago

So you have just a better implementation of SHA3, right?

Define "better". "Faster": Probably. "More secure": Not sure.

Is it yours implementation?

Nope: This is basically https://github.com/XKCP/XKCP, right @jschanck ?

beldmit commented 1 year ago

Sure, I mean "faster". I still think that the antipattern I mentioned is worth fixing, both for SHA3 and SHA2.

jschanck commented 1 year ago

OpenSSL doesn't expose a streaming API for XOF output, which several PQ schemes require. The slowdown you're seeing is due to the XXX in our code here: https://github.com/open-quantum-safe/liboqs/blob/main/src/common/sha3/ossl_sha3.c#L146.

beldmit commented 1 year ago

I agree, but it's a different problem.

baentsch commented 1 year ago

it's a different problem.

Agreed -- but it's a rather significant one. There are several things, then:

Someone needs to "re-activate" the 2018 OpenSSL issue https://github.com/openssl/openssl/issues/7894 then, right @beldmit ? My hunch is it explains a larger part of the speed differential (could be validated by using OpenSSL111, right?).

For the OpenSSL3 liboqs integration, just created https://github.com/open-quantum-safe/liboqs/issues/1427.

I'd thus suggest to close this issue then (and work on the two above).

beldmit commented 1 year ago

I'm OK to reactivate https://github.com/openssl/openssl/issues/7894, though it shouldn't be a new API (if possible). How do we reactivate it?

baentsch commented 1 year ago

How do we reactivate it?

See https://github.com/openssl/openssl/pull/7921#issuecomment-1496956969.

baentsch commented 1 month ago

@dstebila Can you remind me about the reason for closing this issue? When doing some spot checks before a possible 0.11.0 I noticed the problem still exists: Activating OpenSSL SHA3 slows down liboqs substantially (running "non-streaming" openssl v3.0.2 for example). Or is the "resolution" the documentation that liboqs prioritizes performance over security?

These default choices have been made to optimize the default performance of all algorithms. Changing them implies performance penalties.

@beldmit How are you dealing with this these days?

beldmit commented 1 month ago

We use OpenSSL implementation for potential FIPS's sake. But OpenSSL started proposing XOF API since then, I think that may give better performance.

On the other hand I doubt that correctly written digest code may be of any harm and it would be great to bring the faster code to OpenSSL, if possible

SWilson4 commented 1 month ago

After https://github.com/open-quantum-safe/liboqs/pull/1694, we have the XOF API integrated into liboqs, conditional on a sufficiently high OpenSSL version. It would be nice to run some comparisons and see how this impacts OpenSSL SHA3 performance in liboqs. I'll take a look later this week if time allows (of course, don't let that stop anybody else from picking it up in the meantime).