Closed mouse07410 closed 1 year ago
I hope you can point me at where/what in Python environment seems missing - maybe an older version of some module, or such...?
I don't see any reference to python module issues in the above...Why do you think this is the issue? If it's "only" exactly the 4 errors above with algorithms of the SPHINCS+ family (is it?) this reminds me a bit of https://github.com/open-quantum-safe/liboqs/pull/1420... If it's indeed only these Sphincs+ tests failing, would you be willing to re-build&test with "-DOQS_USE_SHA2_OPENSSL=OFF" to see whether this improves things (so it may be the same issue as in open-quantum-safe/liboqs#1420)?
Just found a definite bug; until the fix for that lands, DOQS_USE_SHA2_OPENSSL=OFF
would be a workaround.
Just found a definite bug; until the fix for that lands,
DOQS_USE_SHA2_OPENSSL=OFF
would be a workaround.
Sorry, proposed workaround is no-joy:
One excerpt:
[2/14] Linking C executable tests/test_sha3
FAILED: tests/test_sha3
: && /opt/local/bin/clang -g -O0 -std=gnu18 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -g -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_avx2.dir/KeccakP-1600/avx2/KeccakP-1600-AVX2.S.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_avx2.dir/KeccakP-1600times4/avx2/KeccakP-1600-times4-SIMD256.c.o src/common/CMakeFiles/common.dir/aes/aes.c.o src/common/CMakeFiles/common.dir/aes/aes_c.c.o src/common/CMakeFiles/common.dir/aes/aes128_ni.c.o src/common/CMakeFiles/common.dir/aes/aes256_ni.c.o src/common/CMakeFiles/common.dir/sha2/sha2.c.o src/common/CMakeFiles/common.dir/sha2/sha2_c.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3x4.c.o src/common/CMakeFiles/common.dir/common.c.o src/common/CMakeFiles/common.dir/pqclean_shims/nistseedexpander.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202x4.c.o src/common/CMakeFiles/common.dir/rand/rand.c.o src/common/CMakeFiles/common.dir/rand/rand_nist.c.o tests/CMakeFiles/test_sha3.dir/test_sha3.c.o -o tests/test_sha3 /Users/ur20980/openssl-3/lib/libcrypto.dylib -lm && :
Undefined symbols for architecture x86_64:
"_oqs_aes_256_ecb", referenced from:
_AES256_ECB in rand_nist.c.o
"_oqs_fetch_ossl_objects", referenced from:
_OQS_init in common.c.o
"_oqs_free_ossl_objects", referenced from:
_OQS_destroy in common.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Without the workaround, at least the test-run completes, though the majority of the tests fail:
=========== 381 failed, 77 passed, 220 skipped in 101.68s (0:01:41) ============
With that presumed workaround - tests don't even run, as shown above, and here:
. . .
FAILED: tests/test_aes
: && /opt/local/bin/clang -g -O3 -std=gnu18 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_plain64.dir/KeccakP-1600/plain-64bits/KeccakP-1600-opt64.c.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_serial.dir/KeccakP-1600times4/serial/KeccakP-1600-times4-on1.c.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_avx2.dir/KeccakP-1600/avx2/KeccakP-1600-AVX2.S.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_avx2.dir/KeccakP-1600times4/avx2/KeccakP-1600-times4-SIMD256.c.o src/common/CMakeFiles/common.dir/aes/aes.c.o src/common/CMakeFiles/common.dir/aes/aes_c.c.o src/common/CMakeFiles/common.dir/aes/aes128_ni.c.o src/common/CMakeFiles/common.dir/aes/aes256_ni.c.o src/common/CMakeFiles/common.dir/sha2/sha2.c.o src/common/CMakeFiles/common.dir/sha2/sha2_c.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3x4.c.o src/common/CMakeFiles/common.dir/common.c.o src/common/CMakeFiles/common.dir/pqclean_shims/nistseedexpander.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202x4.c.o src/common/CMakeFiles/common.dir/rand/rand.c.o src/common/CMakeFiles/common.dir/rand/rand_nist.c.o tests/CMakeFiles/test_aes.dir/test_aes.c.o -o tests/test_aes /Users/ur20980/openssl-3/lib/libcrypto.dylib -lm && :
Undefined symbols for architecture x86_64:
"_oqs_aes_256_ecb", referenced from:
_AES256_ECB in rand_nist.c.o
"_oqs_fetch_ossl_objects", referenced from:
_OQS_init in common.c.o
"_oqs_free_ossl_objects", referenced from:
_OQS_destroy in common.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[1701/1716] Building C object src/sig/falcon/CMakeFil...con_1024_avx2.dir/pqclean_falcon-1024_avx2/keygen.c.o
ninja: build stopped: subcommand failed.
Correct workaround is -DOQS_USE_OPENSSL=OFF
:
== 458 passed, 220 skipped in 68.31s (0:01:08) ==
Looking forward to seeing the bug fix merged. TNX
Correct workaround is -DOQS_USE_OPENSSL=OFF
Thanks for the test/reconfirmation. We'll look into this.
It works on Apple Silicon. I'll need to check on x86_64.
But on Apple Silicon, OpenSSL-3.2.0-dev crashes several times, with this:
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [64110]
VM Region Info: 0 is not in any region. Bytes before following region: 4341940224
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 102ccc000-102cd0000 [ 16K] r-x/r-x SM=COW .../MacOS/Python
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_pthread.dylib 0x1a1c948c4 pthread_rwlock_wrlock + 0
1 libcrypto.3.dylib 0x104692dd4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2 libcrypto.3.dylib 0x10463ac14 ERR_unload_strings + 92 (err.c:314)
3 libcrypto.3.dylib 0x10468f720 ossl_provider_free + 92 (provider_core.c:688)
4 libcrypto.3.dylib 0x1046f5138 OPENSSL_sk_pop_free + 76 (stack.c:439)
5 libcrypto.3.dylib 0x10468f104 sk_OSSL_PROVIDER_pop_free + 12 (provider_core.c:199) [inlined]
6 libcrypto.3.dylib 0x10468f104 ossl_provider_store_free + 76 (provider_core.c:295)
7 libcrypto.3.dylib 0x10467fd6c context_deinit_objs + 124 (context.c:250)
8 libcrypto.3.dylib 0x10467f564 context_deinit + 16 (context.c:334) [inlined]
9 libcrypto.3.dylib 0x10467f564 OSSL_LIB_CTX_free + 132 (context.c:465)
10 oqsprovider.0.5.0-dev.dylib 0x10404b560 oqsx_freeprovctx + 24 (oqsprov_keys.c:178)
11 oqsprovider.0.5.0-dev.dylib 0x10404a9a0 oqsprovider_teardown + 12 (oqsprov.c:553)
12 libcrypto.3.dylib 0x103e17400 ossl_provider_free + 76
13 libcrypto.3.dylib 0x103e6b5a0 OPENSSL_sk_pop_free + 60
14 libcrypto.3.dylib 0x103e16ec0 ossl_provider_store_free + 72
15 libcrypto.3.dylib 0x103e0b5c4 context_deinit_objs + 124
16 libcrypto.3.dylib 0x103e0ae5c context_deinit + 32
17 libcrypto.3.dylib 0x103e0ae2c ossl_lib_ctx_default_deinit + 20
18 libcrypto.3.dylib 0x103e0dd80 OPENSSL_cleanup + 204
19 libsystem_c.dylib 0x1a1b55ed4 __cxa_finalize_ranges + 492
20 libsystem_c.dylib 0x1a1b55c4c exit + 44
21 libdyld.dylib 0x1a1cb0554 dyld4::LibSystemHelpers::exit(int) const + 20
22 dyld 0x1a193ff7c start + 2320
But on Apple Silicon, OpenSSL-3.2.0-dev crashes several times, with this:
So we re-open. Is this dependent on other providers being active? Any way you see how we can reproduce (also for CI)?
Edit/Add: Is this issue really correct here? Should it rather be an oqsprovider
issue? I propose we make this (issue location) dependent on how the behaviour can get reproduced: OK for you, @mouse07410 ? FWIW, I have a similar issue tracked in https://github.com/open-quantum-safe/oqs-provider/issues/159...
On Apple Silicon.
With minimal set of providers enabled:
=== 464 passed, 220 skipped in 40.24s ===
and no crash.
With pkcs11-provider
or oqs-provider
enabled - same thing as above.
With both pkcs11-provider
and oqs-provider
enabled - tests still pass == 464 passed, 220 skipped in 40.80s ==
, but crashes in the end:
Segmentation fault
FAILED: tests/CMakeFiles/run_tests /Users/ur20980/src/liboqs/build/tests/CMakeFiles/run_tests
cd /Users/ur20980/src/liboqs && /opt/local/bin/cmake -E env OQS_BUILD_DIR=/Users/ur20980/src/liboqs/build python3 -m pytest --verbose --numprocesses=auto --ignore=scripts/copy_from_upstream/repos
ninja: build stopped: subcommand failed.
and crash report:
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [18302]
VM Region Info: 0 is not in any region. Bytes before following region: 4310319104
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 100ea4000-100ea8000 [ 16K] r-x/r-x SM=COW .../MacOS/Python
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_pthread.dylib 0x1a1c948c4 pthread_rwlock_wrlock + 0
1 libcrypto.3.dylib 0x10564add4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2 libcrypto.3.dylib 0x1055f2c14 ERR_unload_strings + 92 (err.c:314)
3 libcrypto.3.dylib 0x105647720 ossl_provider_free + 92 (provider_core.c:688)
4 libcrypto.3.dylib 0x1056ad138 OPENSSL_sk_pop_free + 76 (stack.c:439)
5 libcrypto.3.dylib 0x105647104 sk_OSSL_PROVIDER_pop_free + 12 (provider_core.c:199) [inlined]
6 libcrypto.3.dylib 0x105647104 ossl_provider_store_free + 76 (provider_core.c:295)
7 libcrypto.3.dylib 0x105637d6c context_deinit_objs + 124 (context.c:250)
8 libcrypto.3.dylib 0x105637564 context_deinit + 16 (context.c:334) [inlined]
9 libcrypto.3.dylib 0x105637564 OSSL_LIB_CTX_free + 132 (context.c:465)
10 oqsprovider.0.5.0-dev.dylib 0x10436f560 oqsx_freeprovctx + 24 (oqsprov_keys.c:178)
11 oqsprovider.0.5.0-dev.dylib 0x10436e9a0 oqsprovider_teardown + 12 (oqsprov.c:553)
12 libcrypto.3.dylib 0x104547400 ossl_provider_free + 76
13 libcrypto.3.dylib 0x10459b5a0 OPENSSL_sk_pop_free + 60
14 libcrypto.3.dylib 0x104546ec0 ossl_provider_store_free + 72
15 libcrypto.3.dylib 0x10453b5c4 context_deinit_objs + 124
16 libcrypto.3.dylib 0x10453ae5c context_deinit + 32
17 libcrypto.3.dylib 0x10453ae2c ossl_lib_ctx_default_deinit + 20
18 libcrypto.3.dylib 0x10453dd80 OPENSSL_cleanup + 204
19 libsystem_c.dylib 0x1a1b55ed4 __cxa_finalize_ranges + 492
20 libsystem_c.dylib 0x1a1b55c4c exit + 44
21 libdyld.dylib 0x1a1cb0554 dyld4::LibSystemHelpers::exit(int) const + 20
22 dyld 0x1a193ff7c start + 2320
@mouse07410 Can you still reproduce this issue after the latest oqsprovider & libops updates?
@mouse07410 Can you still reproduce this issue after the latest oqsprovider & libops updates?
Unfortunately, yes I can. Tests proceed fine, but when they finish I see about 5 or so OS crash reports telling me that it crashed.
Unfortunately, yes I can. Tests proceed fine, but when they finish I see about 5 or so OS crash reports telling me that it crashed.
Too bad. On Ubuntu, everything works like a charm (see below for OpenSSL3.0.2; same results for 3.2.0-dev): Tests pass and no crash at the end, so I cannot debug into this... Tagging @mattcaswell for thoughts as to whether this may be another openssl (config) issue (?)
Test setup:
LD_LIBRARY_PATH=/home/mib/git/oqs/oqs-provider/.local/lib64
OPENSSL_APP=openssl
OPENSSL_CONF=/home/mib/git/oqs/oqs-provider/scripts/openssl-ca.cnf
OPENSSL_MODULES=/home/mib/git/oqs/oqs-provider/_build/lib
Version information:
OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)
Providers:
default
name: OpenSSL Default Provider
version: 3.0.2
status: active
build info: 3.0.2
gettable provider parameters:
name: pointer to a UTF8 encoded string (arbitrary size)
version: pointer to a UTF8 encoded string (arbitrary size)
buildinfo: pointer to a UTF8 encoded string (arbitrary size)
status: integer (arbitrary size)
oqsprovider
name: OpenSSL OQS Provider
version: 0.5.0-dev
status: active
build info: OQS Provider v.0.5.0-dev (8481fcf) based on liboqs v.0.8.0-dev
gettable provider parameters:
name: pointer to a UTF8 encoded string (arbitrary size)
version: pointer to a UTF8 encoded string (arbitrary size)
buildinfo: pointer to a UTF8 encoded string (arbitrary size)
status: integer (arbitrary size)
oqsprovider2
name: OpenSSL OQS Provider
version: 0.5.0-dev
status: active
build info: OQS Provider v.0.5.0-dev (8481fcf) based on liboqs v.0.8.0-dev
gettable provider parameters:
name: pointer to a UTF8 encoded string (arbitrary size)
version: pointer to a UTF8 encoded string (arbitrary size)
buildinfo: pointer to a UTF8 encoded string (arbitrary size)
status: integer (arbitrary size)
pkcs11
name: PKCS#11 Provider
version: 3.2.0
status: active
build info: 3.2.0-dev
gettable provider parameters:
name: pointer to a UTF8 encoded string (arbitrary size)
version: pointer to a UTF8 encoded string (arbitrary size)
buildinfo: pointer to a UTF8 encoded string (arbitrary size)
status: integer (arbitrary size)
Cert gen/verify, CMS sign/verify, CA tests for all enabled algorithms commencing...
.......................
Test project /home/mib/git/oqs/oqs-provider/_build
Start 1: oqs_signatures
1/5 Test #1: oqs_signatures ................... Passed 5.51 sec
Start 2: oqs_kems
2/5 Test #2: oqs_kems ......................... Passed 0.15 sec
Start 3: oqs_groups
3/5 Test #3: oqs_groups ....................... Passed 0.32 sec
Start 4: oqs_tlssig
4/5 Test #4: oqs_tlssig ....................... Passed 0.00 sec
Start 5: oqs_endecode
5/5 Test #5: oqs_endecode ..................... Passed 13.91 sec
100% tests passed, 0 tests failed out of 5
Total Test time (real) = 19.89 sec
All oqsprovider tests passed.
This is crashing during a call to OPENSSL_cleanup
which occurs automatically on process exit. This is why the tests appear to pass and then still crash, i.e. all the work has been done successfully and it crashes while the process is exiting.
The stack trace shows us this:
0 libsystem_pthread.dylib 0x1a1c948c4 pthread_rwlock_wrlock + 0
1 libcrypto.3.dylib 0x10564add4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2 libcrypto.3.dylib 0x1055f2c14 ERR_unload_strings + 92 (err.c:314)
I assume the "0 is not in any region" message in the stack trace means that the write lock that ERR_unload_strings
is trying to obtain is NULL. ERR_unload_strings
looks like this:
int ERR_unload_strings(int lib, ERR_STRING_DATA *str)
{
#ifndef OPENSSL_NO_ERR
if (!RUN_ONCE(&err_string_init, do_err_strings_init))
return 0;
if (!CRYPTO_THREAD_write_lock(err_string_lock))
return 0;
With the err_string_lock
being initialised via a RUN_ONCE in do_err_strings_init
:
DEFINE_RUN_ONCE_STATIC(do_err_strings_init)
{
if (!OPENSSL_init_crypto(OPENSSL_INIT_BASE_ONLY, NULL))
return 0;
err_string_lock = CRYPTO_THREAD_lock_new();
if (err_string_lock == NULL)
return 0;
Since ERR_unload_strings
calls the do_err_strings_init
function via RUN_ONCE
we can assume that the err_string_lock
must have once been initialised - but for some reason no longer is. The only way I can see that happening is if err_cleanup
has already been called:
void err_cleanup(void)
{
if (set_err_thread_local != 0)
CRYPTO_THREAD_cleanup_local(&err_thread_local);
CRYPTO_THREAD_lock_free(err_string_lock);
err_string_lock = NULL;
#ifndef OPENSSL_NO_ERR
lh_ERR_STRING_DATA_free(int_error_hash);
int_error_hash = NULL;
#endif
}
But that is only ever called from OPENSSL_cleanup
(which according to the stack trace is what we are currently doing):
17 libcrypto.3.dylib 0x10453ae2c ossl_lib_ctx_default_deinit + 20
18 libcrypto.3.dylib 0x10453dd80 OPENSSL_cleanup + 204
So, we are currently in the ossl_lib_ctx_default_deinit
step off OPENSSL_cleanup
....but that occurs before any call to err_cleanup
:
OSSL_TRACE(INIT, "OPENSSL_cleanup: ossl_lib_ctx_default_deinit()\n");
ossl_lib_ctx_default_deinit();
ossl_cleanup_thread();
OSSL_TRACE(INIT, "OPENSSL_cleanup: bio_cleanup()\n");
bio_cleanup();
OSSL_TRACE(INIT, "OPENSSL_cleanup: evp_cleanup_int()\n");
evp_cleanup_int();
OSSL_TRACE(INIT, "OPENSSL_cleanup: ossl_obj_cleanup_int()\n");
ossl_obj_cleanup_int();
OSSL_TRACE(INIT, "OPENSSL_cleanup: err_int()\n");
err_cleanup();
So, right now I don't see a situation how this could occur!?
In any case this doesn't currently look like an oqs-provider issue. More likely it is an OpenSSL problem. Perhaps open an issue there?
In any case this doesn't currently look like an oqs-provider issue. More likely it is an OpenSSL problem. Perhaps open an issue there?
Opened an OpenSSL issue for this. Also, it affects both Apple Silicon and x86_64 platforms.
As I just chased another OpenSSL (only in 3.0.1 -- triggered by old CI packaging) bug for a few hours now, I tend to also consider this issue moot for oqsprovider
. We may leave it open for the fun of it, but it won't stop me from finally doing a release for oqsprovider
. Objections, anyone?
We may leave it open for the fun of it, but it won't stop me from finally doing a release for oqsprovider. Objections, anyone?
No objections - let's release.
As far as I can see, current master runs and passes tests on my MacOS Ventura, both Intel and M2 platforms. Using OpenSSL-3.1.1 and 3.2.0-dev.
Thus, let me close this for now - might re-open if hit with the same problem again. ;-) Thanks!
As far as I can see, current master runs and passes tests on my MacOS Ventura, both Intel and M2 platforms. Using OpenSSL-3.1.1 and 3.2.0-dev.
Thanks for these tests on what is now a release: Glad to hear that!
Describe the bug On a newly-prepared platform , I'm getting a ton of errors trying to run test-suite:
To Reproduce Steps to reproduce the behavior:
build
subdirectory.ninja run_tests
Expected behavior Tests passing or being skipped, test-scripts not crashing.
Screenshots If applicable, add screenshots to help explain your problem.
Environment (please complete the following information):
Additional context I hope you can point me at where/what in Python environment seems missing - maybe an older version of some module, or such...?