open-quantum-safe / oqs-provider

OpenSSL 3 provider containing post-quantum algorithms
https://openquantumsafe.org
MIT License
193 stars 80 forks source link

Bug when stopping provider on OSX #168

Closed mouse07410 closed 1 year ago

mouse07410 commented 1 year ago

Describe the bug On a newly-prepared platform , I'm getting a ton of errors trying to run test-suite:

.  .  .
tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-256s-robust', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_

        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))

        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )

        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-256s-robust -f

__________________________________ test_sig[SPHINCS+-SHAKE256-256f-simple] ___________________________________
[gw11] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHAKE256-256f-simple'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       helpers.run_subprocess( [helpers.path_to_executable('speed_sig'), sig_name, "-f"])

tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-256f-simple', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_

        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))

        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )

        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-256f-simple -f

__________________________________ test_sig[SPHINCS+-SHAKE256-192f-robust] ___________________________________
[gw4] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHAKE256-192f-robust'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       helpers.run_subprocess( [helpers.path_to_executable('speed_sig'), sig_name, "-f"])

tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-192f-robust', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_

        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))

        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )

        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-192f-robust -f

__________________________________ test_sig[SPHINCS+-SHAKE256-256s-simple] ___________________________________
[gw2] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHAKE256-256s-simple'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       helpers.run_subprocess( [helpers.path_to_executable('speed_sig'), sig_name, "-f"])

tests/test_speed.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = ['/Users/ur20980/src/liboqs/build/tests/speed_sig', 'SPHINCS+-SHAKE256-256s-simple', '-f']
working_dir = '.'
env = {'ANT_OPTS': '-Dhttp.proxyHost=llproxy.llan.ll.mit.edu -Dhttp.proxyPort=8080', 'AS': 'yasm -m amd64 -f macho64 -p gas -X gnu ', 'AS_INTEGRATED_ASSEMBLER': '1', 'BASH_SILENCE_DEPRECATION_WARNING': '1', ...}
expected_returncode = 0, input = None, ignore_returncode = False

    def run_subprocess(command, working_dir='.', env=None, expected_returncode=0, input=None, ignore_returncode=False):
        """
        Helper function to run a shell command and report success/failure
        depending on the exit status of the shell command.
        """
        env_ = os.environ.copy()
        if env is not None:
            env_.update(env)
        env = env_

        # Note we need to capture stdout/stderr from the subprocess,
        # then print it, which pytest will then capture and
        # buffer appropriately
        print(working_dir + " > " + " ".join(command))

        result = subprocess.run(
                command,
                input=input,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                cwd=working_dir,
                env=env,
            )

        if not(ignore_returncode) and (result.returncode != expected_returncode):
            print(result.stdout.decode('utf-8'))
>           assert False, "Got unexpected return code {}".format(result.returncode)
E           AssertionError: Got unexpected return code -4

tests/helpers.py:42: AssertionError
-------------------------------------------- Captured stdout call --------------------------------------------
. > /Users/ur20980/src/liboqs/build/tests/speed_sig SPHINCS+-SHAKE256-256s-simple -f

___________________________________ test_sig[SPHINCS+-SHA256-128f-robust] ____________________________________
[gw9] darwin -- Python 3.11.3 /opt/local/bin/python3

sig_name = 'SPHINCS+-SHA256-128f-robust'

    @helpers.filtered_test
    @pytest.mark.parametrize('sig_name', helpers.available_sigs_by_name())
    def test_sig(sig_name):
        kats = helpers.get_kats("sig")
        if not(helpers.is_sig_enabled_by_name(sig_name)): pytest.skip('Not enabled')
>       output = helpers.run_subprocess(
            [helpers.path_to_executable('kat_sig'), sig_name],
        )

tests/test_kat.py:29: 
.  .  .

To Reproduce Steps to reproduce the behavior:

  1. Clone, build optimized version, as per README.
  2. Go to build subdirectory.
  3. ninja run_tests
  4. See error

Expected behavior Tests passing or being skipped, test-scripts not crashing.

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Additional context I hope you can point me at where/what in Python environment seems missing - maybe an older version of some module, or such...?

baentsch commented 1 year ago

I hope you can point me at where/what in Python environment seems missing - maybe an older version of some module, or such...?

I don't see any reference to python module issues in the above...Why do you think this is the issue? If it's "only" exactly the 4 errors above with algorithms of the SPHINCS+ family (is it?) this reminds me a bit of https://github.com/open-quantum-safe/liboqs/pull/1420... If it's indeed only these Sphincs+ tests failing, would you be willing to re-build&test with "-DOQS_USE_SHA2_OPENSSL=OFF" to see whether this improves things (so it may be the same issue as in open-quantum-safe/liboqs#1420)?

baentsch commented 1 year ago

Just found a definite bug; until the fix for that lands, DOQS_USE_SHA2_OPENSSL=OFF would be a workaround.

mouse07410 commented 1 year ago

Just found a definite bug; until the fix for that lands, DOQS_USE_SHA2_OPENSSL=OFF would be a workaround.

Sorry, proposed workaround is no-joy:

liboqs-tests-ossl-3.2.0.txt

One excerpt:

[2/14] Linking C executable tests/test_sha3
FAILED: tests/test_sha3 
: && /opt/local/bin/clang -g -O0 -std=gnu18 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -g -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names  src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_avx2.dir/KeccakP-1600/avx2/KeccakP-1600-AVX2.S.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_avx2.dir/KeccakP-1600times4/avx2/KeccakP-1600-times4-SIMD256.c.o src/common/CMakeFiles/common.dir/aes/aes.c.o src/common/CMakeFiles/common.dir/aes/aes_c.c.o src/common/CMakeFiles/common.dir/aes/aes128_ni.c.o src/common/CMakeFiles/common.dir/aes/aes256_ni.c.o src/common/CMakeFiles/common.dir/sha2/sha2.c.o src/common/CMakeFiles/common.dir/sha2/sha2_c.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3x4.c.o src/common/CMakeFiles/common.dir/common.c.o src/common/CMakeFiles/common.dir/pqclean_shims/nistseedexpander.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202x4.c.o src/common/CMakeFiles/common.dir/rand/rand.c.o src/common/CMakeFiles/common.dir/rand/rand_nist.c.o tests/CMakeFiles/test_sha3.dir/test_sha3.c.o -o tests/test_sha3  /Users/ur20980/openssl-3/lib/libcrypto.dylib  -lm && :
Undefined symbols for architecture x86_64:
  "_oqs_aes_256_ecb", referenced from:
      _AES256_ECB in rand_nist.c.o
  "_oqs_fetch_ossl_objects", referenced from:
      _OQS_init in common.c.o
  "_oqs_free_ossl_objects", referenced from:
      _OQS_destroy in common.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
mouse07410 commented 1 year ago

Without the workaround, at least the test-run completes, though the majority of the tests fail:

=========== 381 failed, 77 passed, 220 skipped in 101.68s (0:01:41) ============

liboqs-tests-out.zip

With that presumed workaround - tests don't even run, as shown above, and here:

.  .  .
FAILED: tests/test_aes 
: && /opt/local/bin/clang -g -O3 -std=gnu18 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names  src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_plain64.dir/KeccakP-1600/plain-64bits/KeccakP-1600-opt64.c.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_serial.dir/KeccakP-1600times4/serial/KeccakP-1600-times4-on1.c.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600_avx2.dir/KeccakP-1600/avx2/KeccakP-1600-AVX2.S.o src/common/sha3/xkcp_low/CMakeFiles/xkcp_low_keccakp_1600times4_avx2.dir/KeccakP-1600times4/avx2/KeccakP-1600-times4-SIMD256.c.o src/common/CMakeFiles/common.dir/aes/aes.c.o src/common/CMakeFiles/common.dir/aes/aes_c.c.o src/common/CMakeFiles/common.dir/aes/aes128_ni.c.o src/common/CMakeFiles/common.dir/aes/aes256_ni.c.o src/common/CMakeFiles/common.dir/sha2/sha2.c.o src/common/CMakeFiles/common.dir/sha2/sha2_c.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3.c.o src/common/CMakeFiles/common.dir/sha3/xkcp_sha3x4.c.o src/common/CMakeFiles/common.dir/common.c.o src/common/CMakeFiles/common.dir/pqclean_shims/nistseedexpander.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202.c.o src/common/CMakeFiles/common.dir/pqclean_shims/fips202x4.c.o src/common/CMakeFiles/common.dir/rand/rand.c.o src/common/CMakeFiles/common.dir/rand/rand_nist.c.o tests/CMakeFiles/test_aes.dir/test_aes.c.o -o tests/test_aes  /Users/ur20980/openssl-3/lib/libcrypto.dylib  -lm && :
Undefined symbols for architecture x86_64:
  "_oqs_aes_256_ecb", referenced from:
      _AES256_ECB in rand_nist.c.o
  "_oqs_fetch_ossl_objects", referenced from:
      _OQS_init in common.c.o
  "_oqs_free_ossl_objects", referenced from:
      _OQS_destroy in common.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[1701/1716] Building C object src/sig/falcon/CMakeFil...con_1024_avx2.dir/pqclean_falcon-1024_avx2/keygen.c.o
ninja: build stopped: subcommand failed.
mouse07410 commented 1 year ago

Correct workaround is -DOQS_USE_OPENSSL=OFF:

== 458 passed, 220 skipped in 68.31s (0:01:08) ==

Looking forward to seeing the bug fix merged. TNX

baentsch commented 1 year ago

Correct workaround is -DOQS_USE_OPENSSL=OFF

Thanks for the test/reconfirmation. We'll look into this.

mouse07410 commented 1 year ago

It works on Apple Silicon. I'll need to check on x86_64.

But on Apple Silicon, OpenSSL-3.2.0-dev crashes several times, with this:

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [64110]

VM Region Info: 0 is not in any region.  Bytes before following region: 4341940224
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      102ccc000-102cd0000    [   16K] r-x/r-x SM=COW  .../MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_pthread.dylib                0x1a1c948c4 pthread_rwlock_wrlock + 0
1   libcrypto.3.dylib                      0x104692dd4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2   libcrypto.3.dylib                      0x10463ac14 ERR_unload_strings + 92 (err.c:314)
3   libcrypto.3.dylib                      0x10468f720 ossl_provider_free + 92 (provider_core.c:688)
4   libcrypto.3.dylib                      0x1046f5138 OPENSSL_sk_pop_free + 76 (stack.c:439)
5   libcrypto.3.dylib                      0x10468f104 sk_OSSL_PROVIDER_pop_free + 12 (provider_core.c:199) [inlined]
6   libcrypto.3.dylib                      0x10468f104 ossl_provider_store_free + 76 (provider_core.c:295)
7   libcrypto.3.dylib                      0x10467fd6c context_deinit_objs + 124 (context.c:250)
8   libcrypto.3.dylib                      0x10467f564 context_deinit + 16 (context.c:334) [inlined]
9   libcrypto.3.dylib                      0x10467f564 OSSL_LIB_CTX_free + 132 (context.c:465)
10  oqsprovider.0.5.0-dev.dylib            0x10404b560 oqsx_freeprovctx + 24 (oqsprov_keys.c:178)
11  oqsprovider.0.5.0-dev.dylib            0x10404a9a0 oqsprovider_teardown + 12 (oqsprov.c:553)
12  libcrypto.3.dylib                      0x103e17400 ossl_provider_free + 76
13  libcrypto.3.dylib                      0x103e6b5a0 OPENSSL_sk_pop_free + 60
14  libcrypto.3.dylib                      0x103e16ec0 ossl_provider_store_free + 72
15  libcrypto.3.dylib                      0x103e0b5c4 context_deinit_objs + 124
16  libcrypto.3.dylib                      0x103e0ae5c context_deinit + 32
17  libcrypto.3.dylib                      0x103e0ae2c ossl_lib_ctx_default_deinit + 20
18  libcrypto.3.dylib                      0x103e0dd80 OPENSSL_cleanup + 204
19  libsystem_c.dylib                      0x1a1b55ed4 __cxa_finalize_ranges + 492
20  libsystem_c.dylib                      0x1a1b55c4c exit + 44
21  libdyld.dylib                          0x1a1cb0554 dyld4::LibSystemHelpers::exit(int) const + 20
22  dyld                                   0x1a193ff7c start + 2320
baentsch commented 1 year ago

But on Apple Silicon, OpenSSL-3.2.0-dev crashes several times, with this:

So we re-open. Is this dependent on other providers being active? Any way you see how we can reproduce (also for CI)?

Edit/Add: Is this issue really correct here? Should it rather be an oqsprovider issue? I propose we make this (issue location) dependent on how the behaviour can get reproduced: OK for you, @mouse07410 ? FWIW, I have a similar issue tracked in https://github.com/open-quantum-safe/oqs-provider/issues/159...

mouse07410 commented 1 year ago

On Apple Silicon.

With minimal set of providers enabled:

=== 464 passed, 220 skipped in 40.24s ===

and no crash.

Update

With pkcs11-provider or oqs-provider enabled - same thing as above.

With both pkcs11-provider and oqs-provider enabled - tests still pass == 464 passed, 220 skipped in 40.80s ==, but crashes in the end:

Segmentation fault
FAILED: tests/CMakeFiles/run_tests /Users/ur20980/src/liboqs/build/tests/CMakeFiles/run_tests 
cd /Users/ur20980/src/liboqs && /opt/local/bin/cmake -E env OQS_BUILD_DIR=/Users/ur20980/src/liboqs/build python3 -m pytest --verbose --numprocesses=auto --ignore=scripts/copy_from_upstream/repos
ninja: build stopped: subcommand failed.

and crash report:

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [18302]

VM Region Info: 0 is not in any region.  Bytes before following region: 4310319104
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      100ea4000-100ea8000    [   16K] r-x/r-x SM=COW  .../MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_pthread.dylib                0x1a1c948c4 pthread_rwlock_wrlock + 0
1   libcrypto.3.dylib                      0x10564add4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2   libcrypto.3.dylib                      0x1055f2c14 ERR_unload_strings + 92 (err.c:314)
3   libcrypto.3.dylib                      0x105647720 ossl_provider_free + 92 (provider_core.c:688)
4   libcrypto.3.dylib                      0x1056ad138 OPENSSL_sk_pop_free + 76 (stack.c:439)
5   libcrypto.3.dylib                      0x105647104 sk_OSSL_PROVIDER_pop_free + 12 (provider_core.c:199) [inlined]
6   libcrypto.3.dylib                      0x105647104 ossl_provider_store_free + 76 (provider_core.c:295)
7   libcrypto.3.dylib                      0x105637d6c context_deinit_objs + 124 (context.c:250)
8   libcrypto.3.dylib                      0x105637564 context_deinit + 16 (context.c:334) [inlined]
9   libcrypto.3.dylib                      0x105637564 OSSL_LIB_CTX_free + 132 (context.c:465)
10  oqsprovider.0.5.0-dev.dylib            0x10436f560 oqsx_freeprovctx + 24 (oqsprov_keys.c:178)
11  oqsprovider.0.5.0-dev.dylib            0x10436e9a0 oqsprovider_teardown + 12 (oqsprov.c:553)
12  libcrypto.3.dylib                      0x104547400 ossl_provider_free + 76
13  libcrypto.3.dylib                      0x10459b5a0 OPENSSL_sk_pop_free + 60
14  libcrypto.3.dylib                      0x104546ec0 ossl_provider_store_free + 72
15  libcrypto.3.dylib                      0x10453b5c4 context_deinit_objs + 124
16  libcrypto.3.dylib                      0x10453ae5c context_deinit + 32
17  libcrypto.3.dylib                      0x10453ae2c ossl_lib_ctx_default_deinit + 20
18  libcrypto.3.dylib                      0x10453dd80 OPENSSL_cleanup + 204
19  libsystem_c.dylib                      0x1a1b55ed4 __cxa_finalize_ranges + 492
20  libsystem_c.dylib                      0x1a1b55c4c exit + 44
21  libdyld.dylib                          0x1a1cb0554 dyld4::LibSystemHelpers::exit(int) const + 20
22  dyld                                   0x1a193ff7c start + 2320
baentsch commented 1 year ago

@mouse07410 Can you still reproduce this issue after the latest oqsprovider & libops updates?

mouse07410 commented 1 year ago

@mouse07410 Can you still reproduce this issue after the latest oqsprovider & libops updates?

Unfortunately, yes I can. Tests proceed fine, but when they finish I see about 5 or so OS crash reports telling me that it crashed.

baentsch commented 1 year ago

Unfortunately, yes I can. Tests proceed fine, but when they finish I see about 5 or so OS crash reports telling me that it crashed.

Too bad. On Ubuntu, everything works like a charm (see below for OpenSSL3.0.2; same results for 3.2.0-dev): Tests pass and no crash at the end, so I cannot debug into this... Tagging @mattcaswell for thoughts as to whether this may be another openssl (config) issue (?)

Test setup:
LD_LIBRARY_PATH=/home/mib/git/oqs/oqs-provider/.local/lib64
OPENSSL_APP=openssl
OPENSSL_CONF=/home/mib/git/oqs/oqs-provider/scripts/openssl-ca.cnf
OPENSSL_MODULES=/home/mib/git/oqs/oqs-provider/_build/lib
Version information:
OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)
Providers:
  default
    name: OpenSSL Default Provider
    version: 3.0.2
    status: active
    build info: 3.0.2
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
  oqsprovider
    name: OpenSSL OQS Provider
    version: 0.5.0-dev
    status: active
    build info: OQS Provider v.0.5.0-dev (8481fcf) based on liboqs v.0.8.0-dev
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
  oqsprovider2
    name: OpenSSL OQS Provider
    version: 0.5.0-dev
    status: active
    build info: OQS Provider v.0.5.0-dev (8481fcf) based on liboqs v.0.8.0-dev
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
  pkcs11
    name: PKCS#11 Provider
    version: 3.2.0
    status: active
    build info: 3.2.0-dev
    gettable provider parameters:
      name: pointer to a UTF8 encoded string (arbitrary size)
      version: pointer to a UTF8 encoded string (arbitrary size)
      buildinfo: pointer to a UTF8 encoded string (arbitrary size)
      status: integer (arbitrary size)
Cert gen/verify, CMS sign/verify, CA tests for all enabled algorithms commencing...
.......................
Test project /home/mib/git/oqs/oqs-provider/_build
    Start 1: oqs_signatures
1/5 Test #1: oqs_signatures ...................   Passed    5.51 sec
    Start 2: oqs_kems
2/5 Test #2: oqs_kems .........................   Passed    0.15 sec
    Start 3: oqs_groups
3/5 Test #3: oqs_groups .......................   Passed    0.32 sec
    Start 4: oqs_tlssig
4/5 Test #4: oqs_tlssig .......................   Passed    0.00 sec
    Start 5: oqs_endecode
5/5 Test #5: oqs_endecode .....................   Passed   13.91 sec

100% tests passed, 0 tests failed out of 5

Total Test time (real) =  19.89 sec

All oqsprovider tests passed.
mattcaswell commented 1 year ago

This is crashing during a call to OPENSSL_cleanup which occurs automatically on process exit. This is why the tests appear to pass and then still crash, i.e. all the work has been done successfully and it crashes while the process is exiting.

The stack trace shows us this:

0   libsystem_pthread.dylib                0x1a1c948c4 pthread_rwlock_wrlock + 0
1   libcrypto.3.dylib                      0x10564add4 CRYPTO_THREAD_write_lock + 12 (threads_pthread.c:110)
2   libcrypto.3.dylib                      0x1055f2c14 ERR_unload_strings + 92 (err.c:314)

I assume the "0 is not in any region" message in the stack trace means that the write lock that ERR_unload_strings is trying to obtain is NULL. ERR_unload_strings looks like this:

int ERR_unload_strings(int lib, ERR_STRING_DATA *str)
{
#ifndef OPENSSL_NO_ERR
    if (!RUN_ONCE(&err_string_init, do_err_strings_init))
        return 0;

    if (!CRYPTO_THREAD_write_lock(err_string_lock))
        return 0;

With the err_string_lock being initialised via a RUN_ONCE in do_err_strings_init:

DEFINE_RUN_ONCE_STATIC(do_err_strings_init)
{
    if (!OPENSSL_init_crypto(OPENSSL_INIT_BASE_ONLY, NULL))
        return 0;
    err_string_lock = CRYPTO_THREAD_lock_new();
    if (err_string_lock == NULL)
        return 0;

Since ERR_unload_strings calls the do_err_strings_init function via RUN_ONCE we can assume that the err_string_lock must have once been initialised - but for some reason no longer is. The only way I can see that happening is if err_cleanup has already been called:

void err_cleanup(void)
{
    if (set_err_thread_local != 0)
        CRYPTO_THREAD_cleanup_local(&err_thread_local);
    CRYPTO_THREAD_lock_free(err_string_lock);
    err_string_lock = NULL;
#ifndef OPENSSL_NO_ERR
    lh_ERR_STRING_DATA_free(int_error_hash);
    int_error_hash = NULL;
#endif
}

But that is only ever called from OPENSSL_cleanup (which according to the stack trace is what we are currently doing):

17  libcrypto.3.dylib                      0x10453ae2c ossl_lib_ctx_default_deinit + 20
18  libcrypto.3.dylib                      0x10453dd80 OPENSSL_cleanup + 204

So, we are currently in the ossl_lib_ctx_default_deinit step off OPENSSL_cleanup....but that occurs before any call to err_cleanup:

    OSSL_TRACE(INIT, "OPENSSL_cleanup: ossl_lib_ctx_default_deinit()\n");
    ossl_lib_ctx_default_deinit();

    ossl_cleanup_thread();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: bio_cleanup()\n");
    bio_cleanup();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: evp_cleanup_int()\n");
    evp_cleanup_int();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: ossl_obj_cleanup_int()\n");
    ossl_obj_cleanup_int();

    OSSL_TRACE(INIT, "OPENSSL_cleanup: err_int()\n");
    err_cleanup();

So, right now I don't see a situation how this could occur!?

In any case this doesn't currently look like an oqs-provider issue. More likely it is an OpenSSL problem. Perhaps open an issue there?

mouse07410 commented 1 year ago

In any case this doesn't currently look like an oqs-provider issue. More likely it is an OpenSSL problem. Perhaps open an issue there?

Opened an OpenSSL issue for this. Also, it affects both Apple Silicon and x86_64 platforms.

baentsch commented 1 year ago

As I just chased another OpenSSL (only in 3.0.1 -- triggered by old CI packaging) bug for a few hours now, I tend to also consider this issue moot for oqsprovider. We may leave it open for the fun of it, but it won't stop me from finally doing a release for oqsprovider. Objections, anyone?

mouse07410 commented 1 year ago

We may leave it open for the fun of it, but it won't stop me from finally doing a release for oqsprovider. Objections, anyone?

No objections - let's release.

mouse07410 commented 1 year ago

As far as I can see, current master runs and passes tests on my MacOS Ventura, both Intel and M2 platforms. Using OpenSSL-3.1.1 and 3.2.0-dev.

Thus, let me close this for now - might re-open if hit with the same problem again. ;-) Thanks!

baentsch commented 1 year ago

As far as I can see, current master runs and passes tests on my MacOS Ventura, both Intel and M2 platforms. Using OpenSSL-3.1.1 and 3.2.0-dev.

Thanks for these tests on what is now a release: Glad to hear that!