openssl / openssl

TLS/SSL and crypto library
https://www.openssl.org
Apache License 2.0

Test failure on aarch64 with 3.0.1 & 3.0.2 #17900

Open h-vetinari opened 2 years ago

h-vetinari commented 2 years ago

This was observed in https://github.com/conda-forge/openssl-feedstock/pull/79 & https://github.com/conda-forge/openssl-feedstock/pull/84:

There's a single test failure on aarch64 (all other platforms we're building for ran fine):

Test Summary Report
-------------------
30-test_afalg.t                  (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=242, Tests=3422, 6525 wallclock secs (44.53 usr  1.00 sys + 6222.38 cusr 280.17 csys = 6548.08 CPU)
Result: FAIL

The error (details below) reports `undefined symbol: EVP_PKEY_base_id`, which sounds like it's related to https://github.com/openssl/openssl/issues/17003, but that function does not appear at the call site, which looks like:

# ERROR: (bool) 'EVP_CipherInit_ex(ctx, cipher, e, key, iv, 1) == true' failed @ test/afalgtest.c:85

Interestingly, the same test also fails for 1.1.1n, though less loudly (details also below the fold).

There's a small but non-zero possibility that this has something to do with running in emulation through QEMU, but I doubt it (since it affects only one test, and one specific symbol).

For 3.0.2:

```
[...]
30-test_acvp.t ..................... ok
30-test_aesgcm.t ................... ok
ALG_PERR: engines/e_afalg.c(252): io_setup error : Function not implemented
        # ERROR: (bool) 'EVP_CipherInit_ex(ctx, cipher, e, key, iv, 1) == true' failed @ test/afalgtest.c:85
        # false
        # D099780255000000:error:1280006A:DSO support routines:dlfcn_bind_func:could not bind to the requested symbol name:crypto/dso/dso_dlfcn.c:188:symname(EVP_PKEY_base_id): $SRC_DIR/engines/afalg.so: undefined symbol: EVP_PKEY_base_id
        # D099780255000000:error:1280006A:DSO support routines:DSO_bind_func:could not bind to the requested symbol name:crypto/dso/dso_lib.c:176:
        # D099780255000000:error:40000069:lib(128)::io setup failed:engines/e_afalg.c:253:
        # OPENSSL_TEST_RAND_ORDER=1647419023
        not ok 1 - iteration 1
# ------------------------------------------------------------------------------
ALG_PERR: engines/e_afalg.c(252): io_setup error : Function not implemented
        # ERROR: (bool) 'EVP_CipherInit_ex(ctx, cipher, e, key, iv, 1) == true' failed @ test/afalgtest.c:85
        # false
        # D099780255000000:error:40000069:lib(128)::io setup failed:engines/e_afalg.c:253:
        # OPENSSL_TEST_RAND_ORDER=1647419023
        not ok 2 - iteration 2
# ------------------------------------------------------------------------------
ALG_PERR: engines/e_afalg.c(252): io_setup error : Function not implemented
        # ERROR: (bool) 'EVP_CipherInit_ex(ctx, cipher, e, key, iv, 1) == true' failed @ test/afalgtest.c:85
        # false
        # D099780255000000:error:40000069:lib(128)::io setup failed:engines/e_afalg.c:253:
        # OPENSSL_TEST_RAND_ORDER=1647419023
        not ok 3 - iteration 3
# ------------------------------------------------------------------------------
    # OPENSSL_TEST_RAND_ORDER=1647419023
    not ok 1 - test_afalg_aes_cbc
# ------------------------------------------------------------------------------
../../util/wrap.pl ../../test/afalgtest => 1
not ok 1 - running afalgtest
# ------------------------------------------------------------------------------
#   Failed test 'running afalgtest'
#   at test/recipes/30-test_afalg.t line 21.
# Looks like you failed 1 test of 1.30-test_afalg.t .................... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests 
30-test_defltfips.t ................ ok
30-test_engine.t ................... ok
[...]
```

For 1.1.1n:

```
[...]
../test/recipes/25-test_x509.t ..................... ok
../test/recipes/30-test_afalg.t .................... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests
../test/recipes/30-test_engine.t ................... ok
../test/recipes/30-test_evp.t ...................... ok
[...]
```
t8m commented 2 years ago

The DSO bind errors are a red herring, though we do need to fix those.

The real error is the `io setup failed` one, and IMO that is caused by the environment.

h-vetinari commented 2 years ago

Thanks a lot for the quick response! I've just started a run that includes the patches from #17901 and #17902; will report back later.

> The real error is the io setup failed. And IMO that is caused by the environment.

Could you elaborate on what you mean by "environment" here?

t8m commented 2 years ago

> Could you elaborate what you mean by "environment" here?

The kernel, the fact that it is being run via QEMU, etc.

h-vetinari commented 2 years ago

OK, with the patch from #17901 the symbol error is gone, but `io_setup` in engines/e_afalg.c is still reported as not implemented (see below). Which library should be providing this function?

30-test_acvp.t ..................... ok
30-test_aesgcm.t ................... ok

ALG_PERR: engines/e_afalg.c(252): io_setup error : Function not implemented
        # ERROR: (bool) 'EVP_CipherInit_ex(ctx, cipher, e, key, iv, 1) == true' failed @ test/afalgtest.c:85
        # false
        # D099780255000000:error:40000069:lib(128)::io setup failed:engines/e_afalg.c:253:
        # OPENSSL_TEST_RAND_ORDER=1647438183
        not ok 1 - iteration 1
# ------------------------------------------------------------------------------
ALG_PERR: engines/e_afalg.c(252): io_setup error : Function not implemented
        # ERROR: (bool) 'EVP_CipherInit_ex(ctx, cipher, e, key, iv, 1) == true' failed @ test/afalgtest.c:85
        # false
        # D099780255000000:error:40000069:lib(128)::io setup failed:engines/e_afalg.c:253:
        # OPENSSL_TEST_RAND_ORDER=1647438183
        not ok 2 - iteration 2
# ------------------------------------------------------------------------------
ALG_PERR: engines/e_afalg.c(252): io_setup error : Function not implemented
        # ERROR: (bool) 'EVP_CipherInit_ex(ctx, cipher, e, key, iv, 1) == true' failed @ test/afalgtest.c:85
        # false
        # D099780255000000:error:40000069:lib(128)::io setup failed:engines/e_afalg.c:253:
        # OPENSSL_TEST_RAND_ORDER=1647438183
        not ok 3 - iteration 3
# ------------------------------------------------------------------------------
    # OPENSSL_TEST_RAND_ORDER=1647438183
    not ok 1 - test_afalg_aes_cbc
# ------------------------------------------------------------------------------
../../util/wrap.pl ../../test/afalgtest => 1
not ok 1 - running afalgtest
# ------------------------------------------------------------------------------
#   Failed test 'running afalgtest'
#   at test/recipes/30-test_afalg.t line 21.
# Looks like you failed 1 test of 1.30-test_afalg.t .................... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests 
30-test_defltfips.t ................ ok
30-test_engine.t ................... ok
t8m commented 2 years ago

> Which library should be providing this function?

The kernel: this is a syscall. It may well be that the syscall number for io_setup differs between architectures, and that QEMU does not translate it properly when you run through it.

Anyway, I do not think this is an OpenSSL issue unless it is reproducible on real hardware, or at least in a VM.

jakirkham commented 2 years ago

I'm curious: why is a raw syscall used here instead of calling io_setup as defined in linux/aio_abi.h?

johalun commented 2 years ago

I'm having the same issue when building in GitLab CI. All tests pass when I build on our local GitLab runner, but on a shared GitLab CI runner the afalgtest fails. Both hosts run a 5.4 kernel; our host OS is Ubuntu, while GitLab reports Alpine. GitLab uses Google Compute for its shared runners, so they may be running in VMs, and that could be what's causing the issue. Edit: in our case everything is x86_64.

tom-cosgrove-arm commented 2 years ago

Should this issue have been closed by #17945?

(For others coming here from a search engine etc., note that there's a related issue, #7687 (from back in 2018), which was fixed by #7688 (merged in Jan 2022).)

t8m commented 2 years ago

Well, #17945 fixes the issue in our CI. That does not mean other people won't encounter it when running the tests.

t8m commented 2 years ago

The question is: is this a problem that cannot be solved for QEMU runs, and if so, would it be possible to detect it in the test and skip the test automatically?

zorrorffm commented 2 years ago

May I ask whether it would be possible to add a linux/aarch64 machine to the project CI? Such a machine could run the unit tests rather than only compilation tests, and it would certainly avoid the issue above.

t8m commented 2 years ago

We do not have such a machine.

zorrorffm commented 2 years ago

> We do not have such a machine.

Would the community be able to accept funding from another organization for such a platform? I can pass this request on to the relevant people to see whether we or a partner could fund such a machine for the community. If there is a channel dedicated to this kind of thing, please let me know, and we can follow up there as we make progress.