Closed fabled closed 1 month ago
@fabled can you test if it works if you set this config option in openssl.cnf:
pkcs11-module-load-behavior = early
If it does not work with that option can you provide a debug log from pkcs11-provider?
Yes, things work with pkcs11-module-load-behavior = early
. The issue is happening against SoftHSMv2 using OpenSSL backend.
Find attached debug.log with default load behavior and failed operation. Though I have locally modified check out to fix #265.
Hum, seems this is somewhat known issue as the digest tests does exactly this to not fail. See https://github.com/latchset/pkcs11-provider/blob/main/tests/tdigest#L7-L11
So now that I am trying to get the test suite working with softhsm built to use openssl... I'm seeing random failures with pkcs11-module-load-behavior = early
. I suppose this scenario is more or less the reason why the delayed initialization was added?
NOt really, delayed initialization was added to prevent loading modules just because openssl asked for names of digests but never actually uses the module. What kind of issues do you see with softhsm and what version do you have?
Softhsm has the very bad issue that it does not use a libctx, instead uses the same default openssl context as the calling application, which can cause issues, especially at process shutdown.
To alleviate some of the issues at shutdown we have: pkcs11-module-quirks = no-deinit
Yes, I have that quirk enabled, and it fixed half the issues. But I'm also seeing various different random crashes on multi threaded programs, such as:
D:keyfromlabel:--------------------------------------------------------------------------------
D:/oss/bind9/bin/tests/system/keyfromlabel:Core was generated by `/oss/bind9/bin/dnssec/.libs/dnssec-signzone -v 1 -E provider -S -a -g -o rsasha'.
D:/oss/bind9/bin/tests/system/keyfromlabel:Program terminated with signal SIGSEGV, Segmentation fault.
D:/oss/bind9/bin/tests/system/keyfromlabel:#0 __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x0) at ./nptl/pthread_rwlock_common.c:603
D:/oss/bind9/bin/tests/system/keyfromlabel:[Current thread is 1 (Thread 0x7fe92d33d680 (LWP 17157))]
D:/oss/bind9/bin/tests/system/keyfromlabel:#0 __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x0) at ./nptl/pthread_rwlock_common.c:603
D:/oss/bind9/bin/tests/system/keyfromlabel:#1 ___pthread_rwlock_wrlock (rwlock=0x0) at ./nptl/pthread_rwlock_wrlock.c:26
D:/oss/bind9/bin/tests/system/keyfromlabel:#2 0x00007fe9314a6889 in CRYPTO_THREAD_write_lock () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#3 0x00007fe9314d41e3 in RAND_get_rand_method () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#4 0x00007fe9314d4a19 in RAND_bytes_ex () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#5 0x00007fe92eacc735 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#6 0x00007fe92ead5a63 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#7 0x00007fe92eaf1046 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#8 0x00007fe92ea91321 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#9 0x00007fe92ea9abfa in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#10 0x00007fe92ea6cb44 in C_SignInit () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#11 0x00007fe92eb3e8f2 in p11prov_SignInit (ctx=0x562cac127610, hSession=3, pMechanism=0x7fe92d3364c0, hKey=8) at /usr/src/pkcs11-provider-e763a88ecdb9fcdbae2d52ffe981272e6ca15294/src/interface.gen.c:654
D:/oss/bind9/bin/tests/system/keyfromlabel:#12 0x00007fe92eb51755 in p11prov_sig_operate_init (sigctx=sigctx@entry=0x7fe9280010c0, digest_op=digest_op@entry=false, _session=0x7fe92d336598) at signature.c:818
D:/oss/bind9/bin/tests/system/keyfromlabel:#13 0x00007fe92eb51d77 in p11prov_sig_operate (sigctx=0x7fe9280010c0, sig=0x7fe928000bb0 "", siglen=0x7fe92d3366c8, sigsize=<optimized out>, tbs=0x7fe92d3365a0 "010\r\006\t`\206H\001e\003\004\002\001\005", tbslen=51) at signature.c:904
D:/oss/bind9/bin/tests/system/keyfromlabel:#14 0x00007fe92eb5206c in p11prov_rsasig_sign (ctx=0x7fe9280010c0, sig=0x7fe928000bb0 "", siglen=0x7fe92d3366c8, sigsize=256, tbs=0x7fe92d3366d0 "\314\035 D\377{\237N\234kz:2\n\304\004\376\262\200\206a\370y0\a\302\325v\027\001z\357Ph3-\351\177", tbslen=32) at signature.c:1131
D:/oss/bind9/bin/tests/system/keyfromlabel:#15 0x00007fe9314863d1 in EVP_SignFinal_ex () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#16 0x00007fe931a51264 in opensslrsa_sign (dctx=0x7fe928000f20, sig=0x7fe92d336850) at opensslrsa_link.c:268
D:/oss/bind9/bin/tests/system/keyfromlabel:#17 0x00007fe931a040cc in dns_dnssec_sign (name=name@entry=0x7fe92d339270, set=set@entry=0x7fe92d337d70, key=key@entry=0x562cac180960, inception=inception@entry=0x562cab058cb4 <starttime>, expire=expire@entry=0x7fe92d336f0c, mctx=0x562cac167990, buffer=0x7fe92d336f40, sigrdata=0x7fe92d336f10) at dnssec.c:348
D:/oss/bind9/bin/tests/system/keyfromlabel:#18 0x0000562cab04b6d0 in signwithkey (name=name@entry=0x7fe92d339270, rdataset=rdataset@entry=0x7fe92d337d70, key=0x562cac180960, ttl=ttl@entry=300, add=add@entry=0x7fe92d337d20, logmsg=logmsg@entry=0x562cab0535c8 "signing with dnskey") at dnssec-signzone.c:301
D:/oss/bind9/bin/tests/system/keyfromlabel:#19 0x0000562cab04df03 in signset (set=0x7fe92d337d70, name=0x7fe92d339270, node=0x562cac17adc0, add=0x7fe92d337d20, del=0x7fe92d337d00) at dnssec-signzone.c:778
D:/oss/bind9/bin/tests/system/keyfromlabel:#20 signname (node=0x562cac17adc0, name=0x7fe92d339270) at dnssec-signzone.c:1222
D:/oss/bind9/bin/tests/system/keyfromlabel:#21 0x0000562cab0500c8 in assignwork (arg=<optimized out>) at dnssec-signzone.c:1668
D:/oss/bind9/bin/tests/system/keyfromlabel:#22 0x00007fe931bce727 in isc__async_cb (handle=<optimized out>) at async.c:111
D:/oss/bind9/bin/tests/system/keyfromlabel:#23 0x00007fe9317a409d in ?? () from /lib/x86_64-linux-gnu/libuv.so.1
D:/oss/bind9/bin/tests/system/keyfromlabel:#24 0x00007fe9317b7e3c in ?? () from /lib/x86_64-linux-gnu/libuv.so.1
D:/oss/bind9/bin/tests/system/keyfromlabel:#25 0x00007fe9317a49e4 in uv_run () from /lib/x86_64-linux-gnu/libuv.so.1
D:/oss/bind9/bin/tests/system/keyfromlabel:#26 0x00007fe931bdfbec in loop_thread (arg=arg@entry=0x562cac168fc0) at loop.c:282
D:/oss/bind9/bin/tests/system/keyfromlabel:#27 0x00007fe931bedf48 in thread_body (wrap=0x562cac193990) at thread.c:85
D:/oss/bind9/bin/tests/system/keyfromlabel:#28 thread_run (wrap=0x562cac193990) at thread.c:100
D:/oss/bind9/bin/tests/system/keyfromlabel:#29 0x00007fe93184e044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
D:/oss/bind9/bin/tests/system/keyfromlabel:#30 0x00007fe9318cd860 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
D:keyfromlabel:--------------------------------------------------------------------------------
Running in valgrind does not help as the problem disappears as a side effect.
I have not seen this before, but I wonder if it could be a nested lock or something causing openssl called from softhsm to mess with structures that are not fully set up yet?
Yes something funny happens in openssl. I suspect the app -> openssl init -> provider -> softhsm -> openssl init
causes openssl init
to not setup everything correctly.
So on softhsm, for me: with early
init, I get this and with default
init the digest stuff is not working.
I need to figure out if I can fix openssl / determine root cause of this. Or if should just try to do softhsm built with botan, or move to nss soft token.
What is the first operation you attempt in Bind?
Actually I need to investigate more. Seems the crash is sort of valid, as other thread had failed operation and was doing openssl deinit which causes the crash :-o
The first failure is:
dnssec-signzone: warning: EVP_SignFinal (opensslrsa_link.c:269) failed (failure)
dnssec-signzone: info: error:4080000A:pkcs11::Internal locking failure:session.c:621:Failed to fetch a login_session
dnssec-signzone: info: error:4080000A:pkcs11::Internal locking failure:signature.c:837:Failed to open session on slot 1007647743
dnssec-signzone: fatal: dnskey 'rsasha512.example/RSASHA512/53583' failed to sign data: failure
I can reproduce this only with the early
init mode, because the code does not get to doing the signing in default
load mode.
Making pkcs11-provider do debugging makes seems to mostly make this issue disappear. However, find attached debug.log of this happening (though it seems garbled due to multiple thread usage).
It seems that since commit 6403f223 fetch_session
will immediately fail if two threads are concurrently doing the first operation on a slot that would need login. Probably triggers in my case since it's a simple command line executable that just loads one private key on token, and starts ncores
threads and does sign operations on each using the same key.
It is quite possible that this issue (at least the original dnssec-signzone: info: error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/signature.c:579:
) is fixed with the fix from https://github.com/latchset/pkcs11-provider/pull/372.
The place where this error is issued is in evp_pkey_signature_init()
in the legacy code path. The legacy code path is reached if either the PKEY is a legacy PKEY (unlikely in this case), or if it tried to find SIGNATURE method from a provider, and it failed to find one. To find a SIGNATURE method it first fetches it based on the property query. If this is from a different provider than the one that was used to create the PKEY, then it is tries to export the key material from the original provider and import it into the one from which the fetched SIGNATURE method is from. In case the PKEY was built from a PKCS#11 URI with the pkcs11-provider, then the key is most likely not exportable, and the export fails. If this happens, it takes another round trying to fetch the SIGNATURE method from the provider that was used to create the PKEY, i.e. the pkcs11-provider in this case. This is then supposed to be able to work with that key.
If you run into the situation that the fix in https://github.com/latchset/pkcs11-provider/pull/372 is caring about, then that second try to fetch the SIGNATURE method from the pkcs11-provider that was used to create the PKEY will fail, because OpenSSl has queried the provider too early, when it has not been fully initialized. So it cached the information that the provider would not support signatures, and thus fetching the SIGNATURE method form the provider will not find any. This will result in dnssec-signzone: info: error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/signature.c:579:
.
@fabled any chance you can re-test with latest code? There are at least 3 fixes that may improve the situation here, one on locking, one of session fetching, and one on operations caching (this may even eliminate the need for setting the "early" quirk for you and so it'd be nice if you try without it too.
Even though I am not the original reported, it seems that this is still not fixed even with current main branch (and openssl-3.2). However, the error is different now:
No self-signed KSK DNSKEY found
Zone verification failed (failure)
And it seems that provider has issues handling the key. When comparing the provider debug output with early
and wihout it, it is basically equal until the following point (output without early
):
[../src/keymgmt.c:676] p11prov_rsa_new(): rsa new
[../src/keymgmt.c:737] p11prov_rsa_import(): rsa import 0x7f470ee3f280
[../src/signature.c:1165] p11prov_rsasig_verify_init(): rsa verify init (ctx=0x7f470ee3f100, key=0x7f470ee3f280, params=(nil))
[../src/objects.c:402] p11prov_obj_ref_no_cache(): Ref Object: 0x7f470ee3f280 (handle:0)
[../src/provider.c:606] p11prov_ctx_cache_keys(): cache_keys = 1
[../src/signature.c:1460] p11prov_rsasig_set_ctx_params(): rsasig set ctx params (ctx=0x7f470ee3f100, params=(nil))
[../src/signature.c:1460] p11prov_rsasig_set_ctx_params(): rsasig set ctx params (ctx=0x7f470ee3f100, params=0x7fff34973230)
[../src/signature.c:1475] p11prov_rsasig_set_ctx_params(): Set OSSL_SIGNATURE_PARAM_DIGEST to SHA256
[../src/signature.c:1183] p11prov_rsasig_verify(): rsa verify (ctx=0x7f470ee3f100)
[../src/signature.c:784] p11prov_sig_operate_init(): called (sigctx=0x7f470ee3f100, digest_op=false)
[../src/signature.c:794] p11prov_sig_operate_init(): Error: 0x00000060; Provided key has invalid handle
I am not exactly sure why p11prov_rsa_new()
is happening without early
but this call never happens when the setting is enabled.
Actually there are more differences than that. With early
function p11prov_rsa_export()
is called multiple times during the dnssec-signzone while without early
it is called just once... attaching both provider debug outputs.
@The-Mule do you know if this bug goes away when using kryoptic?
The BIND dnssec-signzone no longer is able to sign using RSA keys. The RSA (also EC) Digest + Sign operation no longer works.
Bisected to commit a912c04fc36ee65f6a7d02b57cba9794bf66a982.
The main error that happens on every commit afterwards (other errors vary) is:
As if the EVP structure is not setup properly for signatures.