latchset / pkcs11-provider

A pkcs#11 provider for OpenSSL 3.0+
Other
65 stars 39 forks source link

Digest + Sign operation no longer working (in default config) #266

Closed fabled closed 1 month ago

fabled commented 1 year ago

The BIND dnssec-signzone no longer is able to sign using RSA keys. The RSA (also EC) Digest + Sign operation no longer works.

Bisected to commit a912c04fc36ee65f6a7d02b57cba9794bf66a982.

dnssec-signzone: info: DNSKEY rsasha256.example/RSASHA256/13390 (ZSK) is now active
dnssec-signzone: info: DNSKEY rsasha256.example/RSASHA256/27089 (KSK) is now active
dnssec-signzone: rsasha256.example/NSEC:
dnssec-signzone:    signing with dnskey rsasha256.example/RSASHA256/13390
dnssec-signzone: warning: EVP_SignFinal (opensslrsa_link.c:269) failed (failure)
dnssec-signzone: info: error:408000E0:pkcs11::The token was not present in its slot when the function was invoked:store.c:114:Failed to get session to load keys
dnssec-signzone: info: error:408000E0:pkcs11::The token was not present in its slot when the function was invoked:store.c:114:Failed to get session to load keys
dnssec-signzone: info: error:408000E0:pkcs11::The token was not present in its slot when the function was invoked:store.c:114:Failed to get session to load keys
dnssec-signzone: info: error:408000E0:pkcs11::The token was not present in its slot when the function was invoked:store.c:114:Failed to get session to load keys
dnssec-signzone: info: error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/signature.c:579:
dnssec-signzone: fatal: dnskey 'rsasha256.example/RSASHA256/13390' failed to sign data: failure

The main error that happens on every commit afterwards (other errors vary) is:

dnssec-signzone: info: error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/signature.c:579:

As if the EVP structure is not setup properly for signatures.

simo5 commented 1 year ago

@fabled can you test if it works if you set this config option in openssl.cnf: pkcs11-module-load-behavior = early

simo5 commented 1 year ago

If it does not work with that option can you provide a debug log from pkcs11-provider?

fabled commented 1 year ago

Yes, things work with pkcs11-module-load-behavior = early. The issue is happening against SoftHSMv2 using OpenSSL backend.

Find attached debug.log with default load behavior and failed operation. Though I have locally modified check out to fix #265.

fabled commented 1 year ago

Hum, seems this is somewhat known issue as the digest tests does exactly this to not fail. See https://github.com/latchset/pkcs11-provider/blob/main/tests/tdigest#L7-L11

fabled commented 1 year ago

So now that I am trying to get the test suite working with softhsm built to use openssl... I'm seeing random failures with pkcs11-module-load-behavior = early. I suppose this scenario is more or less the reason why the delayed initialization was added?

simo5 commented 1 year ago

NOt really, delayed initialization was added to prevent loading modules just because openssl asked for names of digests but never actually uses the module. What kind of issues do you see with softhsm and what version do you have?

Softhsm has the very bad issue that it does not use a libctx, instead uses the same default openssl context as the calling application, which can cause issues, especially at process shutdown.

To alleviate some of the issues at shutdown we have: pkcs11-module-quirks = no-deinit

fabled commented 1 year ago

Yes, I have that quirk enabled, and it fixed half the issues. But I'm also seeing various different random crashes on multi threaded programs, such as:

D:keyfromlabel:--------------------------------------------------------------------------------
D:/oss/bind9/bin/tests/system/keyfromlabel:Core was generated by `/oss/bind9/bin/dnssec/.libs/dnssec-signzone -v 1 -E provider -S -a -g -o rsasha'.
D:/oss/bind9/bin/tests/system/keyfromlabel:Program terminated with signal SIGSEGV, Segmentation fault.
D:/oss/bind9/bin/tests/system/keyfromlabel:#0  __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x0) at ./nptl/pthread_rwlock_common.c:603
D:/oss/bind9/bin/tests/system/keyfromlabel:[Current thread is 1 (Thread 0x7fe92d33d680 (LWP 17157))]
D:/oss/bind9/bin/tests/system/keyfromlabel:#0  __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x0) at ./nptl/pthread_rwlock_common.c:603
D:/oss/bind9/bin/tests/system/keyfromlabel:#1  ___pthread_rwlock_wrlock (rwlock=0x0) at ./nptl/pthread_rwlock_wrlock.c:26
D:/oss/bind9/bin/tests/system/keyfromlabel:#2  0x00007fe9314a6889 in CRYPTO_THREAD_write_lock () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#3  0x00007fe9314d41e3 in RAND_get_rand_method () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#4  0x00007fe9314d4a19 in RAND_bytes_ex () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#5  0x00007fe92eacc735 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#6  0x00007fe92ead5a63 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#7  0x00007fe92eaf1046 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#8  0x00007fe92ea91321 in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#9  0x00007fe92ea9abfa in ?? () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#10 0x00007fe92ea6cb44 in C_SignInit () from /usr/lib/softhsm/libsofthsm2.so
D:/oss/bind9/bin/tests/system/keyfromlabel:#11 0x00007fe92eb3e8f2 in p11prov_SignInit (ctx=0x562cac127610, hSession=3, pMechanism=0x7fe92d3364c0, hKey=8) at /usr/src/pkcs11-provider-e763a88ecdb9fcdbae2d52ffe981272e6ca15294/src/interface.gen.c:654
D:/oss/bind9/bin/tests/system/keyfromlabel:#12 0x00007fe92eb51755 in p11prov_sig_operate_init (sigctx=sigctx@entry=0x7fe9280010c0, digest_op=digest_op@entry=false, _session=0x7fe92d336598) at signature.c:818
D:/oss/bind9/bin/tests/system/keyfromlabel:#13 0x00007fe92eb51d77 in p11prov_sig_operate (sigctx=0x7fe9280010c0, sig=0x7fe928000bb0 "", siglen=0x7fe92d3366c8, sigsize=<optimized out>, tbs=0x7fe92d3365a0 "010\r\006\t`\206H\001e\003\004\002\001\005", tbslen=51) at signature.c:904
D:/oss/bind9/bin/tests/system/keyfromlabel:#14 0x00007fe92eb5206c in p11prov_rsasig_sign (ctx=0x7fe9280010c0, sig=0x7fe928000bb0 "", siglen=0x7fe92d3366c8, sigsize=256, tbs=0x7fe92d3366d0 "\314\035 D\377{\237N\234kz:2\n\304\004\376\262\200\206a\370y0\a\302\325v\027\001z\357Ph3-\351\177", tbslen=32) at signature.c:1131
D:/oss/bind9/bin/tests/system/keyfromlabel:#15 0x00007fe9314863d1 in EVP_SignFinal_ex () from /lib/x86_64-linux-gnu/libcrypto.so.3
D:/oss/bind9/bin/tests/system/keyfromlabel:#16 0x00007fe931a51264 in opensslrsa_sign (dctx=0x7fe928000f20, sig=0x7fe92d336850) at opensslrsa_link.c:268
D:/oss/bind9/bin/tests/system/keyfromlabel:#17 0x00007fe931a040cc in dns_dnssec_sign (name=name@entry=0x7fe92d339270, set=set@entry=0x7fe92d337d70, key=key@entry=0x562cac180960, inception=inception@entry=0x562cab058cb4 <starttime>, expire=expire@entry=0x7fe92d336f0c, mctx=0x562cac167990, buffer=0x7fe92d336f40, sigrdata=0x7fe92d336f10) at dnssec.c:348
D:/oss/bind9/bin/tests/system/keyfromlabel:#18 0x0000562cab04b6d0 in signwithkey (name=name@entry=0x7fe92d339270, rdataset=rdataset@entry=0x7fe92d337d70, key=0x562cac180960, ttl=ttl@entry=300, add=add@entry=0x7fe92d337d20, logmsg=logmsg@entry=0x562cab0535c8 "signing with dnskey") at dnssec-signzone.c:301
D:/oss/bind9/bin/tests/system/keyfromlabel:#19 0x0000562cab04df03 in signset (set=0x7fe92d337d70, name=0x7fe92d339270, node=0x562cac17adc0, add=0x7fe92d337d20, del=0x7fe92d337d00) at dnssec-signzone.c:778
D:/oss/bind9/bin/tests/system/keyfromlabel:#20 signname (node=0x562cac17adc0, name=0x7fe92d339270) at dnssec-signzone.c:1222
D:/oss/bind9/bin/tests/system/keyfromlabel:#21 0x0000562cab0500c8 in assignwork (arg=<optimized out>) at dnssec-signzone.c:1668
D:/oss/bind9/bin/tests/system/keyfromlabel:#22 0x00007fe931bce727 in isc__async_cb (handle=<optimized out>) at async.c:111
D:/oss/bind9/bin/tests/system/keyfromlabel:#23 0x00007fe9317a409d in ?? () from /lib/x86_64-linux-gnu/libuv.so.1
D:/oss/bind9/bin/tests/system/keyfromlabel:#24 0x00007fe9317b7e3c in ?? () from /lib/x86_64-linux-gnu/libuv.so.1
D:/oss/bind9/bin/tests/system/keyfromlabel:#25 0x00007fe9317a49e4 in uv_run () from /lib/x86_64-linux-gnu/libuv.so.1
D:/oss/bind9/bin/tests/system/keyfromlabel:#26 0x00007fe931bdfbec in loop_thread (arg=arg@entry=0x562cac168fc0) at loop.c:282
D:/oss/bind9/bin/tests/system/keyfromlabel:#27 0x00007fe931bedf48 in thread_body (wrap=0x562cac193990) at thread.c:85
D:/oss/bind9/bin/tests/system/keyfromlabel:#28 thread_run (wrap=0x562cac193990) at thread.c:100
D:/oss/bind9/bin/tests/system/keyfromlabel:#29 0x00007fe93184e044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
D:/oss/bind9/bin/tests/system/keyfromlabel:#30 0x00007fe9318cd860 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
D:keyfromlabel:--------------------------------------------------------------------------------

Running in valgrind does not help as the problem disappears as a side effect.

simo5 commented 1 year ago

I have not seen this before, but I wonder if it could be a nested lock or something causing openssl called from softhsm to mess with structures that are not fully set up yet?

fabled commented 1 year ago

Yes something funny happens in openssl. I suspect the app -> openssl init -> provider -> softhsm -> openssl init causes openssl init to not setup everything correctly.

So on softhsm, for me: with early init, I get this and with default init the digest stuff is not working.

I need to figure out if I can fix openssl / determine root cause of this. Or if should just try to do softhsm built with botan, or move to nss soft token.

simo5 commented 1 year ago

What is the first operation you attempt in Bind?

fabled commented 1 year ago

Actually I need to investigate more. Seems the crash is sort of valid, as other thread had failed operation and was doing openssl deinit which causes the crash :-o

The first failure is:

dnssec-signzone: warning: EVP_SignFinal (opensslrsa_link.c:269) failed (failure)
dnssec-signzone: info: error:4080000A:pkcs11::Internal locking failure:session.c:621:Failed to fetch a login_session
dnssec-signzone: info: error:4080000A:pkcs11::Internal locking failure:signature.c:837:Failed to open session on slot 1007647743
dnssec-signzone: fatal: dnskey 'rsasha512.example/RSASHA512/53583' failed to sign data: failure

I can reproduce this only with the early init mode, because the code does not get to doing the signing in default load mode.

Making pkcs11-provider do debugging makes seems to mostly make this issue disappear. However, find attached debug.log of this happening (though it seems garbled due to multiple thread usage).

fabled commented 1 year ago

It seems that since commit 6403f223 fetch_session will immediately fail if two threads are concurrently doing the first operation on a slot that would need login. Probably triggers in my case since it's a simple command line executable that just loads one private key on token, and starts ncores threads and does sign operations on each using the same key.

ifranzki commented 7 months ago

It is quite possible that this issue (at least the original dnssec-signzone: info: error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/signature.c:579:) is fixed with the fix from https://github.com/latchset/pkcs11-provider/pull/372.

The place where this error is issued is in evp_pkey_signature_init() in the legacy code path. The legacy code path is reached if either the PKEY is a legacy PKEY (unlikely in this case), or if it tried to find SIGNATURE method from a provider, and it failed to find one. To find a SIGNATURE method it first fetches it based on the property query. If this is from a different provider than the one that was used to create the PKEY, then it is tries to export the key material from the original provider and import it into the one from which the fetched SIGNATURE method is from. In case the PKEY was built from a PKCS#11 URI with the pkcs11-provider, then the key is most likely not exportable, and the export fails. If this happens, it takes another round trying to fetch the SIGNATURE method from the provider that was used to create the PKEY, i.e. the pkcs11-provider in this case. This is then supposed to be able to work with that key.

If you run into the situation that the fix in https://github.com/latchset/pkcs11-provider/pull/372 is caring about, then that second try to fetch the SIGNATURE method from the pkcs11-provider that was used to create the PKEY will fail, because OpenSSl has queried the provider too early, when it has not been fully initialized. So it cached the information that the provider would not support signatures, and thus fetching the SIGNATURE method form the provider will not find any. This will result in dnssec-signzone: info: error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/signature.c:579:.

simo5 commented 7 months ago

@fabled any chance you can re-test with latest code? There are at least 3 fixes that may improve the situation here, one on locking, one of session fetching, and one on operations caching (this may even eliminate the need for setting the "early" quirk for you and so it'd be nice if you try without it too.

The-Mule commented 6 months ago

Even though I am not the original reported, it seems that this is still not fixed even with current main branch (and openssl-3.2). However, the error is different now:

No self-signed KSK DNSKEY found
Zone verification failed (failure)

And it seems that provider has issues handling the key. When comparing the provider debug output with early and wihout it, it is basically equal until the following point (output without early):

[../src/keymgmt.c:676] p11prov_rsa_new(): rsa new
[../src/keymgmt.c:737] p11prov_rsa_import(): rsa import 0x7f470ee3f280
[../src/signature.c:1165] p11prov_rsasig_verify_init(): rsa verify init (ctx=0x7f470ee3f100, key=0x7f470ee3f280, params=(nil))
[../src/objects.c:402] p11prov_obj_ref_no_cache(): Ref Object: 0x7f470ee3f280 (handle:0)
[../src/provider.c:606] p11prov_ctx_cache_keys(): cache_keys = 1
[../src/signature.c:1460] p11prov_rsasig_set_ctx_params(): rsasig set ctx params (ctx=0x7f470ee3f100, params=(nil))
[../src/signature.c:1460] p11prov_rsasig_set_ctx_params(): rsasig set ctx params (ctx=0x7f470ee3f100, params=0x7fff34973230)
[../src/signature.c:1475] p11prov_rsasig_set_ctx_params(): Set OSSL_SIGNATURE_PARAM_DIGEST to SHA256
[../src/signature.c:1183] p11prov_rsasig_verify(): rsa verify (ctx=0x7f470ee3f100)
[../src/signature.c:784] p11prov_sig_operate_init(): called (sigctx=0x7f470ee3f100, digest_op=false)
[../src/signature.c:794] p11prov_sig_operate_init(): Error: 0x00000060; Provided key has invalid handle

I am not exactly sure why p11prov_rsa_new() is happening without early but this call never happens when the setting is enabled.

The-Mule commented 6 months ago

Actually there are more differences than that. With early function p11prov_rsa_export() is called multiple times during the dnssec-signzone while without early it is called just once... attaching both provider debug outputs.

with-early.txt without-early.txt

simo5 commented 2 months ago

@The-Mule do you know if this bug goes away when using kryoptic?