Open minfrin opened 3 years ago
Putting breakpoints onto the constructor and destructor of the mutexes, we get a double destructor call:
Breakpoint 1, Mutex::Mutex (this=0x555555799160) at MutexFactory.cpp:44
44 Mutex::Mutex()
Missing separate debuginfos, use: yum debuginfo-install cyrus-sasl-lib-2.1.27-5.el8.x86_64 expat-2.2.5-4.el8.x86_64 keyutils-libs-1.5.10-6.el8.x86_64 krb5-libs-1.18.2-5.el8.x86_64 libcom_err-1.45.6-1.el8.x86_64 libffi-3.1-22.el8.x86_64 libgcc-8.3.1-5.1.el8.x86_64 libselinux-2.9-4.el8_3.x86_64 libstdc++-8.3.1-5.1.el8.x86_64 libtasn1-4.13-3.el8.x86_64 libuuid-2.32.1-24.el8.x86_64 libxcrypt-4.1.1-4.el8.x86_64 libxml2-2.9.7-8.el8.x86_64 nspr-4.25.0-2.el8_2.x86_64 nss-3.53.1-17.el8_3.x86_64 nss-softokn-3.53.1-17.el8_3.x86_64 nss-softokn-freebl-3.53.1-17.el8_3.x86_64 nss-util-3.53.1-17.el8_3.x86_64 openldap-2.4.46-15.el8.x86_64 openssl-libs-1.1.1g-15.el8_3.x86_64 p11-kit-0.23.14-5.el8_0.x86_64 p11-kit-trust-0.23.14-5.el8_0.x86_64 pcre2-10.32-2.el8.x86_64 sqlite-libs-3.26.0-11.el8.x86_64 zlib-1.2.11-16.2.el8_3.x86_64
(gdb) b Mutex::~Mutex
Breakpoint 2 at 0x7ffff1b9423c: Mutex::~Mutex. (2 locations)
(gdb) cont
Continuing.
Breakpoint 1, Mutex::Mutex (this=0x555555969f70) at MutexFactory.cpp:44
44 Mutex::Mutex()
(gdb) cont
Continuing.
Breakpoint 1, Mutex::Mutex (this=0x55555596a1a0) at MutexFactory.cpp:44
44 Mutex::Mutex()
(gdb) cont
Continuing.
Breakpoint 1, Mutex::Mutex (this=0x55555596a290) at MutexFactory.cpp:44
44 Mutex::Mutex()
(gdb) cont
Continuing.
Breakpoint 1, Mutex::Mutex (this=0x55555596a2e0) at MutexFactory.cpp:44
44 Mutex::Mutex()
(gdb) cont
Continuing.
Breakpoint 2, Mutex::~Mutex (this=0x55555596a2e0, __in_chrg=<optimized out>) at MutexFactory.cpp:56
56 }
(gdb) cont
Continuing.
Breakpoint 2, Mutex::~Mutex (this=0x55555596a2e0, __in_chrg=<optimized out>) at MutexFactory.cpp:50
50 Mutex::~Mutex()
(gdb) cont
Continuing.
Breakpoint 1, Mutex::Mutex (this=0x5555559b8d50) at MutexFactory.cpp:44
44 Mutex::Mutex()
(gdb) cont
Continuing.
Breakpoint 1, Mutex::Mutex (this=0x5555559b8eb0) at MutexFactory.cpp:44
44 Mutex::Mutex()
(gdb) cont
Continuing.
Breakpoint 1, Mutex::Mutex (this=0x5555559b8f00) at MutexFactory.cpp:44
44 Mutex::Mutex()
(gdb) cont
Continuing.
nss-out: intermediate: CN=Sectigo RSA Extended Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater Manchester,C=GB
nss-out: intermediate: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
nss-out: intermediate: CN=AAA Certificate Services,O=Comodo CA Limited,L=Salford,ST=Greater Manchester,C=GB
nss-out: intermediate: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
nss-out: intermediate: CN=AddTrust External CA Root,OU=AddTrust External TTP Network,O=AddTrust AB,C=SE
nss-out: intermediate: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
nss-out: intermediate: CN=DigiCert SHA2 Extended Validation Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
nss-out: intermediate: CN=DigiCert High Assurance EV Root CA,OU=www.digicert.com,O=DigiCert Inc,C=US
Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
56 }
(gdb) cont
Continuing.
Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:50
50 Mutex::~Mutex()
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2064700 in ?? ()
This may already be fixed in the develop branch (the code is not in a release yet). To check: are you running a released version or the most recent develop branch?
Can you point to the patch that fixes this, if possible?
I'd need to do a significant hack job on the machine to get the RPM out and the dev code in; this is not easy to do.
I'm not working on the softhsm code itself; the mere presence of softhsm causes unrelated code to crash.
I think the patches you need are in #550 and #551
The RPM as shipped by Red Hat includes the following patch, which looks very similar to the patches you've listed above.
The crash, however, is happening in NSS code, not OpenSSL. I suspect that whatever problem triggered the crash with OpenSSL (and was subsequently fixed) is also triggering a crash with NSS.
[minfrin@localhost SPECS]$ cat ../SOURCES/softhsm-2.6.1-rh1831086-exit.patch
diff --git a/src/lib/crypto/OSSLCryptoFactory.cpp b/src/lib/crypto/OSSLCryptoFactory.cpp
index 32daca2..ace4bcb 100644
--- a/src/lib/crypto/OSSLCryptoFactory.cpp
+++ b/src/lib/crypto/OSSLCryptoFactory.cpp
@@ -226,31 +226,49 @@ err:
 // Destructor
 OSSLCryptoFactory::~OSSLCryptoFactory()
 {
-#ifdef WITH_GOST
-    // Finish the GOST engine
-    if (eg != NULL)
+    bool ossl_shutdown = false;
+
+#if OPENSSL_VERSION_NUMBER >= 0x10100000L && !defined(LIBRESSL_VERSION_NUMBER)
+    // OpenSSL 1.1.0+ will register an atexit() handler to run
+    // OPENSSL_cleanup(). If that has already happened we must
+    // not attempt to free any ENGINEs because they'll already
+    // have been destroyed and the use-after-free would cause
+    // a deadlock or crash.
+    //
+    // Detect that situation because reinitialisation will fail
+    // after OPENSSL_cleanup() has run.
+    (void)ERR_set_mark();
+    ossl_shutdown = !OPENSSL_init_crypto(OPENSSL_INIT_ENGINE_RDRAND, NULL);
+    (void)ERR_pop_to_mark();
+#endif
+    if (!ossl_shutdown)
     {
-        ENGINE_finish(eg);
-        ENGINE_free(eg);
-        eg = NULL;
-    }
+#ifdef WITH_GOST
+        // Finish the GOST engine
+        if (eg != NULL)
+        {
+            ENGINE_finish(eg);
+            ENGINE_free(eg);
+            eg = NULL;
+        }
 #endif
-    // Finish the rd_rand engine
-    ENGINE_finish(rdrand_engine);
-    ENGINE_free(rdrand_engine);
-    rdrand_engine = NULL;
+        // Finish the rd_rand engine
+        ENGINE_finish(rdrand_engine);
+        ENGINE_free(rdrand_engine);
+        rdrand_engine = NULL;
+        // Recycle locks
+#if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER)
+        if (setLockingCallback)
+        {
+            CRYPTO_set_locking_callback(NULL);
+        }
+#endif
+    }
     // Destroy the one-and-only RNG
     delete rng;
-    // Recycle locks
-#if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER)
-    if (setLockingCallback)
-    {
-        CRYPTO_set_locking_callback(NULL);
-    }
-#endif
     for (unsigned i = 0; i < nlocks; i++)
     {
         MutexFactory::i()->recycleMutex(locks[i]);
Looking in more detail, there seems to be a pattern of memory management where objects are created manually with new and destroyed manually with delete inside destructors. It then looks like objects are being shallow copied, and C++ destroys the copy during atexit() processing, which attempts to delete the referenced objects a second time. The double free then triggers the crash.
What makes this problem really serious is that on many (all?) Linux distros p11-kit is auto-wired into NSS, and softhsm is auto-wired into p11-kit.
As a result, simply installing the softhsm driver is enough to crash an application that uses NSS, even when that application makes no attempt to use, or even know about, softhsm.
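To make the shallow-copy hypothesis above concrete, here is a minimal sketch of the pattern and the usual guard against it. It is not SoftHSM code: Lock, LockOwner and live_locks_after_move are made-up names, and the sketch only assumes an owner that calls new in its constructor and delete in its destructor. Deleting the copy operations turns an accidental shallow copy into a compile error, so two owners can never delete the same pointer.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an object owned through a raw pointer.
struct Lock {
    static int live;  // number of Lock objects currently alive
    Lock() { ++live; }
    ~Lock() { --live; }
};
int Lock::live = 0;

// Owner that allocates in the constructor and frees in the destructor.
// With the compiler-generated copy constructor, a copy would hold the same
// pointer and both destructors would delete it. Deleting the copy
// operations rules that out at compile time.
class LockOwner {
public:
    LockOwner() : lock_(new Lock) {}
    ~LockOwner() { delete lock_; }  // delete on nullptr is a no-op

    LockOwner(const LockOwner&) = delete;
    LockOwner& operator=(const LockOwner&) = delete;

    // Moving transfers ownership and leaves the source empty.
    LockOwner(LockOwner&& other) noexcept : lock_(other.lock_) {
        other.lock_ = nullptr;
    }

    bool owns() const { return lock_ != nullptr; }

private:
    Lock* lock_;
};

static_assert(!std::is_copy_constructible<LockOwner>::value,
              "shallow copies are rejected at compile time");

// Moves one owner into another, lets both go out of scope, and returns the
// number of Lock objects still alive. 0 means exactly one delete happened.
int live_locks_after_move() {
    {
        LockOwner a;
        LockOwner b(std::move(a));  // a no longer owns the Lock
        assert(!a.owns() && b.owns());
    }
    return Lock::live;
}
```

Whether softhsm actually copies an owning object anywhere is not established in this thread; the sketch only shows what the failure mode would look like and how it is normally prevented.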
Can you print a stack trace for the double destructor call you saw earlier on? (So run bt in both cases.)
The two stacktraces are below.
Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
56 }
(gdb) bt
#0 Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#1 0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x5555559637d0, mutex=0x5555559b8f00) at MutexFactory.cpp:131
#2 0x00007ffff1bb8957 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#3 0x00007ffff1bb8992 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:61
#4 0x00007ffff1b5a9f9 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#5 0x00007ffff1b5ab82 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#6 0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
at /usr/include/c++/8/bits/unique_ptr.h:81
#7 0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>,
__in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#8 0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#9 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#10 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
(gdb) cont
Continuing.
Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:50
50 Mutex::~Mutex()
(gdb) bt
#0 Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:50
#1 0x00007ffff1b9428c in Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#2 0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x5555559637d0, mutex=0x5555559b8f00) at MutexFactory.cpp:131
#3 0x00007ffff1bb8957 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#4 0x00007ffff1bb8992 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:61
#5 0x00007ffff1b5a9f9 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#6 0x00007ffff1b5ab82 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#7 0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
at /usr/include/c++/8/bits/unique_ptr.h:81
#8 0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>,
__in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#9 0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#10 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#11 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2064700 in ?? ()
(gdb) bt
#0 0x00007ffff2064700 in ?? ()
#1 0x00007ffff1b9465f in MutexFactory::DestroyMutex (this=0x5555559637d0, mutex=0x5555559b8f20) at MutexFactory.cpp:176
#2 0x00007ffff1b94271 in Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:54
#3 0x00007ffff1b9428c in Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#4 0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x5555559637d0, mutex=0x5555559b8f00) at MutexFactory.cpp:131
#5 0x00007ffff1bb8957 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#6 0x00007ffff1bb8992 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:61
#7 0x00007ffff1b5a9f9 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#8 0x00007ffff1b5ab82 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#9 0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
at /usr/include/c++/8/bits/unique_ptr.h:81
#10 0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>,
__in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#11 0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#12 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#13 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
(gdb) quit
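Looking closer, it looks like we're stopping twice within a single destructor call rather than seeing two separate destructions: in the second backtrace, Mutex::~Mutex at line 56 is the caller of Mutex::~Mutex at line 50.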
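For context on why one delete can trip the breakpoint twice: under the Itanium C++ ABI that GCC uses, a class with a virtual destructor gets more than one destructor entry point. There is a deleting destructor, which delete calls, and a complete-object destructor, which the deleting one calls in turn. That would match gdb reporting "2 locations" for Mutex::~Mutex. The sketch assumes Mutex has a virtual destructor; TracedMutex and bodies_run_for_one_delete are made-up names for illustration.

```cpp
#include <cassert>

// Hypothetical class with a virtual destructor (an assumption about how the
// real Mutex is declared).
struct TracedMutex {
    static int destructor_bodies_run;
    virtual ~TracedMutex() { ++destructor_bodies_run; }
};
int TracedMutex::destructor_bodies_run = 0;

// Deletes one object and reports how many times the destructor body ran.
int bodies_run_for_one_delete() {
    TracedMutex::destructor_bodies_run = 0;
    TracedMutex* m = new TracedMutex;
    delete m;  // enters the deleting destructor, which calls the complete one
    return TracedMutex::destructor_bodies_run;
}
```

bodies_run_for_one_delete() returns 1: the destructor body runs once per delete, however many times the debugger stops on the way in.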
Another question to ask: why is the SoftHSM object still alive and wired into atexit? Once C_Finalize is called, SoftHSM should be completely gone.
Placing a breakpoint on the SoftHSM destructor, we see the object is still alive during atexit:
Breakpoint 1, SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
410 }
(gdb) bt
#0 SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#1 0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
at /usr/include/c++/8/bits/unique_ptr.h:81
#2 0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>,
__in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#3 0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#4 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#5 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
More digging.
I've discovered that in this case NSS was calling C_Initialize, but was not calling C_Finalize.
The reason was the presence of the NSS_INIT_COOPERATE flag in the NSS initialisation call. That flag sets NSS_INIT_NOPK11FINALIZE, which tells NSS not to call C_Finalize on any module it called C_Initialize on.
Removing the NSS_INIT_COOPERATE flag when initialising avoids the crash.
It looks like softhsm's destructors shut down cleanly when C_Finalize has been called, but when it has not, the destructor of the SoftHSM object, run from the exit handlers, causes softhsm to crash.
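The backtraces above show the SoftHSM object being destroyed by the destructor of a static std::unique_ptr (SoftHSM::instance) from the exit handlers. A minimal sketch of that lifetime pattern follows. It is not the softhsm implementation: Module, finalize and module_survives_until_exit are made-up names, and the sketch assumes only that an explicit finalize call releases the singleton early, while skipping it defers the destructor to process exit.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a module object held in a static smart pointer.
class Module {
public:
    static int destroyed;  // counts destructor runs; a plain int stays valid during exit

    // Lazily creates the singleton, analogous to initialisation.
    static Module* instance() {
        if (!holder) holder.reset(new Module);
        return holder.get();
    }

    // Explicit teardown while the rest of the process is still intact.
    static void finalize() { holder.reset(); }

    static bool alive() { return holder != nullptr; }

    ~Module() { ++destroyed; }

private:
    Module() {}
    static std::unique_ptr<Module> holder;
};
int Module::destroyed = 0;
std::unique_ptr<Module> Module::holder;

// Simulates a caller that initialises the module and optionally finalises it.
// Returns true if the module is still alive afterwards, meaning its
// destructor would only run later, from the exit handlers.
bool module_survives_until_exit(bool call_finalize) {
    Module::finalize();  // start from a clean state
    Module::instance();
    if (call_finalize) Module::finalize();
    return Module::alive();
}
```

With call_finalize set to false the object stays alive, so its destructor is left to run at exit, after other libraries may already have torn down state it depends on. With call_finalize set to true the teardown happens at a point the caller controls.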
I use the OpenSSL 3 PKCS11 provider and everything works great until the end of the process, where I get a SIGSEGV while destructors are running. I'm currently using SoftHSM2 with Botan because it doesn't work well with OpenSSL 3. Any updates on this issue? I was able to use lldb and trace it down to destructors being called on the SoftHSM instance in C++. Maybe they are getting called more than once?
I have code that calls NSS on EL8.
When softhsm is installed and present, the code crashes on shutdown in the atexit handler as follows:
Placing a breakpoint on the destructor, we see the destructor is called twice.
The first time the destructor is called is during initialisation of NSS:
The second time the destructor is called is during atexit() shutdown:
Digging into the MutexFactory we get this:
Any ideas so far?