SIGSEGV: NSS + SoftHSMv2 == crash during atexit()

minfrin commented 3 years ago

I have code that calls NSS on EL8.

When softhsm is installed and present, the code crashes on shutdown in the atexit handler as follows:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2064700 in ?? ()
Missing separate debuginfos, use: yum debuginfo-install cyrus-sasl-lib-2.1.27-5.el8.x86_64 expat-2.2.5-4.el8.x86_64 keyutils-libs-1.5.10-6.el8.x86_64 krb5-libs-1.18.2-5.el8.x86_64 libcom_err-1.45.6-1.el8.x86_64 libffi-3.1-22.el8.x86_64 libgcc-8.3.1-5.1.el8.x86_64 libselinux-2.9-4.el8_3.x86_64 libstdc++-8.3.1-5.1.el8.x86_64 libtasn1-4.13-3.el8.x86_64 libuuid-2.32.1-24.el8.x86_64 libxcrypt-4.1.1-4.el8.x86_64 libxml2-2.9.7-8.el8.x86_64 nspr-4.25.0-2.el8_2.x86_64 nss-3.53.1-17.el8_3.x86_64 nss-softokn-3.53.1-17.el8_3.x86_64 nss-softokn-freebl-3.53.1-17.el8_3.x86_64 nss-util-3.53.1-17.el8_3.x86_64 openldap-2.4.46-15.el8.x86_64 openssl-libs-1.1.1g-15.el8_3.x86_64 p11-kit-0.23.14-5.el8_0.x86_64 p11-kit-trust-0.23.14-5.el8_0.x86_64 pcre2-10.32-2.el8.x86_64 sqlite-libs-3.26.0-11.el8.x86_64 zlib-1.2.11-16.2.el8_3.x86_64
(gdb) bt
#0  0x00007ffff2064700 in ?? ()
#1  0x00007ffff1bac576 in MutexFactory::DestroyMutex (mutex=<optimized out>, this=<optimized out>) at MutexFactory.cpp:176
#2  MutexFactory::DestroyMutex (mutex=<optimized out>, this=<optimized out>) at MutexFactory.cpp:172
#3  Mutex::~Mutex (this=0x5555559b8f50, __in_chrg=<optimized out>) at MutexFactory.cpp:54
#4  0x00007ffff1bac5d5 in Mutex::~Mutex (this=0x5555559b8f50, __in_chrg=<optimized out>) at MutexFactory.cpp:52
#5  Mutex::~Mutex (this=0x5555559b8f50, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#6  0x00007ffff1bcda23 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#7  0x00007ffff1bcda4d in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:57
#8  0x00007ffff1b8971a in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#9  0x00007ffff1b8982d in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:391
#10 0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#11 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#12 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
(gdb) up
#1  0x00007ffff1bac576 in MutexFactory::DestroyMutex (mutex=<optimized out>, this=<optimized out>) at MutexFactory.cpp:176
176     return (this->destroyMutex)(mutex);

Placing a breakpoint on the destructor, we see the destructor is called twice.

The first time the destructor is called is during initialisation of NSS:

Breakpoint 1, MutexFactory::DestroyMutex (this=0x555555963a60, mutex=0x5555559b6e90) at MutexFactory.cpp:174
174     if (!enabled) return CKR_OK;
(gdb) bt
#0  MutexFactory::DestroyMutex (this=0x555555963a60, mutex=0x5555559b6e90) at MutexFactory.cpp:174
#1  0x00007ffff1b94271 in Mutex::~Mutex (this=0x55555596a570, __in_chrg=<optimized out>) at MutexFactory.cpp:54
#2  0x00007ffff1b9428c in Mutex::~Mutex (this=0x55555596a570, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#3  0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x555555963a60, mutex=0x55555596a570) at MutexFactory.cpp:131
#4  0x00007ffff1bbd095 in Directory::~Directory (this=0x7fffffffd380, __in_chrg=<optimized out>) at Directory.cpp:64
#5  0x00007ffff1bbb194 in ObjectStore::ObjectStore (this=0x55555596a480, inStorePath="/var/lib/softhsm/tokens/") at ObjectStore.cpp:65
#6  0x00007ffff1b5b155 in SoftHSM::C_Initialize (this=0x555555799180, pInitArgs=0x555555965f70) at SoftHSM.cpp:548
#7  0x00007ffff1b3d7b0 in C_Initialize (pInitArgs=0x555555965f70) at main.cpp:133
#8  0x00007ffff2063b05 in initialize_module_inlock_reentrant () from /usr/lib64/p11-kit-proxy.so
#9  0x00007ffff2063cc3 in managed_C_Initialize () from /usr/lib64/p11-kit-proxy.so
#10 0x00007ffff20667b0 in p11_kit_modules_initialize () from /usr/lib64/p11-kit-proxy.so
#11 0x00007ffff204d05f in proxy_C_Initialize () from /usr/lib64/p11-kit-proxy.so
#12 0x00007ffff61a30f2 in secmod_ModuleInit () from /usr/lib64/libnss3.so
#13 0x00007ffff61a386a in secmod_LoadPKCS11Module () from /usr/lib64/libnss3.so
#14 0x00007ffff61b17ad in SECMOD_LoadModule () from /usr/lib64/libnss3.so
#15 0x00007ffff61b18e8 in SECMOD_LoadModule () from /usr/lib64/libnss3.so
#16 0x00007ffff61781cc in nss_Init () from /usr/lib64/libnss3.so
#17 0x00007ffff6178722 in NSS_InitContext () from /usr/lib64/libnss3.so
#18 0x00005555555646e6 in redwax_nss_process_nss_out (r=0x7fffffffe000, file=0x7fffffffe5c1 "/tmp/slapd-gatekeeper/", sname=0x0, 
    secret=0x0) at redwax_nss.c:451

The second time the destructor is called is during atexit() shutdown:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2064700 in ?? ()
(gdb) bt
#0  0x00007ffff2064700 in ?? ()
#1  0x00007ffff1b9465f in MutexFactory::DestroyMutex (this=0x555555963a60, mutex=0x5555559b91b0) at MutexFactory.cpp:176
#2  0x00007ffff1b94271 in Mutex::~Mutex (this=0x5555559b9190, __in_chrg=<optimized out>) at MutexFactory.cpp:54
#3  0x00007ffff1b9428c in Mutex::~Mutex (this=0x5555559b9190, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#4  0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x555555963a60, mutex=0x5555559b9190) at MutexFactory.cpp:131
#5  0x00007ffff1bb8957 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#6  0x00007ffff1bb8992 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:61
#7  0x00007ffff1b5a9f9 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#8  0x00007ffff1b5ab82 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#9  0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
    at /usr/include/c++/8/bits/unique_ptr.h:81
#10 0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>, 
    __in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#11 0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#12 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#13 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235

Digging into the MutexFactory we get this:

(gdb) print *this
$1 = {_vptr.MutexFactory = 0x7ffff1e10c38 <vtable for MutexFactory+16>, static instance = {_M_t = {_M_t = std::tuple containing = {
        [1] = 0x555555963a60, [2] = {<std::default_delete<MutexFactory>> = {<No data fields>}, <No data fields>}}}}, 
  createMutex = 0x7ffff20647f0, destroyMutex = 0x7ffff2064700, lockMutex = 0x7ffff2064360, unlockMutex = 0x7ffff2064310, enabled = true}
(gdb) print mutex
$2 = (CK_VOID_PTR) 0x5555559b91b0

Any ideas so far?
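
For anyone reading the dump above: frame #0 of the crash (0x00007ffff2064700) is the same address as the destroyMutex member, so the fault happens at the call through that pointer on MutexFactory.cpp:176. Below is a minimal sketch of that kind of dispatch. ToyMutexFactory, hostDestroy and lastDestroyed are hypothetical names made up for illustration, not SoftHSM's real code, and the null check is an addition that the real line 174 does not have.

```cpp
// Hypothetical stand-ins for illustration; not SoftHSM's real types.
using DestroyFn = unsigned long (*)(void* mutex);

struct ToyMutexFactory {
    DestroyFn destroyMutex = nullptr; // supplied by the host at init time
    bool enabled = true;

    // Same shape as the two lines seen in gdb (174 and 176): return early
    // when disabled, otherwise call through the stored pointer. The null
    // check is an extra guard added for this sketch only.
    unsigned long DestroyMutex(void* mutex) const {
        if (!enabled || destroyMutex == nullptr) return 0; // 0 stands in for CKR_OK
        return destroyMutex(mutex);
    }
};

// Example host callback that records what it was asked to destroy.
inline void*& lastDestroyed() { static void* p = nullptr; return p; }
inline unsigned long hostDestroy(void* mutex) { lastDestroyed() = mutex; return 0; }
```

A null check like this only catches a pointer that was never set. It cannot detect a pointer to code that is no longer valid to call, which may be what is happening here.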

minfrin commented 3 years ago

Putting breakpoints onto the constructor and destructor of the mutexes, we get a double destructor call:

Breakpoint 1, Mutex::Mutex (this=0x555555799160) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) b Mutex::~Mutex
Breakpoint 2 at 0x7ffff1b9423c: Mutex::~Mutex. (2 locations)
(gdb) cont
Continuing.

Breakpoint 1, Mutex::Mutex (this=0x555555969f70) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) cont
Continuing.

Breakpoint 1, Mutex::Mutex (this=0x55555596a1a0) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) cont
Continuing.

Breakpoint 1, Mutex::Mutex (this=0x55555596a290) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) cont
Continuing.

Breakpoint 1, Mutex::Mutex (this=0x55555596a2e0) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) cont
Continuing.

Breakpoint 2, Mutex::~Mutex (this=0x55555596a2e0, __in_chrg=<optimized out>) at MutexFactory.cpp:56
56  }
(gdb) cont
Continuing.

Breakpoint 2, Mutex::~Mutex (this=0x55555596a2e0, __in_chrg=<optimized out>) at MutexFactory.cpp:50
50  Mutex::~Mutex()
(gdb) cont
Continuing.

Breakpoint 1, Mutex::Mutex (this=0x5555559b8d50) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) cont
Continuing.

Breakpoint 1, Mutex::Mutex (this=0x5555559b8eb0) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) cont
Continuing.

Breakpoint 1, Mutex::Mutex (this=0x5555559b8f00) at MutexFactory.cpp:44
44  Mutex::Mutex()
(gdb) cont
Continuing.
nss-out: intermediate: CN=Sectigo RSA Extended Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater Manchester,C=GB
nss-out: intermediate: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
nss-out: intermediate: CN=AAA Certificate Services,O=Comodo CA Limited,L=Salford,ST=Greater Manchester,C=GB
nss-out: intermediate: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
nss-out: intermediate: CN=AddTrust External CA Root,OU=AddTrust External TTP Network,O=AddTrust AB,C=SE
nss-out: intermediate: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
nss-out: intermediate: CN=DigiCert SHA2 Extended Validation Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
nss-out: intermediate: CN=DigiCert High Assurance EV Root CA,OU=www.digicert.com,O=DigiCert Inc,C=US

Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
56  }
(gdb) cont
Continuing.

Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:50
50  Mutex::~Mutex()
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2064700 in ?? ()

rijswijk commented 3 years ago

This may already be fixed in the develop branch (the code is not in a release yet). To check: are you seeing this with a released version or with the most recent develop branch?

minfrin commented 3 years ago

Can you point to the patch that fixes this if possible?

I'd need to do a significant hack job on the machine to get the RPM out and the dev code in, which is not easy to do.

I'm not working on the softhsm code itself; the mere presence of softhsm is causing unrelated code to crash.

rijswijk commented 3 years ago

I think the patches you need are in #550 and #551

minfrin commented 3 years ago

The RPM as shipped by Red Hat includes the following patch, which looks very similar to the patches you've listed above.

The crash, however, is happening in NSS code, not OpenSSL. I suspect whatever problem was triggering the crash in OpenSSL (and was subsequently fixed) is also triggering a crash in NSS.

[minfrin@localhost SPECS]$ cat ../SOURCES/softhsm-2.6.1-rh1831086-exit.patch 
diff --git a/src/lib/crypto/OSSLCryptoFactory.cpp b/src/lib/crypto/OSSLCryptoFactory.cpp
index 32daca2..ace4bcb 100644
--- a/src/lib/crypto/OSSLCryptoFactory.cpp
+++ b/src/lib/crypto/OSSLCryptoFactory.cpp
@@ -226,31 +226,49 @@ err:
 // Destructor
 OSSLCryptoFactory::~OSSLCryptoFactory()
 {
-#ifdef WITH_GOST
-   // Finish the GOST engine
-   if (eg != NULL)
+   bool ossl_shutdown = false;
+
+#if OPENSSL_VERSION_NUMBER >= 0x10100000L && !defined(LIBRESSL_VERSION_NUMBER)
+   // OpenSSL 1.1.0+ will register an atexit() handler to run
+   // OPENSSL_cleanup(). If that has already happened we must
+   // not attempt to free any ENGINEs because they'll already
+   // have been destroyed and the use-after-free would cause
+   // a deadlock or crash.
+   //
+   // Detect that situation because reinitialisation will fail
+   // after OPENSSL_cleanup() has run.
+   (void)ERR_set_mark();
+   ossl_shutdown = !OPENSSL_init_crypto(OPENSSL_INIT_ENGINE_RDRAND, NULL);
+   (void)ERR_pop_to_mark();
+#endif
+   if (!ossl_shutdown)
    {
-       ENGINE_finish(eg);
-       ENGINE_free(eg);
-       eg = NULL;
-   }
+#ifdef WITH_GOST
+       // Finish the GOST engine
+       if (eg != NULL)
+       {
+           ENGINE_finish(eg);
+           ENGINE_free(eg);
+           eg = NULL;
+       }
 #endif

-   // Finish the rd_rand engine
-   ENGINE_finish(rdrand_engine);
-   ENGINE_free(rdrand_engine);
-   rdrand_engine = NULL;
+       // Finish the rd_rand engine
+       ENGINE_finish(rdrand_engine);
+       ENGINE_free(rdrand_engine);
+       rdrand_engine = NULL;

+       // Recycle locks
+#if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER)
+       if (setLockingCallback)
+       {
+           CRYPTO_set_locking_callback(NULL);
+       }
+#endif
+   }
    // Destroy the one-and-only RNG
    delete rng;

-   // Recycle locks
-#if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER)
-   if (setLockingCallback)
-   {
-       CRYPTO_set_locking_callback(NULL);
-   }
-#endif
    for (unsigned i = 0; i < nlocks; i++)
    {
        MutexFactory::i()->recycleMutex(locks[i]);

minfrin commented 3 years ago

Looking in more detail, there seems to be a pattern of memory management where objects are manually created with new and manually destroyed with delete inside destructors. It then looks like objects are being shallow copied, and C++ destroys the copy during atexit() processing, which attempts to delete the referenced objects a second time. The double free then triggers the crash.
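
To illustrate the failure mode I'm hypothesising (this is a guess at the pattern, not something confirmed in the softhsm source), here is a minimal sketch of a class that owns a raw pointer. RawOwner is a hypothetical name. With the copy operations deleted, an accidental shallow copy becomes a compile error instead of a double delete at exit.

```cpp
#include <type_traits>

// Hypothetical class, for illustration only.
class RawOwner {
public:
    RawOwner() : value(new int(0)) {}
    ~RawOwner() { delete value; }

    // Without these two lines the compiler generates a shallow copy:
    // both objects would hold the same pointer, and both destructors
    // would delete it.
    RawOwner(const RawOwner&) = delete;
    RawOwner& operator=(const RawOwner&) = delete;

private:
    int* value;
};

static_assert(!std::is_copy_constructible<RawOwner>::value,
              "accidental shallow copies are rejected at compile time");
static_assert(!std::is_copy_assignable<RawOwner>::value,
              "accidental shallow assignment is rejected at compile time");
```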

What makes this problem really serious is that on many (all?) Linux distros p11-kit is auto-wired into NSS, and softhsm is auto-wired into p11-kit.

As a result, simply installing the softhsm driver is enough to crash an application that uses NSS, even when that application makes no attempt to use softhsm and knows nothing about it.

rijswijk commented 3 years ago

Can you print a stack trace for the double destructor call you saw earlier on? (so do bt in both cases)

minfrin commented 3 years ago

The two stacktraces are below.

Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
56  }
(gdb) bt
#0  Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#1  0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x5555559637d0, mutex=0x5555559b8f00) at MutexFactory.cpp:131
#2  0x00007ffff1bb8957 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#3  0x00007ffff1bb8992 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:61
#4  0x00007ffff1b5a9f9 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#5  0x00007ffff1b5ab82 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#6  0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
    at /usr/include/c++/8/bits/unique_ptr.h:81
#7  0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>, 
    __in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#8  0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#9  0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#10 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
(gdb) cont
Continuing.

Breakpoint 2, Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:50
50  Mutex::~Mutex()
(gdb) bt
#0  Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:50
#1  0x00007ffff1b9428c in Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#2  0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x5555559637d0, mutex=0x5555559b8f00) at MutexFactory.cpp:131
#3  0x00007ffff1bb8957 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#4  0x00007ffff1bb8992 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:61
#5  0x00007ffff1b5a9f9 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#6  0x00007ffff1b5ab82 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#7  0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
    at /usr/include/c++/8/bits/unique_ptr.h:81
#8  0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>, 
    __in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#9  0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#10 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#11 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2064700 in ?? ()
(gdb) bt
#0  0x00007ffff2064700 in ?? ()
#1  0x00007ffff1b9465f in MutexFactory::DestroyMutex (this=0x5555559637d0, mutex=0x5555559b8f20) at MutexFactory.cpp:176
#2  0x00007ffff1b94271 in Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:54
#3  0x00007ffff1b9428c in Mutex::~Mutex (this=0x5555559b8f00, __in_chrg=<optimized out>) at MutexFactory.cpp:56
#4  0x00007ffff1b94553 in MutexFactory::recycleMutex (this=0x5555559637d0, mutex=0x5555559b8f00) at MutexFactory.cpp:131
#5  0x00007ffff1bb8957 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:60
#6  0x00007ffff1bb8992 in HandleManager::~HandleManager (this=0x555555795dc0, __in_chrg=<optimized out>) at HandleManager.cpp:61
#7  0x00007ffff1b5a9f9 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:393
#8  0x00007ffff1b5ab82 in SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#9  0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
    at /usr/include/c++/8/bits/unique_ptr.h:81
#10 0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>, 
    __in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#11 0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#12 0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#13 0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
(gdb) quit

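Looking closer, both stops are inside the same destructor call: the second backtrace still has the first stop (MutexFactory.cpp:56) as its caller. So the destructor runs once and gdb stops in it twice, rather than the object being destroyed twice.

Another question to ask: why is the SoftHSM object still alive and wired into atexit? Once C_Finalize is called, SoftHSM should be completely gone.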

Placing a breakpoint on the SoftHSM destructor, we see the object is still alive during atexit:

Breakpoint 1, SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
410 }
(gdb) bt
#0  SoftHSM::~SoftHSM (this=0x555555799180, __in_chrg=<optimized out>) at SoftHSM.cpp:410
#1  0x00007ffff1b8ad88 in std::default_delete<SoftHSM>::operator() (this=0x7ffff1e17340 <SoftHSM::instance>, __ptr=0x555555799180)
    at /usr/include/c++/8/bits/unique_ptr.h:81
#2  0x00007ffff1b8a16f in std::unique_ptr<SoftHSM, std::default_delete<SoftHSM> >::~unique_ptr (this=0x7ffff1e17340 <SoftHSM::instance>, 
    __in_chrg=<optimized out>) at /usr/include/c++/8/bits/unique_ptr.h:269
#3  0x00007ffff5130f8c in __run_exit_handlers () from /usr/lib64/libc.so.6
#4  0x00007ffff51310c0 in exit () from /usr/lib64/libc.so.6
#5  0x000055555555ffc1 in main (argc=13, argv=0x7fffffffe248) at redwax-tool.c:2235
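
The frames above show the object being deleted by the destructor of a static std::unique_ptr (SoftHSM::instance). A minimal sketch of that lifetime follows. ToyModule, i(), reset() and alive are hypothetical names used for illustration and are not taken from the softhsm source.

```cpp
#include <memory>

// Hypothetical singleton, for illustration only; not SoftHSM's code.
class ToyModule {
public:
    // Lazily create the single instance.
    static ToyModule* i() {
        if (!instance) instance.reset(new ToyModule());
        return instance.get();
    }

    // Destroy the instance now instead of leaving it for exit().
    static void reset() { instance.reset(); }

    static int alive; // number of live ToyModule objects

    ~ToyModule() { --alive; }

private:
    ToyModule() { ++alive; }

    // Static storage duration: if this still owns an object when the
    // process exits, ~unique_ptr runs from the exit handlers and
    // deletes the object there.
    static std::unique_ptr<ToyModule> instance;
};

int ToyModule::alive = 0;
std::unique_ptr<ToyModule> ToyModule::instance;
```

If nothing calls reset() before exit(), the object survives until the exit handlers run, which matches what the backtrace shows.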

minfrin commented 3 years ago

More digging.

I've discovered that in this case NSS was calling C_Initialize, but was not calling C_Finalize.

The reason was the NSS_INIT_COOPERATE flag passed when initialising NSS. It includes the NSS_INIT_NOPK11FINALIZE flag, which tells NSS not to run C_Finalize on anything it called C_Initialize on.

Removing the NSS_INIT_COOPERATE flag when initialising avoids the crash.

It looks like softhsm shuts down cleanly when C_Finalize is called, but if it is not called, the destructor of the SoftHSM object, run from the exit handlers, crashes.
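
A rough sketch of one defensive pattern for this, written as a toy model and not as a proposed patch: release host-owned resources through the host's callbacks only in the orderly finalize path, and have the destructor avoid calling back into the host if finalize never ran. HostCallbacks, ToyToken and the counting helpers are hypothetical names, and whether leaking at exit is acceptable for softhsm is an assumption I have not checked.

```cpp
// Hypothetical host-supplied callbacks, loosely modelled on the
// create/destroy pair visible in the MutexFactory dump.
struct HostCallbacks {
    void* (*create)();
    void  (*destroy)(void*);
};

class ToyToken {
public:
    void initialize(const HostCallbacks& cb) {
        callbacks = cb;
        mutex = callbacks.create();
        initialized = true;
    }

    // Orderly shutdown: the host is still present, so its callbacks
    // are safe to use.
    void finalize() {
        if (!initialized) return;
        callbacks.destroy(mutex);
        mutex = nullptr;
        initialized = false;
    }

    // Exit-time path: if finalize() never ran, the host may already be
    // gone, so this sketch deliberately leaks the mutex instead of
    // calling back into the host.
    ~ToyToken() { mutex = nullptr; }

    bool isInitialized() const { return initialized; }

private:
    HostCallbacks callbacks{nullptr, nullptr};
    void* mutex = nullptr;
    bool initialized = false;
};

// Counting callbacks used only to exercise the sketch.
inline int& liveMutexes() { static int n = 0; return n; }
inline void* countingCreate() { ++liveMutexes(); return &liveMutexes(); }
inline void countingDestroy(void*) { --liveMutexes(); }
```

The trade-off is a small leak at process exit in exchange for never calling a callback whose owner may no longer be around.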

blackbird1 commented 1 year ago

I use the OpenSSL 3 PKCS11 provider and everything works great until the end of the process, where I get a SIGSEGV while it is trying to destruct. I'm currently using SoftHSM2 with Botan because it doesn't work well with OpenSSL 3. Any updates on this issue? I was able to use lldb and trace it down to destructors being called on the SoftHSM instance in C++. Maybe they are getting called more than once?

opendnssec/SoftHSMv2 issue #635