latchset / pkcs11-provider

A pkcs#11 provider for OpenSSL 3.0+
Other
65 stars 39 forks source link

Provider gets into infinite loop on p11prov_CloseSession #239

Closed mouse07410 closed 1 year ago

mouse07410 commented 1 year ago

Describe the bug

OpenSSL-3.1.0 crashes upon completion of operation with

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [59771]

VM Region Info: 0 is not in any region.  Bytes before following region: 4388196352
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      1058e9000-105969000    [  512K] r-x/r-x SM=COW  ...xec/*/openssl

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   ???                                            0x0 ???
1   libp11-kit.0.dylib                     0x105da5575 binding_C_CloseSession + 21
2   libffi.8.dylib                         0x105a69942 ffi_closure_unix64_inner + 545
3   libffi.8.dylib                         0x105a69f4c ffi_closure_unix64 + 72
4   pkcs11.dylib                           0x105b693d5 p11prov_CloseSession + 101
5   pkcs11.dylib                           0x105b693d5 p11prov_CloseSession + 101
6   pkcs11.dylib                           0x105b693d5 p11prov_CloseSession + 101
.  .  .
509 pkcs11.dylib                           0x105b693d5 p11prov_CloseSession + 101
510 pkcs11.dylib                           0x105b693d5 p11prov_CloseSession + 101
511 pkcs11.dylib                           0x105b693d5 p11prov_CloseSession + 101

To Reproduce Steps to reproduce the behavior:

  1. Install OpenSSL-3.1.0, e.g., via sudo port install openssl3
  2. Clone, build, and install this provider
  3. Clone, build and install p11-kit (see below)
  4. Edit /opt/local/etc/openssl/openssl.cnf file to add and configure pkcs11-provider (see below)
  5. Execute command shown below, see the error

Expected behavior No infinite loop, no crash. Actual operation completed correctly before the crash.

Operating environment (please complete the following information):

Token and application used (please complete the following information):

Additional context

Removing other providers, non-essential for this test, had no effect.

Command that exhibits the problem (note that the operation completed correctly, and then OpenSSL crashes):

$ pkcs11-cms-demo
Generate a random data file (30 bytes), Base64-encoded...
openssl rand -engine rdrand -base64 -out /tmp/derive.59761.text 30
Engine "rdrand" set.
uDfpCSLT/MU2WTsuespq9vwlpUTKMwyyJYFvM97+

Get encryption certificate from the token...
pkcs11-tool -r -y cert --id 03 -o /tmp/derive.59761.cert.der
Using slot 0 with a present token (0x0)
subject=CN = Uri the Great

Encrypt file /tmp/derive.59761.text in CMS format using public key on token...

openssl cms -aes256 -encrypt -binary -in /tmp/derive.59761.text -outform PEM -out /tmp/derive.59761.text.cms /tmp/derive.59761.cert.pem

Encrypted CMS file:
-----BEGIN CMS-----
MIIBtgYJKoZIhvcNAQcDoIIBpzCCAaMCAQAxggE+MIIBOgIBADAiMBoxGDAWBgNV
BAMMD0ZvcmVzdCBDQSBSU0EgNAIEXVw67jANBgkqhkiG9w0BAQEFAASCAQBjnEXk
1b7ymy4dSovekzome2uWo/v4pCHttrh478tyARZynq/p9BHQSVWr1vsNcoHHsfO2
UWojdxJaI0FKocETQ42WJ07oyhq1yRJPShn+3qXN4h7wKN18teHnsly9YiFJm/Es
OF7kKbGrCpKh8CGBA0Vj274CH6OfwvU5X9VwWmNowTLoIQpRSf7y4bG0ZaG5iwBG
QVa3VnK7DpBAhKNTmVJ2JemNw+RA35CxUvSmow8yYqgqgplx6VmR+CkSVS8bcleA
BHr12TUjDWWlNglbhS/2A/OpzbUKE5vOyDF8s6+2GZ02asCehpGX2gMZ8OMXOEff
SpjSQ+c7++IWr6TlMFwGCSqGSIb3DQEHATAdBglghkgBZQMEASoEEDxd7DguK76z
P8898zz1zS6AMCQR30G4X4BvtnOH4PwyWtBy+ZM5y3pFn8OO7PiRt7p4Jij/f4yE
zyvuo+Y0tu/04g==
-----END CMS-----

Decrypt CMS message in file /tmp/derive.59761.text.cms...
openssl cms -decrypt -aes256 -binary -inform PEM -in /tmp/derive.59761.text.cms -out /tmp/derive.59761.text.dec -inkey "pkcs11:manufacturer=piv_II;object=KEY%20MAN%20key;object-type=private"
Enter pass phrase for PKCS#11 Token (Slot 17 - Yubico YubiKey OTP+FIDO+CCID):
/Users/ur20980/bin/pkcs11-cms-demo: line 50: 59771 Segmentation fault: 11  ${OPENSSL} cms -decrypt -aes256 -decrypt -binary -inform PEM -in ${TEXTFILE}.cms -out ${TEXTFILE}.dec -inkey "${PRKEY}"

Original and decrypted files match:
uDfpCSLT/MU2WTsuespq9vwlpUTKMwyyJYFvM97+
uDfpCSLT/MU2WTsuespq9vwlpUTKMwyyJYFvM97+
$

OpenSSL config:

[provider_sect]
 default = default_sect
 #legacy = legacy_sect
 pkcs11 = pkcs11_sect
 #gost   = gost_sect
 base = base_sect
 oqs = oqs_sect

[default_sect]
 activate = 1

[legacy_sect]
 activate = 0

[pkcs11_sect]
 module = /opt/local/libexec/openssl3/lib/ossl-modules/pkcs11.dylib
 pkcs11-module-quirks = no-deinit no-allowed-mechanisms
 pkcs11-module-login-behavior = auto
 #pkcs11-module-login-behavior = always
 pkcs11-module-cache-pins = cache
 #pkcs11-module-path = /Library/OpenSC/lib/opensc-pkcs11.so
 #pkcs11-module-path = /usr/local/lib/libykcs11.dylib
 pkcs11-module-path = /opt/local/lib/p11-kit-proxy.dylib
 #pkcs11-module-path = /opt/p11kit/lib/p11-kit-proxy.dylib
 activate = 1

Building p11-kit:

$ git clone https://github.com/p11-glue/p11-kit
.  .  .
$ cd p11-kit
$ meson setup --prefix=/opt/p11kit -Dbuildtype=debug -Dsystemd=disabled -Dbash_completion=disabled -Dclosures=0 _build
.  .  .
$ meson compile -C _build && meson test -C _build
.  .  .
Ok:                 59  
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0   

Full log written to /Users/ur20980/src/p11-kit/_build/meson-logs/testlog.txt
$ 

@neverpanic @ueno please take a look - when instead of the p11-kit master I use Macports-installed p11-kit binary, MacOS does not register a crash, but several assertions fail:

Decrypt CMS message in file /tmp/derive.60054.text.cms...
openssl cms -decrypt -aes256 -binary -inform PEM -in /tmp/derive.60054.text.cms -out /tmp/derive.60054.text.dec -inkey "pkcs11:manufacturer=piv_II;object=KEY%20MAN%20key;object-type=private"
Enter pass phrase for PKCS#11 Token (Slot 17 - Yubico YubiKey OTP+FIDO+CCID):
p11-kit: 'bound != NULL' not true at fixed3_C_CloseSession
p11-kit: 'bound != NULL' not true at fixed3_C_CloseSession
p11-kit: 'bound != NULL' not true at fixed3_C_Finalize

Original and decrypted files match:
fEYyxMVsYOEqqSS9oeG+DVqMmhz+STfgM/17+H9S
fEYyxMVsYOEqqSS9oeG+DVqMmhz+STfgM/17+H9S
$ 

So, p11-kit definitely does contribute something to this problem.

simo5 commented 1 year ago

Please set pkcs11-module-quirks = no-deinit as described here: https://github.com/latchset/pkcs11-provider/wiki#pkcs11-module-quirks

mouse07410 commented 1 year ago

Please set pkcs11-module-quirks = no-deinit as described here: https://github.com/latchset/pkcs11-provider/wiki#pkcs11-module-quirks

As you can see in https://github.com/latchset/pkcs11-provider/issues/239#issue-1714536771 , it's already done:

pkcs11-module-quirks = no-deinit no-allowed-mechanisms

Or are you saying that I should remove no-allowed-mechanisms from pkcs11-module-quirks?

simo5 commented 1 year ago

No, you can have pkcs11-module-quirks = no-deinit no-allowed-mechanisms if you want.

What do you have now in the config?

simo5 commented 1 year ago

Ah sorry read only the question in your comment, that setting should be fine.

simo5 commented 1 year ago

But C_Finalize is called only if deinit is possible, so either this is not the configuration your system is using, or there is some other issue. Can you provide pkcs11 debug log for the crash?

mouse07410 commented 1 year ago

Can you provide pkcs11 debug log for the crash?

Not sure which log you mean, and how to obtain it. Open to suggestions.

simo5 commented 1 year ago

The usual pkcs11-provider debug log export PKCS11_PROVIDER_DEBUG=file:/tmp/debug.log

Run command that crashes, provide the file.

mouse07410 commented 1 year ago
$ PKCS11_PROVIDER_DEBUG="file:/tmp/pkcs11-debug.txt" openssl cms -decrypt -aes256 -binary -inform PEM -in /tmp/pt.cms -out /tmp/pt.dec -inkey "pkcs11:manufacturer=piv_II;object=KEY%20MAN%20key;object-type=private"
Enter pass phrase for PKCS#11 Token (Slot 17 - Yubico YubiKey OTP+FIDO+CCID):
Segmentation fault: 11
$ 
$ diff /tmp/pt.txt /tmp/pt.dec
$

pkcs11-debug.txt

Crash report:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Codes:       0x0000000000000001, 0x0000000000000000

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [3426]

VM Region Info: 0 is not in any region.  Bytes before following region: 4503506944
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      10c6e1000-10c761000    [  512K] r-x/r-x SM=COW  ...xec/*/openssl

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   ???                                            0x0 ???
1   libp11-kit.0.dylib                     0x10cbbd575 binding_C_CloseSession + 21
2   libffi.8.dylib                         0x10c861942 ffi_closure_unix64_inner + 545
3   libffi.8.dylib                         0x10c861f4c ffi_closure_unix64 + 72
4   pkcs11.dylib                           0x10c9813d5 p11prov_CloseSession + 101
5   pkcs11.dylib                           0x10c9813d5 p11prov_CloseSession + 101
6   pkcs11.dylib                           0x10c9813d5 p11prov_CloseSession + 101
.  .  .
509 pkcs11.dylib                           0x10c9813d5 p11prov_CloseSession + 101
510 pkcs11.dylib                           0x10c9813d5 p11prov_CloseSession + 101
511 pkcs11.dylib                           0x10c9813d5 p11prov_CloseSession + 101
mouse07410 commented 1 year ago

@neverpanic @ueno please take a look, as libp11-kit.0.dylib is present at the very last moment, and something appears to corrupt the stack. PKCS11 debug log shows only one call to p11prov_CloseSession(), so it is unclear why the stack has an infinite loop over it (or something inside it).

simo5 commented 1 year ago

It does look like the no-deinit argument is not having effect here, which is strange ... Does it work if you remove the no-allowed-mechanisms ?

mouse07410 commented 1 year ago

It does look like the no-deinit argument is not having effect here, which is strange ...

You can say that again. ;-)

Does it work if you remove the no-allowed-mechanisms ?

Darn!! It does:

$ OPENSSL_DEBUG=3 PKCS11_PROVIDER_DEBUG="file:/tmp/pkcs11-debug.txt" openssl cms -decrypt -aes256 -binary -inform PEM -in /tmp/pt.cms -out /tmp/pt.dec -inkey "pkcs11:manufacturer=piv_II;object=KEY%20MAN%20key;object-type=private"
Enter pass phrase for PKCS#11 Token (Slot 17 - Yubico YubiKey OTP+FIDO+CCID):
$ 
$ diff /tmp/pt.txt /tmp/pt.dec
$ 

So, it works without producing "user-visible" errors or crashes. Result of cryptographic operation is correct. But the pkcs11-debug.txt file has errors (Error: 0x00000005; Can't get module interfaces), which were not present in the prior case: pkcs11-debug.txt

simo5 commented 1 year ago

That "error message is exactly what I was looking for. It is "normal" when no-deinit is in effect, and lack of it on shutdown means no-deinit is non functional.

Are you absolutely sure that you are use only spaces as separators between the arguments of the pkcs11-module-quirks options?

only spaces are allowed, no other separator.

mouse07410 commented 1 year ago

Are you absolutely sure that you are use only spaces as separators between the arguments of the pkcs11-module-quirks options?

You have the excerpt from the openssl.cnf file here https://github.com/latchset/pkcs11-provider/issues/239#issue-1714536771 So, yes - I am absolutely sure. And here's excerpt from the binary dump of the openssl.cnf file:

0030500    y   l   i   b  \n       p   k   c   s   1   1   -   m   o   d
           79  6c  69  62  0a  20  70  6b  63  73  31  31  2d  6d  6f  64
0030520    u   l   e   -   q   u   i   r   k   s       =       n   o   -
           75  6c  65  2d  71  75  69  72  6b  73  20  3d  20  6e  6f  2d
0030540    d   e   i   n   i   t       n   o   -   a   l   l   o   w   e
           64  65  69  6e  69  74  20  6e  6f  2d  61  6c  6c  6f  77  65
0030560    d   -   m   e   c   h   a   n   i   s   m   s  \n       p   k

But speaking of that - perhaps you can relax that parsing? It looks like parser gets confused when more than one argument is present.

simo5 commented 1 year ago

The actual parser is very simple, there isn't much to relax. As long as arguments are space separate strings it should just work. It does ignore any string that doesn't match known ones. I just gdb-ed to make sure this get parsed correctly, and works as expected here.

mouse07410 commented 1 year ago

I just gdb-ed to make sure this get parsed correctly, and works as expected here

So, you had pkcs11-module-quirks = no-deinit no-allowed-mechanisms, and it parsed OK. Right?

One thing that comes to mind - perhaps p11-kit does something undesirable here?

simo5 commented 1 year ago

No p11-kit has nothing to do with pkcs11-provider parsing its own options.

mouse07410 commented 1 year ago

No p11-kit has nothing to do with pkcs11-provider parsing its own options.

Yes I understand - but I wonder if there's some glitch in the passing of parameters or intents between all those different components. Because when I change components, behavior changes.

Update

So, one question is - what prevents no-deinit from being processed correctly when the line pkcs11-module-quirks = includes other arguments.

simo5 commented 1 year ago

I will add more debugging so we can try to figure it out.

simo5 commented 1 year ago

Please try after applying #242 and generate a pkcs11-provide debug file.

You should see something like this in the debug file:

[provider.c:1340] OSSL_provider_init(): Provided config params:
[provider.c:1350] OSSL_provider_init():   pkcs11-module-path: [none]
[provider.c:1350] OSSL_provider_init():   pkcs11-module-init-args: configDir=/home/remote/ssorce/devel/git/pkcs11prov/tests/tmp.softokn/tokens
[provider.c:1350] OSSL_provider_init():   pkcs11-module-token-pin: [****]
[provider.c:1350] OSSL_provider_init():   pkcs11-module-allow-export: [none]
[provider.c:1350] OSSL_provider_init():   pkcs11-module-login-behavior: [none]
[provider.c:1350] OSSL_provider_init():   pkcs11-module-load-behavior: [none]
[provider.c:1350] OSSL_provider_init():   pkcs11-module-cache-pins: [none]
[provider.c:1350] OSSL_provider_init():   pkcs11-module-cache-keys: [none]
[provider.c:1350] OSSL_provider_init():   pkcs11-module-quirks: no-deinit no-allowed-mechanisms
[provider.c:1350] OSSL_provider_init():   pkcs11-module-cache-sessions: [none]
[provider.c:1369] OSSL_provider_init(): PIN available
[provider.c:1383] OSSL_provider_init(): Export allowed
[provider.c:1402] OSSL_provider_init(): Login behavior: auto
[provider.c:1419] OSSL_provider_init(): PINs will not be cached
[provider.c:1436] OSSL_provider_init(): Key caching: in session object
[provider.c:1469] OSSL_provider_init(): Quirks:
[provider.c:1471] OSSL_provider_init():  No finalization on de-initialization
[provider.c:1474] OSSL_provider_init():  No CKA_ALLOWED_MECHANISM use
[provider.c:1496] OSSL_provider_init(): Cache Sessions: 5
[provider.c:1508] OSSL_provider_init(): Load behavior: default
mouse07410 commented 1 year ago
$ OPENSC_DEBUG=3  PKCS11_PROVIDER_DEBUG="file:/tmp/pkcs11-debug.txt" openssl3 cms -decrypt -aes256 -binary -inform PEM -in ~/src/pt.cms -out /tmp/pt.dec -inkey "pkcs11:manufacturer=piv_II;object=KEY%20MAN%20key;object-type=private"
P:7271; T:0x140704718661184 17:24:00.456 [opensc-pkcs11] ctx.c:734:process_config_file: Used configuration file '/Library/OpenSC/etc/opensc.conf'
Enter pass phrase for PKCS#11 Token (Slot 18 - Yubico YubiKey OTP+FIDO+CCID):
Segmentation fault: 11
$

Cryptographic operation (CMS decrypt) completed successfully prior to this crash.

pkcs11-debug.txt

Crash report (weird!):

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [7271]

VM Region Info: 0 is not in any region.  Bytes before following region: 4349673472
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      10342c000-1034c4000    [  608K] r-x/r-x SM=COW  ...SER/*/openssl

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_pthread.dylib             0x7ff81b5833c1 pthread_rwlock_wrlock + 0
1   libcrypto.3.dylib                      0x1044c88ea CRYPTO_THREAD_write_lock + 9
2   libcrypto.3.dylib                      0x10447a9be ENGINE_finish + 30
3   libsofthsm2.so                         0x103ae9218 OSSLCryptoFactory::~OSSLCryptoFactory() + 28
4   libsofthsm2.so                         0x103ae92b4 OSSLCryptoFactory::~OSSLCryptoFactory() + 14
5   libsystem_c.dylib                   0x7ff81b457ba1 __cxa_finalize_ranges + 409
6   libsystem_c.dylib                   0x7ff81b4579bb exit + 35
7   openssl                                0x1034543f5 main + 1445 (openssl.c:322)
8   dyld                                0x7ff81b22c41f start + 1903

opensc-debug.log

simo5 commented 1 year ago

Sounds like the crash moved to softhsm ... I do not think this is an issue with pkcs11-provider, there is some other module calling into openssl routines after a de-initialization begun.

Unfortunately there are libraries that use atexit handlers to wind down contexts, I think it is wrong for a library to unilaterally set up handlers, because a library does not know how an application winds down, and generally for libraries it really is not needed, after all the process is about to exit anyway, and the kernel will free all resources ...

In any case, I am going to close this bug because your latest comment shows that pkcs11-provider is not involved in the crash and all options are parsed correctly.

mouse07410 commented 1 year ago

Following your analysis, I got rid of ~/.config/pkcs11/modules/softhsm2.module (which, unlike YKCS11 and HSM, isn't really needed) - and the crash disappeared.

dengert commented 1 year ago

You may want to report this problem to SoftHSM. The failure may be coming from: https://github.com/opendnssec/SoftHSMv2/blob/develop/src/lib/crypto/OSSLCryptoFactory.cpp#L227-L257 most likely line 257. This has not been changed in years and could use some tests for OpenSSL 3.

It could also be that the MacOS version is different.

dengert commented 1 year ago

Looks like RedHat, Ubuntu, Debian and SoftHSM2 have fixed the problem (or similar problem) http://launchpadlibrarian.net/479947479/softhsm2_2.5.0-1build1_2.6.1-2.diff.gz

https://github.com/opendnssec/SoftHSMv2/commit/c2cc0652b4c4829fc6ba186469f4e324af77dfe8 https://github.com/opendnssec/SoftHSMv2/issues/548 https://bugzilla.redhat.com/show_bug.cgi?id=1831086

What version is running on MacOS?

mouse07410 commented 1 year ago
$ port installed softhsm
The following ports are currently installed:
  softhsm @2.6.1_1 (active)
$
$ rg OPENSSL_init_crypto SoftHSMv2
SoftHSMv2/src/lib/crypto/OSSLCryptoFactory.cpp
173:    OPENSSL_init_crypto(OPENSSL_INIT_ENGINE_ALL_BUILTIN |
241:    ossl_shutdown = !OPENSSL_init_crypto(OPENSSL_INIT_ENGINE_RDRAND, NULL);

SoftHSMv2/m4/acx_openssl_gost.m4
30:                 OPENSSL_init_crypto(OPENSSL_INIT_ENGINE_ALL_BUILTIN | OPENSSL_INIT_LOAD_CONFIG, NULL);

SoftHSMv2/modules/tests/test_openssl_gost.c
16:        OPENSSL_init_crypto(OPENSSL_INIT_ENGINE_ALL_BUILTIN | OPENSSL_INIT_LOAD_CONFIG, NULL);
$ 

I saw their (SoftHSMv2) fix, and am not happy with it. I wish it added OPENSSL_INIT_NO_ATEXIT. But if wishes were wings...