sfackler / rust-native-tls

Apache License 2.0
471 stars 195 forks source link

[schannel] Identity::from_pkcs12 leaves private keys on disk and never cleans them up #199

Closed arsing closed 3 years ago

arsing commented 3 years ago

cc @steffengy

At $dayjob we have an issue with a Windows program that uses native-tls with a client identity that creates many files under C:\ProgramData\Microsoft\Crypto\RSA\$sid\. Every attempt at a TLS connection creates a new file, and they are never cleaned up.

We traced it to the expected behavior of PFXImportCertStore as it's used by native-tls. The schannel impl of Identity::from_pkcs12 does not call .no_persist_key(true) on the PfxImportOptions. (Note that the schannel crate cannot default this to true because it has to deal with keys that might be loaded from a CSP / machine store, but since the native-tls crate only deals with in-memory keys it knows it's dealing with ephemeral keys.)

Now, I tried adding a .no_persist_key(true) call there, but it causes the tests to fail, because AcquirePrivateKeyOptions::acquire() returns Err(CRYPT_E_NO_KEY_PROPERTY). This is a CI run where I eprintln!()d the error from acquire().

It might be the case that using non-persisted keys also requires using the CRYPT_ACQUIRE_CACHE_FLAG flag in the call to CryptAcquireCertificatePrivateKey, however this requires changing the schannel crate to respect the semantics of that flag. Namely that the function might return free == winapi::FALSE, in which case the CryptProv / NcryptKeyHandle must hold on to a clone of the CertContext and their Drop should not call CryptReleaseContext / NCryptFreeObject. I made this change in https://github.com/arsing/schannel-rs/commit/5853d3604384a6d21caa6c5320ac665f77e693b5 , and the tests that previously failed with CRYPT_E_NO_KEY_PROPERTY now pass, but the tests that use the cert as a TLS server cert still fail with SEC_E_NO_CREDENTIALS; this is a CI run with that failure.

Any ideas?

sfackler commented 3 years ago

It has been a very long time since I've touched the schannel stuff unfortunately. You may want to check what curl's schannel backend does here, though I can't remember if it supports PKCS#12 identities.

steffengy commented 3 years ago

For more context (whole issue seems insightful): https://github.com/dotnet/runtime/issues/23749#issuecomment-519626757

From quickly reading through it, apparently its a bug with schannel and is "workaroundable" by doing export/import.

arsing commented 3 years ago

Yes, looks like it.

The other thing we discussed internally is the option of letting the key be persisted for the duration of the TLS connection, then cleaning it up "afterwards". For CryptProv keys this means calling CryptAcquireContext(CRYPT_DELETEKEYSET); for NCryptKeyHandle I believe this means calling NCryptDeleteKey.

"Afterwards" presumably equates to "when the CertContext is dropped". I did try implementing this, but I realized since CertContext can be cloned it might need some refcounting to ensure that that only the last CertContext deletes the key. I haven't implemented that yet, but does that approach make sense? If so, I can take a stab at implementing it after the weekend.

steffengy commented 3 years ago

I think so, yeah. Alternatively the workaround might be worth looking into as well?

arsing commented 3 years ago

Update:

The other thing we discussed internally is the option of letting the key be persisted for the duration of the TLS connection, then cleaning it up "afterwards". For CryptProv keys this means calling CryptAcquireContext(CRYPT_DELETEKEYSET); for NCryptKeyHandle I believe this means calling NCryptDeleteKey.

"Afterwards" presumably equates to "when the CertContext is dropped". I did try implementing this, but I realized since CertContext can be cloned it might need some refcounting to ensure that that only the last CertContext deletes the key. I haven't implemented that yet, but does that approach make sense? If so, I can take a stab at implementing it after the weekend.

This doesn't help. The key remains on disk.

I only tested with ncrypt keys (by changing the load-private-key function to use CRYPT_ACQUIRE_PREFER_NCRYPT_KEY_FLAG instead of CRYPT_ACQUIRE_ALLOW_NCRYPT_KEY_FLAG) because it's more complicated to delete HCRYPTPROV keys, but it probably won't work for those either.


Alternatively the workaround might be worth looking into as well?

I'll try this now. I remembered I wrote code a long time ago to try to solve this problem; I'll check if it actually does.

arsing commented 3 years ago

I'll try this now. I remembered I wrote code a long time ago to try to solve this problem; I'll check if it actually does.

Okay, so that code does work. Here's a hacky commit to the tests to make them pass on Windows while not writing any files to disk; specifically the get_identity fn.

It requires starting with a PEM / DER of the cert and privkey separately, as opposed to starting with a PKCS#12 blob. It so happens that $dayjob's code does start with a separate cert and privkey. So we can use that code instead of what we do today (openssl::pkcs12::Pkcs12Builder) to create the PKCS#12 blob for native_tls::Identity::from_pkcs12. No changes to schannel or native-tls are required.

So, do you see anything there that you want upstreamed into this crate or into schannel? Like, do you want to add Identity::from_der(cert: &[u8], key: &[u8]) or something? If not, you can close this issue.

sfackler commented 3 years ago

Supporting separate key + certificate files has been a long running feature request, so pulling that in will be a big win even ignoring the disk write stuff!

arsing commented 3 years ago

I confirmed the fix in $dayjob's code. In case anyone else has this same issue, something like this will help. Again, as I said above, I don't know if there's a way to do this when starting with a PKCS#12 blob, or if it can only be done when starting with separate cert and key so that the key can be directly NCryptImportKey'd.