parallaxsecond / parsec

Platform AbstRaction for SECurity service
https://parsec.community/
Apache License 2.0

PKCS11 provider connectivity issue #607

Open ionut-arm opened 2 years ago

ionut-arm commented 2 years ago

The issue

Disconnecting and reconnecting a pluggable PKCS11 token leads to the PKCS11 provider being inaccessible. To reproduce the issue:

Solution

Some information is still missing and will require more investigation. I'm hoping to find a way to reproduce this using SoftHSM2.

The ideal solution would be for us to simply re-establish a functional connection to the hardware token when we detect that the token has been unplugged and plugged back in. The actual solution will depend on how reliably we can tell whether this has happened and on what options we identify for re-establishing that connection in a clean way.
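One possible way to detect the unplug/replug, sketched below, is to poll the slot flags and watch the `CKF_TOKEN_PRESENT` bit (defined by PKCS #11 for `CK_SLOT_INFO.flags`). The `PresenceTracker` and `TokenEvent` names are hypothetical and are not part of Parsec or any PKCS11 binding; the sketch assumes the caller obtains the raw flags itself, for example from `C_GetSlotInfo`. A fast removal and re-insertion between two polls would go unnoticed, so this is only one candidate for the detection step, not a full answer.

```rust
// Hypothetical sketch: `PresenceTracker` and `TokenEvent` are illustrative names only.
// CKF_TOKEN_PRESENT is the PKCS #11 bit in CK_SLOT_INFO.flags that says a token is in the slot.
const CKF_TOKEN_PRESENT: u64 = 0x0000_0001;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenEvent {
    Unchanged,
    Removed,
    Inserted,
}

/// Remembers whether the token was present at the previous poll.
struct PresenceTracker {
    was_present: bool,
}

impl PresenceTracker {
    fn new(initial_slot_flags: u64) -> Self {
        PresenceTracker {
            was_present: (initial_slot_flags & CKF_TOKEN_PRESENT) != 0,
        }
    }

    /// Feed in the latest slot flags and get back what changed since the last poll.
    fn observe(&mut self, slot_flags: u64) -> TokenEvent {
        let present = (slot_flags & CKF_TOKEN_PRESENT) != 0;
        let event = match (self.was_present, present) {
            (true, false) => TokenEvent::Removed,
            (false, true) => TokenEvent::Inserted,
            _ => TokenEvent::Unchanged,
        };
        self.was_present = present;
        event
    }
}
```

An `Inserted` event following a `Removed` one would be the point at which the provider tries to re-establish its connection.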

Outstanding questions

This is a variant of the more generic approach discussed in #607

anta5010 commented 2 years ago

With RUST_LOG=trace after re-inserting a USB HSM module:

parsec-tool:

# RUST_LOG=trace parsec-tool -p 2 create-rsa-key -k anta-11-new
[DEBUG] Parsec BasicClient created with implicit provider "Mbed Crypto provider" and authentication data "UnixPeerCredentials"
[INFO ] Creating RSA encryption key...
[DEBUG] Running getuid
[ERROR] Subcommand failed: there was a communication failure inside the implementation (ParsecClientError(Service(PsaErrorCommunicationFailure)))

parsec:

[TRACE parsec_service::front::front_end] handle_request ingress
[INFO  parsec_service::front::front_end] New request received from application name "0"
[TRACE parsec_service::back::dispatcher] dispatch_request ingress
[TRACE parsec_service::back::backend_handler] execute_request ingress
[TRACE parsec_service::providers::pkcs11] psa_generate_key ingress
[ERROR parsec_service::providers::pkcs11::utils] Error converted to PsaErrorCommunicationFailure; Error: Some horrible, unrecoverable error has occurred.  In the worst case, it is possible that the function only partially succeeded, and that the computer and/or token is in an inconsistent state.
[TRACE parsec_service::back::dispatcher] execute_request egress
[TRACE parsec_service::front::front_end] dispatch_request egress
[INFO  parsec_service::front::front_end] Response for application name "0" sent back
[TRACE parsec] handle_request egress

ionut-arm commented 2 years ago

From the spec:

5.1.1 Universal Cryptoki function return values

Any Cryptoki function can return any of the following values:

· CKR_GENERAL_ERROR: Some horrible, unrecoverable error has occurred. In the worst case, it is possible that the function only partially succeeded, and that the computer and/or token is in an inconsistent state.

· CKR_HOST_MEMORY: The computer that the Cryptoki library is running on has insufficient memory to perform the requested function.

· CKR_FUNCTION_FAILED: The requested function could not be performed, but detailed information about why not is not available in this error return. If the failed function uses a session, it is possible that the CK_SESSION_INFO structure that can be obtained by calling C_GetSessionInfo will hold useful information about what happened in its ulDeviceError field. In any event, although the function call failed, the situation is not necessarily totally hopeless, as it is likely to be when CKR_GENERAL_ERROR is returned. Depending on what the root cause of the error actually was, it is possible that an attempt to make the exact same function call again would succeed.

· CKR_OK: The function executed successfully. Technically, CKR_OK is not quite a “universal” return value; in particular, the legacy functions C_GetFunctionStatus and C_CancelFunction (see Section 5.15) cannot return CKR_OK.

The relative priorities of these errors are in the order listed above, e.g., if either of CKR_GENERAL_ERROR or CKR_HOST_MEMORY would be an appropriate error return, then CKR_GENERAL_ERROR should be returned.

I think it's fair to say that if we get CKR_GENERAL_ERROR we should try to reset the connection, no matter the cause. Or bomb out (if we've already tried to reset once).
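A rough sketch of that "reset once, then give up" policy, to make the idea concrete. Everything here is hypothetical: `Rv`, `Backend`, `with_reset_once` and `FakeToken` are stand-ins invented for illustration and do not reflect the Parsec provider code or the API of any PKCS11 crate. How `reset` would actually rebuild the connection is exactly the open question above.

```rust
// Hypothetical sketch: none of these names come from Parsec or a PKCS11 binding.

/// Stand-in for the universal Cryptoki error return values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rv {
    GeneralError,
    HostMemory,
    FunctionFailed,
}

/// Anything whose connection to the token can be torn down and rebuilt.
trait Backend {
    fn reset(&mut self) -> Result<(), Rv>;
}

/// Run `op`; on a general error, reset the backend and retry exactly once.
/// Any other error, or a second failure, is returned to the caller unchanged.
fn with_reset_once<B: Backend, T>(
    backend: &mut B,
    mut op: impl FnMut(&mut B) -> Result<T, Rv>,
) -> Result<T, Rv> {
    match op(backend) {
        Err(Rv::GeneralError) => {
            backend.reset()?;
            op(backend)
        }
        other => other,
    }
}

/// Fake token used only to exercise the policy.
struct FakeToken {
    connected: bool,
    recoverable: bool,
    resets: u32,
}

impl Backend for FakeToken {
    fn reset(&mut self) -> Result<(), Rv> {
        self.resets += 1;
        // A recoverable token comes back after a reset; an unrecoverable one does not.
        self.connected = self.recoverable;
        Ok(())
    }
}

/// Pretend operation: succeeds only while the fake token is connected.
fn generate_key(token: &mut FakeToken) -> Result<u32, Rv> {
    if token.connected {
        Ok(42)
    } else {
        Err(Rv::GeneralError)
    }
}
```

With a `FakeToken` that starts disconnected but is recoverable, `with_reset_once(&mut token, generate_key)` resets once and then succeeds; with an unrecoverable one it resets once and returns the general error, which is where the service would bomb out.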