pyca / cryptography

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers.
https://cryptography.io
Other
6.68k stars 1.53k forks source link

Symmetric Key References (aka another piece of HSM support) #1506

Closed reaperhulk closed 7 years ago

reaperhulk commented 9 years ago

The current cryptography APIs for symmetric/MAC/KDF/OTP assume that the key material is loaded as a Python byte string. This is fine for many purposes, but does not handle cases where the key must remain in an HSM or the key material ideally is left in an opaque C pointer and not copied. This proposal attempts to address this by introducing a new proxy object (KeyReference) and demonstrating how it would compare to the current API.

Current API

Cipher

key = b"\x00" * 32
cipher = Cipher(AES(key), CBC(iv), backend)
encryptor = cipher.encrypt()
ciphertext = encryptor.update(b"somedata") + encryptor.finalize()

MAC

hmac = HMAC(key, hashes.SHA256(), backend)
hmac.update(b"somedata")
digest = hmac.finalize()

KDF

hkdf = HKDF(hashes.SHA256(), length, salt, info, backend)
data = hkdf.derive(key)

OTP

hotp = HOTP(key, length, algorithm, backend)
otp = hotp.generate(counter)

Fernet

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"my deep dark secret")

Proposed API

Interfaces

@six.add_metaclass(abc.ABCMeta)
class KeyReference(object):
    @abc.abstractproperty
    def key_size(self):
        """Returns key size in bytes."""

# This interface retains the ability to get the value of the key for standard
# in-memory keys or potentially HSM keys not marked unextractable. This would 
# be useful for keys generated via `Fernet.generate_key()` which may need to be 
# written somewhere.
@six.add_metaclass(abc.ABCMeta)
class KeyReferenceWithSerialization(KeyReference):
    @abc.abstractmethod
    def dump_bytes(self):
        """
        Outputs a Python bytestring with the value of the key reference.
        """

Loading Keys

# Load key reference from "profile" (for PKCS11 this is a set of attributes).
# Since each backend may have its own view of what's required to load a key
# profile may need to be defined on a per backend basis...?
key = load_key_with_profile(profile, backend)
# Load key reference from bytes
bytes = b"\x00" * 32
key = load_key_with_bytes(bytes, backend)

Cipher

Since the key object (an instance of the KeyReference interface provided by the backend) knows the backend we no longer need to provide a backend in calls to Cipher(key, mode)

cipher = Cipher(AES(key), CBC(iv))
encryptor = cipher.encrypt()
encryptor = cipher.encrypt()
ciphertext = encryptor.update(b"somedata") + encryptor.finalize()

MAC

The same change is made for MAC (HMAC/CMAC).

hmac = HMAC(key, hashes.SHA256())
digest = hmac.finalize()

OTP

The same change also works for OTP (HOTP/TOTP).

hotp = HOTP(key, length, algorithm)
otp = hotp.generate(counter)

Fernet

No changes, but writing the Fernet key (if required) will require using WithSerialization.

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"my deep dark secret")

KDF

The disadvantage for KDF is that the current API can inform the user if a hash is not supported by the backend when calling the constructor while the new one can't provide that information until derive is called. We could continue to require passing backend but then we'd need to test for mismatches between the backend passed and the backend that created the key reference.

hkdf = HKDF(hashes.SHA256(), length, salt, info)
data = hkdf.derive(key)
reaperhulk commented 9 years ago

It may be worth noting that we could at least partially address the memory wiping security limitation we discuss in our docs (https://cryptography.io/en/latest/limitations/) with this (in addition to allowing us to support a PKCS#11/KMIP backend). A memory-based KeyReference could look something like:

@utils.register_interface(KeyReference)
_KeyReference(object):
    def __init__(self, path_to_key, backend):
        self._key = backend.read_into_char_buffer(path_to_key)

read_into_char_buffer could allocate a char [] and use ffi.gc to memset(0) it before deallocation. dump_bytes would, of course, defeat this, but we could define write_bytes if we wanted to prevent the key from touching Python directly.

reaperhulk commented 9 years ago

I have revised the proposal to add key_size as a property to KeyReference. HSMs can typically provide this data and it allows us to keep our key checks in our algorithm classes. They currently look like this:

def _verify_key_size(algorithm, key):
    # Verify that the key size matches the expected key size
    if len(key) * 8 not in algorithm.key_sizes:
        raise ValueError("Invalid key size ({0}) for {1}.".format(
            len(key) * 8, algorithm.name
        ))
    return key

@utils.register_interface(interfaces.BlockCipherAlgorithm)
@utils.register_interface(interfaces.CipherAlgorithm)
class AES(object):
    name = "AES"
    block_size = 128
    key_sizes = frozenset([128, 192, 256])

    def __init__(self, key):
        self.key = _verify_key_size(self, key)
public commented 9 years ago

I think continuing to accept bytes for keys alongside opaque KeyReferences is fine. KeyReferenceWithSerialization seems particularly weird to me. Are there any non-bytes types you are anticipating for that?

reaperhulk commented 9 years ago

Those who have expressed a preference thus far have been in favor of supporting both bytes and KeyReference objects in perpetuity. This is (presumably) due to concerns about the increased complexity for users who would be required to build a key object rather than just have some bytes.

If we choose to keep both we'll have backends (PKCS11/KMIP) that will need to raise exceptions when you attempt to use bytes keys and we'll probably need to add backend methods to interrogate a backend about that for our tests.

alex commented 9 years ago

To chime in: I would favor supporting just passing bytes indefinitely, I think BytesReferences or whatever are way more complexity, particularly compared with other things.

One alternative that avoids this problem is to have the reference occur at the CipherAlgorithm level: Cipher(AESReference("profile"), CBC("iv"), default_abckend())

adiroiban commented 9 years ago

I think that passing bytes together with a backend is fine.

With API a Cipher or MAC once instanciated is associated with a backend/HMC and will only work with that backed.

The KeyReference will allow instantiating a single Cipher / MAC and use it with multiple backends/HMC.. but I am not sure how common is this scenario.

I assume that once you have a HMC you will want use it for all operations.... and I assume that most HMC will reference keys using a simple string.

Is there a set of targeted HSMs which should be supported by the new API?

reaperhulk commented 7 years ago

Closing as the path to this is going to be very different since we're dropping backends. HSM support would be through implementing our interfaces in another package.