hypothesis / h

Annotate with anyone, anywhere.
https://hypothes.is/
BSD 2-Clause "Simplified" License

Attempt to remove the `hkdf` dependency #6570

Closed jon-betts closed 3 years ago

jon-betts commented 3 years ago

For: https://github.com/hypothesis/product-backlog/issues/1199

To support the upgrade effort we are attempting to reduce our dependencies, particularly where they duplicate other modules or hold back upgrades. `hkdf` only lists Python 3.4 support, and it looks like the sort of functionality that may be available in other crypto libraries if we can work out what it's doing for us.

robertknight commented 3 years ago

The context for why `hkdf` is used comes from https://github.com/hypothesis/h/commit/51cd49f20fb3d17c2e7f81c028e17c1af274b1ae. The implementation is pretty simple and only depends on a small amount of standard library functionality, so it's unlikely to break in any Python 3.x version. You certainly could replace it with `cryptography` (or some other well-known crypto library), but avoiding a headache with native dependencies is the reason it was introduced in the first place.
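To illustrate how little standard library functionality HKDF needs, here is a minimal sketch of the extract-and-expand construction from RFC 5869 using only `hmac` and `hashlib`. This is not the code of the `hkdf` package. The names `hkdf_extract`, `hkdf_expand` and `derive_key_stdlib` are hypothetical, and the SHA-512 / 64-byte defaults are assumptions chosen to match the parameters used elsewhere in this thread.

```python
import hashlib
import hmac


def hkdf_extract(salt, key_material, hash_name="sha512"):
    """HKDF-Extract: condense the input key material into a pseudorandom key."""
    digest_size = hashlib.new(hash_name).digest_size
    if not salt:
        # RFC 5869: an absent salt is treated as a string of zero bytes.
        salt = b"\x00" * digest_size
    return hmac.new(salt, key_material, hash_name).digest()


def hkdf_expand(pseudo_random_key, info, length=64, hash_name="sha512"):
    """HKDF-Expand: stretch the pseudorandom key to `length` output bytes."""
    digest_size = hashlib.new(hash_name).digest_size
    if length > 255 * digest_size:
        raise ValueError("requested length is too large for this hash")
    info = info or b""
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        # T(n) = HMAC(PRK, T(n-1) || info || n), with T(0) empty.
        block = hmac.new(
            pseudo_random_key, block + info + bytes([counter]), hash_name
        ).digest()
        output += block
        counter += 1
    return output[:length]


def derive_key_stdlib(key_material, salt, info, length=64, hash_name="sha512"):
    """Hypothetical stdlib-only derivation (assumed SHA-512, 64-byte output)."""
    if not isinstance(key_material, bytes):
        key_material = key_material.encode()
    prk = hkdf_extract(salt, key_material, hash_name)
    return hkdf_expand(prk, info, length, hash_name)
```

Calling `derive_key_stdlib("secret", b"salt", b"context")` returns 64 bytes. Whether that output matches what `h` currently derives would still need to be checked against the existing function before relying on it.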

jon-betts commented 3 years ago

One possible alternative implementation here:

Looks like it's in cryptography too, but in the hazmat layer, which doesn't sound good:

jon-betts commented 3 years ago

Implementations which appear to be drop-in replacements for our function:

from Cryptodome.Hash import SHA512
from Cryptodome.Protocol.KDF import HKDF

def derive_key_pycryptodome(key_material, salt, info):
    # HKDF operates on bytes, so encode string key material first.
    if not isinstance(key_material, bytes):
        key_material = key_material.encode()

    # Derive a single 64-byte key with HKDF-SHA512.
    return HKDF(master=key_material, key_len=64, salt=salt, hashmod=SHA512,
                num_keys=1, context=info)

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF as HKDF2

def derive_key_cryptography(key_material, salt, info):
    # Configure HKDF-SHA512 with a 64-byte output.
    hkdf = HKDF2(
        algorithm=hashes.SHA512(),
        length=64,
        salt=salt,
        info=info
    )

    # HKDF operates on bytes, so encode string key material first.
    if not isinstance(key_material, bytes):
        key_material = key_material.encode()

    return hkdf.derive(key_material)
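One way to check the "drop-in" claim would be to run every candidate over the same inputs and flag any disagreement. The helper below is only a sketch: `find_mismatches` is a hypothetical name, and it assumes each candidate takes `(key_material, salt, info)` and returns bytes, as the functions above do.

```python
def find_mismatches(candidates, cases):
    """Return the cases where the candidate functions disagree.

    `candidates` maps a label to a callable taking (key_material, salt, info).
    `cases` is an iterable of (key_material, salt, info) tuples.
    """
    mismatches = []
    for key_material, salt, info in cases:
        # Run every candidate on the same input and collect the outputs.
        results = {
            name: derive(key_material, salt, info)
            for name, derive in candidates.items()
        }
        # More than one distinct output means at least two candidates differ.
        if len(set(results.values())) > 1:
            mismatches.append(((key_material, salt, info), results))
    return mismatches
```

For example, passing `{"pycryptodome": derive_key_pycryptodome, "cryptography": derive_key_cryptography}` together with the existing function, over cases covering both `str` and `bytes` key material, should return an empty list if they really are interchangeable.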

Pros and cons: