world-federation-of-advertisers / common-jvm

Implementation of common JVM dependencies for the Halo-CMM repositories.
Apache License 2.0
3 stars 3 forks source link

Reconsider ID Computation in Tink #271

Open tholenst opened 1 month ago

tholenst commented 1 month ago

The code in src/main/kotlin/org/wfanet/measurement/common/crypto/tink/TinkKeyId.kt is currently used to compute a 64 bit ID of a key in a keyset. It does so by calculating farmHashFingerprint64() on primaryKey.keyData.value.

As far as I can tell this happens only for HPKE and ECIES public keys (because thees are the only templates used in src/main/kotlin/org/wfanet/measurement/common/crypto/tink/KeyHandle.kt).

Unfortunately, doing this is problematic, as the keyData.value can change between releases of Tink. Both ECIES and HPKE use BigIntegers in the representation, and in the past C++ Tink and Java Tink behaved differently in how they are encoded (in particular, the amount of 0 padding in the beginning). Because of this, we changed the behavior of Tink Java when you serialize these keys. This changed in release 1.11.0 for ECIES, and I think 1.8.0 earlier (let me know if you need to know this for sure). Currently, KeysetHandle stores this internally, but we plan to change this in the future so that we do not have to store the serialization at each point.

This means that the hash can change if you simply write out the Keyset another time, or with upcoming refactorings in Tink.

In principle, with the somewhat newer versions of Tink (since 1.11), code like this could be used to get the Hash of a key as you require:

KeysetHandle privateHandle = getHandleFromSomewhere();
EciesPublicKey publicKey = (EciesPublicKey) handle.getPrimary().getKey().getPublicKey();
Hashing.farmHashFingerprint64().hashBytes(publicKey.getNistCurvePoint().getAffineX().toByteArray(), ...);

Such a sequence would have the advantage that the hash is guaranteed to stay stable. However, it's somewhat more work since you have to ensure yourself that you don't forget to make the Hash sensitive of any of the data which is relevant.

At one point in the future, Tink might also declare that the serialization stays stable, but we are not at this point yet.

I could write code which attempts to ensure that you stay close to the previous hash, but because the actual value probably depends on the Tink version used to create the keys it would not be guaranteed.

I think overall there are the following options: 1) Keep the code as is. This risks that the hash changes in the future. 2) Rewrite the way the hashing is done. This is more work and once you are done you have your own hashing so you have full control and it will be stable.

SanjayVas commented 1 month ago

@tholenst We know that protobuf serialization is also non-deterministic. The reason we assumed our usage is safe is that each key is written to storage and considered immutable after that point. That is to say that the a key is only serialized/written once, but may be deserialized/read in the future. Are you saying that this part is also not stable?

Related to #138.

tholenst commented 1 month ago

The short answer is that it's still problematic, the long answer is a bit more complicated.

Up until now, if you only read keys from disk and then call your mechanism to get the proto, it will be stable.

However, you also call getPublicKeysetHandle which was never guaranteed to be stable as it will internally first read the private key proto, get out the public key proto, and then serialize the public key proto again. For example, this was relevant several years ago -- so you see that toByteString is called there on the public key.

One can argue that in practice the proto serialization is deterministic. But now we have a problem for you exactly because the Tink team is trying to solve the underlying issue you had in the past here: no access at all to the key material.

To solve this issue, we introduced objects which provide the key material. However, now it's weird that the KeysetHandle stores the old protos still while we have objects which would be better to store. So we would like to only store the objects. But once we do that, we will lose all information about how much the BigIntegers were padded.