[protocol spec] Resolve TODOs in GroupHash/Unlinkability Analysis

(Copying this comment from the other ticket where I was doing my audit).

Checking the "TODO: check whether this is justified" in the unlinkability proof for DiversifyHash:

Let (P, pk) be a diversified address with secret key x, i.e. pk = [x]P. Generating a new diversified address for the same secret key but a random distinct diversifier gives (Q, qk) where qk = [x]Q and, making the random oracle assumption about GroupHash, Q=[y]P for some random y.

Looking at ElGamal, we can suppose Alice chose (P, pk) as her public key. Now, when Bob encrypts a message corresponding to O_J, he chooses a random y and outputs [y]P and [y][x]P = [yx]P = [x][y]P. The distribution of Bob's output, [x][y]P and [y]P for a random y, is the same as the distribution of generating a new diversified address, qk = [x][y]P and Q=[y]P for a random y. So that part is good.
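
As a sanity check on the paragraph above, here is a minimal sketch in a toy group. Everything concrete in it is an assumption made for illustration: the group is the order-11 subgroup of squares modulo 23 (not Jubjub), it is written multiplicatively so [k]P becomes pow(P, k, p), the base 4 and secret key 7 are arbitrary, and the message "corresponding to O_J" is modelled as the identity element 1. It only checks that diversification with Q = [y]P and ElGamal encryption of the identity give the same pair for every y.

```python
# Toy prime-order group: the squares modulo p = 23 form a subgroup of order q = 11.
# Written multiplicatively, so [k]P in the text corresponds to pow(P, k, p).
p, q = 23, 11

def mul(k, P):
    """Scalar multiplication [k]P in the toy group."""
    return pow(P, k, p)

P = 4            # non-identity element of the order-11 subgroup (illustrative)
x = 7            # hypothetical secret key
pk = mul(x, P)   # pk = [x]P

def diversify(y):
    """New diversified address, modelling the random-oracle output as Q = [y]P."""
    Q = mul(y, P)
    return Q, mul(x, Q)                      # (Q, qk) with qk = [x]Q

def elgamal_encrypt_identity(y):
    """ElGamal encryption of the identity element to (P, pk) with randomness y."""
    identity = 1                             # stands in for O_J in this toy group
    return mul(y, P), (mul(y, pk) * identity) % p

# Enumerate every nonzero y: both procedures yield the same list of pairs,
# so a uniformly random y induces the same distribution in both cases.
diversified = sorted(diversify(y) for y in range(1, q))
ciphertexts = sorted(elgamal_encrypt_identity(y) for y in range(1, q))
same_distribution = diversified == ciphertexts
```

This says nothing about whether GroupHash actually behaves like a random oracle; it only illustrates the algebra [x][y]P = [y][x]P that the argument relies on.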

Let's see if I can make the reduction to ElGamal key privacy explicit. I'm not confident this is correct, and see the caveat below.

Consider the following experiments, for adversaries A and bit b.

Experiment Exp_A^un-b:

(P₁, pk₁, psk₁) <- NewDiversifiedAddress()
(P₂, pk₂, psk₂) <- NewDiversifiedAddress()
(Q, qk) <- Diversify(P_b, psk_b).
Return A(P₁, pk₁, P₂, pk₂, Q, qk).

Experiment Exp_A^ik-tpa-b:

(P₁, pk₁, psk₁) <- GenerateElGamal()
(P₂, pk₂, psk₂) <- GenerateElGamal()
(Q, qk) <- ElGamalEncryptTrivialMessage(pk_b).
Return A(P₁, pk₁, P₂, pk₂, Q, qk).

...where:

NewDiversifiedAddress is the process by which an independent new diversified address is generated.
GenerateElGamal generates an ElGamal public and secret key, where the base point is selected the same way as in NewDiversifiedAddress (*).
Diversify generates a different diversified address using the same secret key.
ElGamalEncryptTrivialMessage encrypts the message corresponding to O_J to the public key it's given.

Define IK-TPA ("T" stands for "trivial") and Adv_A^ik-tpa similarly to IK-CPA. Define Unlinkability and Adv_A^un similarly to IK-CPA as well.

Obviously IK-CPA (instantiated with ElGamal) implies IK-TPA so Adv_A^ik-tpa is negligible for all relevant adversaries A. We want to argue that Adv_A^un is negligible for the same set of adversaries. Let A be an adversary from that set and let b be any bit. It's true that Pr[Exp_A^un-b = 1] = Pr[Exp_A^ik-tpa-b = 1], since...

The distribution of (P₁, pk₁, psk₁, P₂, pk₂, psk₂) is the same, since NewDiversifiedAddress() is identical to GenerateElGamal().
The distribution of (Q, qk) is the same, under the random oracle assumption, by the argument above.
The distribution over inputs to A is the same, and A is the same, so the distribution over A's outputs will be the same.

So Adv_A^un = Adv_A^ik-tpa. So IK-CPA for ElGamal implies Unlinkability.
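
Spelled out, the last step might look like the following. This assumes the advantage is defined as the difference between the two success probabilities, as in the usual IK-CPA definition, and that b ranges over the two address indices 1 and 2; neither detail is fixed by the definitions above.

```latex
\begin{align*}
\mathrm{Adv}_A^{\text{un}}
  &= \Pr[\mathrm{Exp}_A^{\text{un-}1} = 1] - \Pr[\mathrm{Exp}_A^{\text{un-}2} = 1] \\
  &= \Pr[\mathrm{Exp}_A^{\text{ik-tpa-}1} = 1] - \Pr[\mathrm{Exp}_A^{\text{ik-tpa-}2} = 1] \\
  &= \mathrm{Adv}_A^{\text{ik-tpa}}
\end{align*}
```

The middle equality is the per-bit claim Pr[Exp_A^un-b = 1] = Pr[Exp_A^ik-tpa-b = 1] applied once for each value of b.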

However, as noted in the spec, Unlinkability as defined here might not be the right property. For example, if you're given ~2⁴⁴ diversified addresses and are promised that they are all either generated independently or generated by randomly diversifying one single address, you can tell which is the case since you expect collisions in the latter case but not the former. This problem shows up in the definitions above where we required Diversify to make sure it doesn't select the same diversifier by accident. If Diversify just selected randomly, then the distributions over (Q, qk) would be different since Pr[qk = pk_b] = 2⁻⁸⁸ in Exp_A^un-b and much smaller in Exp_A^ik-tpa-b, so the argument's success would depend on whether 2⁻⁸⁸ still counts as negligible.

The spec recommends generating diversifiers randomly, and in practice people will just generate them randomly and publish them expecting them all not to be linkable, so it might be better to recommend generating no more than ~2²⁰ diversified addresses from the same secret key (to keep the probability of collisions low), and define security by giving the adversary two sets of 2²⁰ diversifications... or something. I'll have to think about this more.
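
To put rough numbers on the ~2⁴⁴ and ~2²⁰ figures, here is a small sketch using the standard birthday approximation. It assumes diversifiers are uniform 88-bit values (inferred from the 2⁻⁸⁸ figure above) and the function name is made up for this example.

```python
import math

DIVERSIFIER_BITS = 88  # assumption: taken from the 2^-88 figure in the comment

def collision_probability(n, bits=DIVERSIFIER_BITS):
    """Approximate probability that n uniform `bits`-bit values contain a repeat."""
    pairs = n * (n - 1) // 2
    # Birthday approximation 1 - exp(-pairs / 2^bits);
    # expm1 keeps precision when the result is tiny.
    return -math.expm1(-pairs / 2**bits)

p_44 = collision_probability(2**44)  # many addresses from one key
p_20 = collision_probability(2**20)  # the proposed cap
```

Under these assumptions p_44 comes out at roughly 0.39 and p_20 at roughly 2⁻⁴⁹, which is the gap the proposed cap is meant to exploit.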

(*) The Wikipedia page for ElGamal doesn't say what distribution Alice samples her generator from, so I'm assuming it doesn't affect the security properties of ElGamal if Alice chooses her generator the same way a diversified base is chosen when generating a new address.
