patcg-individual-drafts / ipa

Interoperable Private Attribution (IPA) - A Private Measurement Proposal

Stable DP Sharding #49

Open benjaminsavage opened 1 year ago

benjaminsavage commented 1 year ago

Issue #35 outlines a feasible method of sharding with a very conservative epsilon.

But there was an open question on that post:

Sharding messes up the presort on timestamps

One optimization we’ve been hoping to use is to have the report collector presort on timestamps in the clear before submitting a query. However, in the computation of the shard id with the PRF we must shuffle all the rows. This breaks the previous sort on timestamps. It is an open problem whether there is a way to maintain this presorting optimization.

I’d like to present a possible solution to this problem: “Stable DP Sharding”. It’s a variation on the multi-bit radix sort algorithm.

Step 1: Modulus conversion

Input (streaming):

The report collector submits reports containing an XOR replicated sharing of a 45-bit match key. So each helper receives two 45-bit integers, one of which it shares in common with the helper to its left, and one of which it shares in common with the helper to its right.

$[mk_i]$

We assume that the report collector is submitting a query of 100 billion records, and it’s impossible for all the records to fit in memory on a single machine. As such, the helpers must operate on streaming records as they arrive.

The modulus conversion protocol is an MPC protocol to “upgrade” these bitwise sharings in $F_2$ to a vector of sharings in $F_p$, for some large value of $p$.

${ [mk^0_i], [mk^1_i], …, [mk^{44}_i] }$

We had originally proposed that the helpers would use a prime sufficiently large to produce our intended security parameter. We are currently using the prime $2^{32} - 5$.

However, certain PRFs we might want to use in Step 2 require larger fields. In this case, we might consider either using a very large $p$, or potentially performing the modulus conversion twice: once to produce the inputs used to generate the PRF, and once for use in the rest of the protocol (i.e. sorting, and computing of helper bits).
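To make the input/output relationship of this step concrete, here is a small in-the-clear Rust sketch. The real protocol of course operates on XOR-replicated shares, not the plaintext match key; the function name and the use of $p = 2^{32} - 5$ here are just illustrative.

```rust
// Plaintext illustration of what Step 1 produces (not the MPC protocol itself):
// a 45-bit match key becomes a vector of 45 values, each living in Z_p,
// where each value is the corresponding bit (0 or 1) of the match key.
// In the real protocol every one of these values is secret-shared.

const MK_BITS: usize = 45;
const P: u64 = (1u64 << 32) - 5; // the prime 2^32 - 5 mentioned above

fn modulus_convert_plaintext(mk: u64) -> Vec<u64> {
    (0..MK_BITS)
        .map(|j| ((mk >> j) & 1) % P) // bit j of the match key, viewed as an element of Z_p
        .collect()
}

fn main() {
    let mk: u64 = 0b1011; // toy match key
    let bits = modulus_convert_plaintext(mk);
    assert_eq!(bits[0], 1);
    assert_eq!(bits[1], 1);
    assert_eq!(bits[2], 0);
    assert_eq!(bits[3], 1);
    println!("{:?}", &bits[..8]);
}
```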

Step 2: Compute the PRF

Input (streaming):

For each input row i, each helper receives 45 secret-shares, which are sharings of the bits of the match key: ${ [mk^0_i], [mk^1_i], …, [mk^{44}_i] }$

Each secret value $mk^j_i ∈ \{0, 1\}$, but the secret-sharings are in a large field: $[mk^j_i] ∈ Z_p$.

The helpers perform an MPC protocol to compute a secret-sharing of a PRF of these values: $[PRF(mk)_i]$

Potential PRFs we might consider:

Step 3: Bit-decompose the PRF

Input (streaming):

For each input row, at the end of step 2, each helper should be in possession of a secret-share of a PRF of the match key: $[PRF(mk)_i] ∈ Z_p$

Now the helpers must run a bit-decomposition protocol to extract the M least-significant bits of this value: ${ [PRF(mk)^0_i], [PRF(mk)^1_i], …, [PRF(mk)^{M-1}_i] }$

We will utilize these M bits in much the same way as the multi-bit sort algorithm. At each round of sharding, the input rows will be sharded into $2^M$ different shards. It’s very likely that M=3 will continue to be the optimal value, but it’s possible that we will want to choose a larger value to minimize the number of malicious shuffle operations, which will be non-trivial for extremely large input sizes.

There will likely be multiple rounds, and there is no requirement that M be the same in each round.
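As an in-the-clear illustration of what the bit-decomposition produces (the actual protocol works on secret shares; M=3 here is just the example value discussed above):

```rust
// Plaintext illustration of Step 3: take the PRF value and keep only its
// M least-significant bits (LSB first). In the real protocol this is an MPC
// bit-decomposition over secret shares; here we just show the arithmetic.

const M: usize = 3; // number of low-order PRF bits used per sharding round

fn low_bits(prf: u64) -> Vec<u8> {
    (0..M).map(|j| ((prf >> j) & 1) as u8).collect()
}

fn main() {
    // PRF(mk) = 22 = 0b10110, so the 3 low bits are [0, 1, 1] (LSB first).
    assert_eq!(low_bits(22), vec![0, 1, 1]);
}
```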

Step 4: Compute equality checks

Input (streaming):

For each input row, the helpers consider their secret-shares of M bits of the bit-decomposed PRF:

${ [PRF(mk)^0_i], [PRF(mk)^1_i], …, [PRF(mk)^{M-1}_i] }$

Consider this a bit-decomposition of an unsigned integer, meaning it must represent one of the numbers in the range: $[0, 2^M)$

The helpers run a protocol to compare these M secret-shared bits with all of these possible values. The result is a vector: ${ [eq(0)_i], [eq(1)_i], …, [eq(2^M - 1)_i] }$

Where all of the elements are sharings of zero except one, which is a sharing of one.

All of these checks can be done with just 4 multiplications when M=3 and 11 multiplications when M=4. (See the implementation of this in: multi_bit_permutation.rs)
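In the clear, the equality-check vector is just a one-hot encoding of the M-bit value. Here is a sketch of that plaintext computation; the secret-shared version referenced above computes the same vector on shares, reusing bit products to keep the multiplication count low.

```rust
// Plaintext sketch of Step 4: from the M low bits of the PRF (LSB first),
// build the length-2^M "equality check" vector, which is 1 at the index equal
// to the value of those bits and 0 everywhere else.

fn equality_checks(bits: &[u8]) -> Vec<u8> {
    let m = bits.len();
    let value: usize = bits
        .iter()
        .enumerate()
        .map(|(j, &b)| (b as usize) << j) // interpret bits as an unsigned integer in [0, 2^M)
        .sum();
    (0..(1usize << m)).map(|v| (v == value) as u8).collect()
}

fn main() {
    // Bits [0, 1, 1] represent the value 6, so index 6 of the vector is 1.
    assert_eq!(equality_checks(&[0, 1, 1]), vec![0, 0, 0, 0, 0, 0, 1, 0]);
}
```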

Step 5: Increment running sums

Input (streaming):

For each input row, the helpers take their shares of the $2^M$ equality checks computed in step 4: ${ [eq(0)_i], [eq(1)_i], …, [eq(2^M - 1)_i] }$

Before processing input row 0, the helpers first initialize a vector of secret-shares of zero: ${ [count(0)], [count(1)], …, [count(2^M - 1)] }$

Upon processing each input row, the helpers update these counts by adding the equality check shares computed in the last step:

$[count(0)] = [count(0)] + [eq(0)_i]$
$[count(1)] = [count(1)] + [eq(1)_i]$
…
$[count(2^M - 1)] = [count(2^M - 1)] + [eq(2^M - 1)_i]$

This only requires local processing and no inter-helper communication.

Since only one equality check is a sharing of one, only one counter will be updated. The others will remain the same.
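A plaintext sketch of the counter update (in MPC this is local addition of shares, with no communication between helpers):

```rust
// Plaintext sketch of Step 5: maintain one running counter per shard and, for
// each row, add its equality-check vector into the counters. Because the
// vector is one-hot, exactly one counter advances per row.

fn update_counts(counts: &mut [u64], eq: &[u8]) {
    for (c, &e) in counts.iter_mut().zip(eq) {
        *c += e as u64;
    }
}

fn main() {
    let mut counts = vec![0u64; 8]; // 2^M counters, M = 3
    update_counts(&mut counts, &[0, 0, 0, 0, 0, 0, 1, 0]); // a row headed for shard 6
    update_counts(&mut counts, &[0, 1, 0, 0, 0, 0, 0, 0]); // a row headed for shard 1
    update_counts(&mut counts, &[0, 0, 0, 0, 0, 0, 1, 0]); // another row for shard 6
    assert_eq!(counts, vec![0, 1, 0, 0, 0, 0, 2, 0]);
}
```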

Step 6: Compute the permutation destination index

Input (streaming):

For each input row, the helpers take both:

  1. The vector of shares we called “equality checks” computed in step 4: ${ [eq(0)_i], [eq(1)_i], …, [eq(2^M - 1)_i] }$
  2. The current value of the running sums updated in step 5: ${ [count(0)], [count(1)], …, [count(2^M - 1)] }$

The helpers run a protocol to compute the dot-product of these two vectors. The result is a secret-sharing of a number: $[σ(i)]$

This number will be a part of a per-shard permutation. Its value will eventually be revealed (after shuffling). The number will indicate the destination index (on that shard) to which this record should be taken.
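In the clear, the destination index is simply the counter of the shard the row belongs to, which the dot product picks out; a quick sketch (in MPC the dot product is computed on secret shares and only revealed after shuffling):

```rust
// Plaintext sketch of Step 6: the destination index for a row is the dot
// product of its one-hot equality-check vector with the current counters,
// i.e. the value of the counter for the shard this row belongs to.

fn destination_index(eq: &[u8], counts: &[u64]) -> u64 {
    eq.iter()
        .zip(counts)
        .map(|(&e, &c)| e as u64 * c) // only the matching shard's counter survives
        .sum()
}

fn main() {
    // Counters after Step 5; this row's shard is 6, whose counter is 2,
    // so the row's within-shard destination index is 2.
    let counts = vec![0, 1, 0, 0, 0, 0, 2, 0];
    assert_eq!(destination_index(&[0, 0, 0, 0, 0, 0, 1, 0], &counts), 2);
}
```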

A few notes about this value:

Step 7: Malicious shuffle (part 1): redistribute

Input (streaming):

The original input row, along with a few values computed in the preceding steps:

  1. The M least significant bits of the PRF: ${ [PRF(mk)^0_i], [PRF(mk)^1_i], …, [PRF(mk)^{M-1}_i] }$
  2. The per-shard destination index: $[σ(i)]$

We assume that we must process the rows in a streaming fashion, because there are too many to fit into memory on a single machine.

We assume that the helper party has access to Q physical machines, such that if there are N input rows, then N/Q rows can comfortably fit into memory.

Two helpers will use their shared randomness to generate a sort of “permutation”, but this will be done slightly differently due to the streaming nature of this shuffle.

For each row, these two helpers will generate a pseudo-random value in the range [0, Q) using shared randomness, and send the record to the corresponding physical machine.
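A sketch of the redistribution under some assumptions: a toy SplitMix64 generator stands in for the PRSS-style shared randomness, and Q = 16 is an arbitrary example value. The point is only that the two helpers, seeded from the same shared randomness, derive the same machine index for each row.

```rust
// Sketch of Step 7: both helpers derive the same pseudo-random machine index
// in [0, Q) for each row and forward the row there. SplitMix64 and its
// constants are illustrative stand-ins, not part of the IPA design.

const Q: u64 = 16; // number of physical machines per helper (assumed)

struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

fn main() {
    // Both helpers seed from their shared randomness, so they agree on every draw.
    let mut shared = SplitMix64(0xDEAD_BEEF);
    for row_index in 0..5u64 {
        let machine = shared.next() % Q; // destination machine for this row
        println!("row {row_index} -> machine {machine}");
    }
}
```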

Step 8: Malicious shuffle (part 2): add dummy records

Input (none):

The same two helpers from the last step will now generate a random number of “dummy records” per “shard”.

There will be $2^M$ “shards”. These will (at a later step) be revealed by running a “reveal” protocol on the M bits of the PRF which were used in step 4.

For each of the values in the range $[0, 2^M)$, the helpers can generate a random number of “dummy records”. They can select the values of the M bits of the PRF to indicate the desired shard (they need not compute any actual PRF).

For example, when M=3, when generating “dummy records” for shard 6 (binary 110), the two helpers should ensure that: $[PRF(mk)^0_i] = [0]$, $[PRF(mk)^1_i] = [1]$, $[PRF(mk)^2_i] = [1]$ (using the same least-significant-bit-first convention as in step 12).

They must also take care to generate legitimate destination indices. They can do this by continuing to increment the corresponding counters.

For example, when generating a “dummy record” for shard 6, the two helpers should first increment counter 6: $[count(6)] = [count(6)] + 1$

Then, they can use this as the secret-sharing of the destination index on shard 6 as: $[σ(i)] = [count(6)]$

Each dummy record is then sent to a random physical machine in the range $[0, Q)$, determined by the shared randomness of the two helpers.
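A sketch of dummy generation, with hypothetical names (DummyRow, make_dummies) and a hard-coded dummy count in place of the DP noise distribution the two helpers would actually sample from:

```rust
// Sketch of Step 8: for a given shard value, add some number of dummy rows,
// setting the M PRF bits to spell out the shard (LSB first) and drawing the
// destination index by continuing to advance that shard's counter.

const M: usize = 3;

struct DummyRow {
    prf_bits: Vec<u8>, // LSB-first bits identifying the target shard
    sigma: u64,        // destination index within that shard
}

fn make_dummies(shard: usize, how_many: u64, count: &mut u64) -> Vec<DummyRow> {
    let bits: Vec<u8> = (0..M).map(|j| ((shard >> j) & 1) as u8).collect();
    (0..how_many)
        .map(|_| {
            *count += 1; // same counter the real rows of this shard were using
            DummyRow { prf_bits: bits.clone(), sigma: *count }
        })
        .collect()
}

fn main() {
    let mut count_for_shard_6 = 2; // counter value left over from the real rows
    let dummies = make_dummies(6, 3, &mut count_for_shard_6);
    assert_eq!(dummies[0].prf_bits, vec![0, 1, 1]); // 6 = 0b110, LSB first
    assert_eq!(dummies.last().unwrap().sigma, 5);
}
```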

Step 9: Malicious shuffle (part 3): distributed shuffle

Each of the Q physical machines operated by the two helper nodes now randomly permutes all of the rows it was sent using Fisher-Yates.
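A minimal Fisher-Yates sketch (the xorshift generator is just a stand-in for whatever randomness source each machine actually draws from):

```rust
// Each of the Q machines shuffles only the rows it received.

fn xorshift64(state: &mut u64) -> u64 {
    // Toy PRNG standing in for the machine's real randomness source.
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn fisher_yates<T>(rows: &mut [T], state: &mut u64) {
    // Walk from the back, swapping each element with a uniformly chosen earlier one.
    for i in (1..rows.len()).rev() {
        let j = (xorshift64(state) % (i as u64 + 1)) as usize;
        rows.swap(i, j);
    }
}

fn main() {
    let mut rows: Vec<u32> = (0..10).collect();
    let mut seed = 42u64;
    fisher_yates(&mut rows, &mut seed);
    println!("{rows:?}");
}
```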

Step 10: Malicious shuffle (part 4): bringing it all back together

Starting from physical machine 0, and ending with physical machine Q-1, the shuffled records on each machine are then sent back to the coordinator machines that originally processed the streaming inputs.

This is where the “reshare” process happens. The two helpers that know the “permutation” need some communication between themselves as they reshare the values based on PRSS.

The helper who was “out of the loop”, so to speak, is simply informed of the total number of records it must generate, and it initializes that many rows, where each sharing is just initialized with PRSS.

Step 11: Repeat the shuffle two more times

As the records stream in during step 10, the next shuffle immediately begins. A different set of two helpers participates each time. They immediately repeat step 7, randomly sending each record to one of Q physical machines.

The helpers repeat steps 7-10 a total of three times, each time with a different helper "out of the loop".

Step 12: Reveal the shards and permutation destinations

As soon as the 3rd shuffle is complete, and the “bringing it all back together” process is happening, the 3 helpers can immediately (in a streaming fashion) reveal two things about each record they process:

  1. The M bits of the PRF which were used in step 4: ${ [PRF(mk)^0_i], [PRF(mk)^1_i], …, [PRF(mk)^{M-1}_i] }$
  2. The per-shard permutation destination $[σ(i)]$

The three helpers run a “reveal” protocol to open these secret-sharings.

The M bits of the PRF are considered as an M-bit unsigned integer: shard_id = $PRF(mk)^0_i + 2 PRF(mk)^1_i + … + 2^{M-1} PRF(mk)^{M-1}_i$

This row can immediately be sent, in a streaming fashion, to the machine intended to process that shard_id.
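Reassembling the revealed bits into shard_id is just the usual binary recomposition; a quick sketch:

```rust
// Plaintext sketch of Step 12: once the M PRF bits of a row are revealed, the
// helpers reassemble them (LSB first) into the shard_id that decides which
// shard machine receives the row.

fn shard_id(prf_bits: &[u8]) -> usize {
    prf_bits
        .iter()
        .enumerate()
        .map(|(j, &b)| (b as usize) << j) // shard_id = sum_j 2^j * PRF(mk)^j_i
        .sum()
}

fn main() {
    assert_eq!(shard_id(&[0, 1, 1]), 6);
    assert_eq!(shard_id(&[1, 0, 0]), 1);
}
```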

Step 13: Apply the permutation

Each helper can now use $2^M$ physical machines (shards) to process the inputs. All of the records about a given user will be present on just one shard. And due to the dummy rows, we have a differential privacy guarantee about the amount of information released via the cardinality of rows on each shard.

The machine processing each shard can simply move each record to the destination index given by $σ(i)$. No information is revealed in this process because it’s just a random permutation (per shard).

This ensures that all of the real records that were sent to each shard appear in the same order (relative to one another) as they did in the original input. In this way, we can avoid the need to sort by timestamp. This is why this is called a “stable sharding”.
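A plaintext sketch of applying the revealed per-shard permutation, showing why the real records come back in their original relative order. The σ values here are the 1-based counter values from step 6, and the example records are purely illustrative.

```rust
// Plaintext sketch of Step 13 on one shard: each record is placed at its
// revealed destination index. Because those indices were assigned in input
// order (Steps 5 and 6), the real records come out in their original relative
// order, with the dummies occupying the tail positions.

fn apply_permutation<T: Clone>(records: Vec<(u64, T)>) -> Vec<T> {
    let mut out: Vec<Option<T>> = vec![None; records.len()];
    for (sigma, record) in records {
        out[(sigma - 1) as usize] = Some(record); // σ(i) is 1-based
    }
    out.into_iter()
        .map(|r| r.expect("every index used exactly once"))
        .collect()
}

fn main() {
    // Records arrive shuffled; their σ values recover the original order.
    let shuffled = vec![(3, "c"), (1, "a"), (4, "dummy"), (2, "b")];
    assert_eq!(apply_permutation(shuffled), vec!["a", "b", "c", "dummy"]);
}
```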

Yes, a malicious helper knows that the dummy records are all at the end of the shard, but it doesn’t know how many of them are dummy records, and it doesn’t know which real records made it to this shard. Due to the usage of a PRF, a malicious match key provider cannot even ensure two different match keys will end up on the same shard.

marianapr commented 1 year ago

I think it will be nice to discuss this in the PATCG meetings for IPA. I am not sure I was able to follow the details here, in particular how the permutations per shard are generated and how they are preserving the initial order if we are trying to keep the records sorted by timestamp.

danielmasny commented 1 year ago

> I think it will be nice to discuss this in the PATCG meetings for IPA. I am not sure I was able to follow the details here, in particular how the permutations per shard are generated and how they are preserving the initial order if we are trying to keep the records sorted by timestamp.

The approach works similarly to the radix sort algorithm. Each event receives a secret-shared integer for each shard, which is initialized to 1 iff the event is assigned to that shard and to 0 otherwise. Then you compute the prefix sum over those integers, which assigns each event of a specific shard a unique position within the shard. After assigning an element to a shard, you can delete all non-relevant secret-shared integers and reveal the integer of that shard in the clear to rearrange the events in the correct order.
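A tiny in-the-clear illustration of this description (the shard assignments here are made up; in the protocol they come from the PRF bits, and everything stays secret-shared until after the shuffle):

```rust
// Each event carries a one-hot vector over shards; a running (prefix) sum of
// those vectors assigns each event a unique position within its shard, in
// input order, which is what preserves the timestamp ordering per shard.

fn main() {
    let shards = 4usize;
    // Shard assignment for six events, in timestamp order.
    let assigned = [2usize, 0, 2, 1, 0, 2];
    let mut counts = vec![0u64; shards];
    for (event, &s) in assigned.iter().enumerate() {
        counts[s] += 1; // prefix sum of the one-hot vectors, kept per shard
        let position_in_shard = counts[s];
        println!("event {event} -> shard {s}, position {position_in_shard}");
    }
}
```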