patcg-individual-drafts / ipa

Interoperable Private Attribution (IPA) - A Private Measurement Proposal

Consider a randomized-response-like privacy mechanism #60

Open csharrison opened 1 year ago

csharrison commented 1 year ago

I want to propose a new privacy mechanism for IPA.

This mechanism is inspired by research on "label differential privacy": in particular, a NeurIPS 2021 paper on multi-class classification and a 2023 follow-up on regression (both discussed below).

This research operates in the “label DP” setting, where model features are publicly known, but labels are not. This matches the setting of IPA well, because the report collector has full information about the raw impression event (and thus the corresponding feature vector), but the label (e.g. whether the impression led to a conversion) is protected with differential privacy. While these techniques are in the “local” differential privacy regime, they (somewhat surprisingly) perform close to the state of the art in private model training, depending on the task.

Here is an outline of how we could implement one of the algorithms described in these papers, RROnBins, in the IPA setting:
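
At a high level, the idea is: use a publicly derived prior over the label domain to pick a small set of bins, then run plain k-ary randomized response over the bin index of the true label. Below is a rough plaintext sketch of that shape (my own illustration: the names and parameters are illustrative, the precise algorithm is in the paper, and in IPA the flip would actually happen over secret-shared labels inside MPC):

```python
import numpy as np

def rr_on_bins(true_value, bin_edges, epsilon, rng):
    """Sketch of an RROnBins-style mechanism: bucketize the label domain
    using publicly derived bin edges (e.g. quantiles of the prior), then
    run k-ary randomized response over the bin index."""
    k = len(bin_edges) - 1
    # Find which bin the true label falls into.
    true_bin = int(np.searchsorted(bin_edges, true_value, side="right")) - 1
    true_bin = min(max(true_bin, 0), k - 1)
    # k-RR: keep the true bin with probability e^eps / (e^eps + k - 1),
    # otherwise output one of the other k - 1 bins uniformly at random.
    if rng.random() < np.exp(epsilon) / (np.exp(epsilon) + k - 1):
        return true_bin
    return rng.choice([b for b in range(k) if b != true_bin])

# Example: basket values bucketized into 4 bins, epsilon = 3.
rng = np.random.default_rng(0)
noisy_bin = rr_on_bins(42.0, bin_edges=[0, 10, 50, 100, 1000], epsilon=3, rng=rng)
```

The bin edges come only from the prior, never from an individual label, so choosing them does not consume privacy budget against the labels.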

Note: for practical purposes, we would likely need to support a unique breakdown key per source to take advantage of this research, i.e., “aggregate” over only single sources. While IPA currently only mentions “aggregate” queries, as far as I can tell there are no restrictions in the existing design preventing aggregation over single events (i.e., a unique breakdown key per source), as long as the output is protected by differential privacy. The mechanism described above offers the same protection.

For an overview of how we’re thinking about these capabilities in ARA, see the flexible event-level explainer we recently published for ARA.

benjaminsavage commented 1 year ago

Thanks for filing this issue @csharrison - it’s a very interesting proposal!

First, I just want to make sure I correctly understand the proposal details.

Looking at the first paper you linked (NeurIPS 2021), here is how I see this being applied to this domain, via a specific example.

Imagine I want to train an ML model to predict the likelihood that an ad click will result in a “purchase” event. In this case, the label domain is simply {1, 0}.

The paper describes the computation of a “prior” for each possible label. IPA offers a straightforward approach to computing this prior! Let’s assume we empirically measure that the average likelihood a click leads to a purchase is 0.5%.

Next, if I understand correctly, the proposal is to support a different form of DP for IPA which essentially makes the output compatible with event-level ARA. That is, rows are not aggregated together; instead, for an input of N rows, the output of attribution is N noisy labels, where each label is generated via a randomized response algorithm parameterized by both the output domain (with priors) and epsilon. So in this case, each row would be assigned a label of either 0 or 1, based on the prior likelihood of 0.5% for a conversion and on epsilon (let’s assume that’s 5 in this example).

So IPA would perform attribution and compute a secret sharing of the correct label (either 1 or 0) and with some random probability, swap that label for the opposite.
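
If that’s right, then with epsilon 5 the swap probability would be 1 / (e^5 + 1) ≈ 0.67% (assuming the standard symmetric flip), i.e. roughly 99.3% of labels would pass through unchanged.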

I’ll probably get this next part wrong… so please correct me!

I think the way “RRTop-k” works in this case is:

0: if the true label is 0 (not attributed), then output 0 with probability 100% (I’m likely wrong here…)
1: if the true label is 1 (attributed!), then return a random value 0 or 1.

If instead we are training an ML model to predict “basket value” at checkout, this would be more complex: the label domain would be larger than just {0, 1}, but we could still use IPA to compute priors.
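
For example, I could imagine first running an IPA aggregate query over coarse basket-value buckets to obtain the prior, and then randomizing over that small set of value bins rather than over the full dollar domain.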

I’m pretty sure I got the math wrong in that last part…

Next, the report collector would combine these noisy labels with their known feature values associated with each ad click, and use traditional ML techniques to train a model. Is that correct?

csharrison commented 1 year ago

Let me clarify a few things. There are a few types of models we could train:

  1. Binary models (where the label is just 0/1)
  2. Multi-class models (where the label is some enum)
  3. Regression models (where the label is a real number)

With binary models, the best approach we have (in this setting) is just regular randomized response. A prior doesn’t really help because there aren’t really any knobs to turn if your output space is just 0/1, unless you don’t even want to look at the sensitive data because your prior is so good - in which case you’re done.

For the other problems, the general techniques here attempt, at a high level, to reduce the size of the label domain, which for randomized response means less noise. The 2021 paper focuses on multi-class models and the 2023 one on regression models. The prior helps us bucketize / reduce the cardinality of the domain, in part by removing low-probability labels from the possible output, or by compressing the domain into fewer buckets. This is what RRTopK/RROnBins is trying to do, fundamentally.
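
For intuition, here is a rough sketch of how I think about RRTopK (again my own illustration: the names are mine, and the exact handling of labels outside the top-k set may differ from the paper):

```python
import numpy as np

def rr_top_k(true_label, prior, k, epsilon, rng):
    """Sketch of an RRTopK-style mechanism: the output domain is the k
    highest-prior labels (chosen without looking at the true label),
    and k-ary randomized response runs over that restricted set."""
    top_k = np.argsort(prior)[-k:]  # reduced output domain
    e_eps = np.exp(epsilon)
    if true_label in top_k:
        # Plain k-RR over the restricted domain.
        if rng.random() < e_eps / (e_eps + k - 1):
            return true_label
        return rng.choice([l for l in top_k if l != true_label])
    # True label fell outside the reduced domain: answer uniformly,
    # which keeps the overall mechanism epsilon-DP.
    return rng.choice(top_k)
```

Because the restricted domain depends only on the (public or DP-protected) prior, the per-example output still satisfies the usual randomized-response DP bound.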

The key thing here is that once we have an output domain for a particular example, we just do bog-standard randomized response over that restricted domain. So let’s say that, for a particular example, we reduced a domain of size D (without looking at the label at all) to a subset of size k; the privacy mechanism is then just k-RR over that restricted output space: output the true label with probability e^ε / (e^ε + k - 1), and each of the other k - 1 labels with probability 1 / (e^ε + k - 1).

So in your example for the binary case (no prior needed), with epsilon 3, we would pick a uniformly random label (0/1) with probability 2 / (e^3 + 1) ≈ 10%. This would give the exact same epsilon-DP bound as e.g. adding LaplaceDistribution[⅓] noise to the per-element 0/1 label (which would add noise with std ~0.47).
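
Checking that arithmetic with a quick sketch:

```python
import numpy as np

eps = 3.0
# Binary randomized response: answer with a uniformly random label with
# probability 2 / (e^eps + 1), otherwise report the true label.
p_random = 2.0 / (np.exp(eps) + 1.0)
print(f"P[random label] = {p_random:.3f}")            # ~0.095, i.e. ~10%

# Comparable Laplace mechanism on a 0/1 label: scale = 1/eps,
# standard deviation = sqrt(2) / eps.
print(f"Laplace std     = {np.sqrt(2.0) / eps:.2f}")  # ~0.47
```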

I also want to stress that this isn’t necessarily a “different form of DP” from IPA: if I understand the constraints correctly, IPA currently doesn’t have anything preventing unique breakdown keys per source. The formal privacy claim should be identical, assuming the noise in IPA satisfies pure DP (like Laplace) and that we correctly constrain the per-match-key sensitivity in each case.

FWIW I think @marianapr has thought about possible ways to implement these variants of randomized response in MPC, so I think it should be doable under the security constraints of IPA as well.

Also let me cc @badih whose research I have been representing, who can check me on any errors.

eriktaubeneck commented 1 year ago

This is an interesting use case; thanks for raising it, @csharrison. To this specific point:

While IPA currently only mentions “aggregate” queries, as far as I can tell there are no restrictions in the existing design preventing aggregation over single events (i.e., a unique breakdown key per source), as long as the output is protected by differential privacy. The mechanism described above offers the same protection.

We've typically thought of "aggregates" as our goal, and differential privacy as a design choice to help accomplish that. However, you're correct that we do not currently propose any specific limit k on breakdown sizes, and I agree that the end result is one with the same differential privacy properties (even if achieved through different mechanisms).

The heart of the question, though, is whether a non-aggregated but equivalently differentially private functionality should be supported; that is really a question for the threat modeling exercise, not specifically for the IPA team. Operationally, what you describe should be possible in the IPA protocol without major overhaul; in fact, I think you can get quite close as is. Similarly, if there were consensus in the PATCG that this is an appropriate functionality, I think the IPA team would be inclined to find a way to support it.

csharrison commented 1 year ago

The heart of the question, though, is whether a non-aggregated but equivalently differentially private functionality should be supported; that is really a question for the threat modeling exercise, not specifically for the IPA team.

This makes sense @eriktaubeneck, and my goal for this issue was not to litigate this point, but rather to offer an alternative mechanism that provides privacy equivalent to IPA as-is (my feeling was that IPA was not necessarily opinionated here).

I can file an issue to bring the discussion to the broader PATCG group.