patcg-individual-drafts / private-ad-measurement

Privacy preserving advertising attribution

Support for combining reports with different IDs #7

Open csharrison opened 10 months ago

csharrison commented 10 months ago

Ideally, it should be possible to make aggregate queries as flexible as possible within the system. To that end, it might be useful to consider support for grouping together multiple histograms that otherwise would need to be queried separately.

Here is an example: imagine I am tracking two types of conversions, purchases and “add to cart” events. Usually I query these independently to get breakdowns per type so I give them different Conversion IDs. However, during a particular week I may have a reduced volume of reports, and so I’d prefer to get a histogram combining the purchase and add to cart reports, which allows me to get aggregate data with less overall noise (one noise addition vs. two).
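As a rough numerical illustration of the "one noise addition vs. two" point (a minimal sketch; the Laplace scale is arbitrary, and it assumes the combined query can be noised at the same scale, which is the question discussed below):

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 10.0        # illustrative Laplace noise scale used by the aggregation service
n_trials = 100_000

# Querying purchases and add-to-cart separately and then summing the two
# noisy results accumulates two independent noise draws.
separate = rng.laplace(0, scale, n_trials) + rng.laplace(0, scale, n_trials)

# Querying a single combined histogram adds noise only once.
combined = rng.laplace(0, scale, n_trials)

print(np.var(separate))  # ~4 * scale**2
print(np.var(combined))  # ~2 * scale**2, i.e. half the variance
```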

Note that this is the analog of ARA/issues/470 on the ARA side, which likewise aims to allow more dynamic decision making.

winstrom commented 10 months ago

I want to make sure that I understand this clearly.

Consider the scenario where the aggregation system is being made more flexible and there is no change to the browser computation.

Combining bins within a single histogram makes sense since the user contributes a fixed amount to that particular histogram. Supporting pulling together sub-bins in order to get larger bin populations before adding noise to the sum would enable better analysis on periods with small numbers of users. This seems very similar to supporting dynamic time windows for data aggregation.

Combining multiple conversions is a more complicated question. Since we consider conversions independently, there is the possibility that publishers and advertisers could collude to link individual advertisements with individual conversions across sites and create a way to pull more cross site information into multiple conversions than would be supported with a single aggregate. This is mitigated by the noise being inserted per conversion histogram and maintaining a per-site privacy budget in the browser. Enabling the combination of these conversion histograms would require a re-budgeting of the central noise being added on any combined bins since they may contain more signal contribution from a single user than in a single conversion histogram.

Are you suggesting something more than new functionality in the protocol for aggregation? Is there a better analysis for this than I've done (since this is my first pass at this question)?

csharrison commented 10 months ago

Yes I am suggesting combining separate histograms, not just combining bins within a single histogram.

The privacy question is a good one. Let me try to answer. First for simplicity consider the case where every user is only allowed to engage with a single ad per time window, and each ad/conversion marks itself as contributing to N separate conversions. Now, the ad tech wants to combine all of these N histograms together.

In the aggregate of the N histograms, you know any given ad contributed at most N times, so it is safe to add noise proportional to O(N). In a way, you can view this as a "combining bins" operation within the "virtual histogram" which is just all N histograms concatenated with each other, and you've bounded the sensitivity of any ad to this "virtual histogram" to N.
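A minimal sketch of that "virtual histogram" view (illustrative only; the function and the Laplace mechanism are my assumptions, not part of the PAM spec):

```python
import numpy as np

def noisy_combined_release(histograms, epsilon, rng=np.random.default_rng()):
    """Concatenate N histograms into one "virtual histogram" and noise it once.

    Assumes each ad event contributes at most one count to each of the N
    histograms, so its total L1 contribution to the concatenation is at most
    N; Laplace noise is therefore scaled to N / epsilon.
    """
    virtual = np.concatenate(histograms)   # size is the sum of the individual sizes
    sensitivity = len(histograms)          # at most N contributions per ad event
    return virtual + rng.laplace(0, sensitivity / epsilon, size=virtual.shape)
```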

This trick works in the more complicated case where you allow multiple ad events per some partition (e.g. <site, time>). For instance, if you have a per-site/day bound of k ad events, then you know your "virtual histogram" in this boundary will have effective sensitivity k N, and you would need to add O(k N) noise.
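In symbols (my paraphrase of the bound, with $k$ the cap on ad events per $\langle$site, day$\rangle$ and $N$ the number of histograms each event may contribute to):

$$
\Delta_1\!\left(H^1 \,\Vert\, \cdots \,\Vert\, H^N\right) \;\le\; k \cdot N,
\qquad \text{so Laplace noise with scale } \tfrac{kN}{\epsilon} \text{ suffices for } \epsilon\text{-DP at that boundary.}
$$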

The trick even works if the sensitivities of the histograms are different (and you still want to combine them for some reason), by just artificially scaling the histograms with smaller sensitivities to match the histograms with larger sensitivities, and then applying the larger noise.

LMK if that makes sense (or if I'm missing something 😄 )

martinthomson commented 10 months ago

The simple combination that I'm imagining here seems fine to me. @csharrison, you didn't specify how the histograms were combined, but I'm assuming a simple mapping. That is, either you just glue the histograms together so that you get a histogram of size $\sum_i |H_i|$, or you apply a mapping function that takes a bucket in each input histogram and maps it to a bucket in an output histogram, the values of which are summed.
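To make the two combination modes concrete, here is a toy sketch (the mapping matrices are hypothetical, chosen only to show the shapes involved):

```python
import numpy as np

H1 = np.array([3., 1., 4.])
H2 = np.array([2., 7.])

# Option 1: simple concatenation; the output has |H1| + |H2| buckets.
concatenated = np.concatenate([H1, H2])

# Option 2: a mapping from each input bucket to an output bucket, expressed
# as 0/1 matrices; values landing in the same output bucket are summed.
M1 = np.array([[1, 0, 0],
               [0, 1, 1]])   # 2 output buckets x 3 input buckets
M2 = np.array([[1, 0],
               [0, 1]])      # 2 output buckets x 2 input buckets
mapped = M1 @ H1 + M2 @ H2
```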

The simple concatenation might allow you to use L2 sensitivity, but the full mapping might force you to use L1.

Does this have any potential to combine information from multiple ad impressions that occur on different sites? I couldn't see how, but I'm not particularly smart right now. I think that that is only a risk if the combination method is more complicated than the ones I described.

winstrom commented 10 months ago

I agree with @martinthomson that this simple mapping described by @csharrison will keep the privacy promises made by the browser. So I'm not opposed to allowing this type of combination in the back end.

However, I'm not sure how much utility is gained by combining multiple histograms this way. I haven't thought this through completely, but naively, if I have $H^1$ and $H^2$ as the two reported histograms and a pair of mappings into new bins defined by $M^1$ and $M^2$, I can combine them into $H^3$ where $H^3_i = \sum_j M^1_{i,j} H^1_j + \sum_j M^2_{i,j} H^2_j$. To put noise on this combination, I can either add the noise directly to $H^3$ in the MPC system, or I can add noise independently to $\sum_j M^1_{i,j} H^1_j$ and $\sum_j M^2_{i,j} H^2_j$, release them, and then combine them after the fact.

Let's call the intermediate sums $H^{3,1}_i = \sum_j M^1_{i,j} H^1_j$ and $H^{3,2}_i = \sum_j M^2_{i,j} H^2_j$.

The released $i^{\rm{th}}$ bin of these histograms will be either both $H^{3,1}_i + \sigma_1$ and $H^{3,2}_i + \sigma_2$, or $H^3_i + \sigma_3$. Combining the intermediate sums gives $H^3_i + \sigma_1 + \sigma_2$. The sum of two random numbers is a random sample from the convolution of the two sampled distributions: if $\sigma_1$ and $\sigma_2$ are independently sampled from a Gaussian of width 1, then $\sigma_1 + \sigma_2$ is a sample from a Gaussian of width $\sqrt{2}$. Unless $\sigma_3$ is sampled from a narrower distribution, I'm not sure you gain anything by doing the sums in the MPC system and releasing the final aggregate.
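A quick numerical check of that convolution point (illustrative only; unit-width Gaussians as in the example above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Noise the two intermediate sums separately and combine after release ...
combined_after_release = rng.normal(0, 1, n) + rng.normal(0, 1, n)

# ... versus noising the already-combined bin once inside the MPC system.
combined_inside_mpc = rng.normal(0, 1, n)

print(np.std(combined_after_release))  # ~1.414 (width sqrt(2))
print(np.std(combined_inside_mpc))     # ~1.0, a win only if sigma_3 may really stay this narrow
```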

martinthomson commented 10 months ago

Good point about utility. In your example, given the mappings involved, the sensitivity of $H^3$ will be (or at least could be) double, so I can't see a way that $H^3$ could avoid needing $\sigma_3=2$ rather than $\sigma_3=\sqrt{2}$.

csharrison commented 10 months ago

@martinthomson regarding L1 vs. L2 sensitivity, I was assuming an L1 sensitivity bound in my previous comments.

Regarding utility, with an L1 bound, $\sigma_1$, $\sigma_2$, and $\sigma_3$ should all be drawn from the same distribution to achieve the same privacy guarantee, which means $H_i^3$ will have half the variance of the post-hoc combination. $\sigma_3$ will not need more variance than $\sigma_1$ or $\sigma_2$ because $H^3$ has the same L1 sensitivity as the concatenation of $H^1$ and $H^2$.

If we are trying to bound using the L2 sensitivity, I think the best we can say is that $H^3$ has at most twice the L2 sensitivity of $H^1$ and $H^2$ combined (by the triangle inequality), and I agree you don't win anything if the noise is drawn from Gaussian.
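In symbols, for the simple case where each impression contributes one count to each of $H^1$ and $H^2$ (my reading of the argument, not text from the proposal):

$$
\Delta_1(H^1 \,\Vert\, H^2) = 2 = \Delta_1(H^3),
\qquad
\Delta_2(H^1 \,\Vert\, H^2) = \sqrt{2},
\qquad
\Delta_2(H^3) \le 2,
$$

with the last bound tight when both of an impression's contributions can land in the same output bucket. So Laplace noise calibrated to the L1 sensitivity can stay the same, while Gaussian noise calibrated to the L2 sensitivity may have to grow.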

winstrom commented 10 months ago

@csharrison — can you help me understand this? I think I'm missing something fundamental. You say

Regarding utility, with an L1 bound, $\sigma_1$, $\sigma_2$, and $\sigma_3$ should all be drawn from the same distribution to achieve the same privacy guarantee, which means $H_i^3$ will have half the variance.

And you previously said

In the aggregate of the N histograms, you know any given ad contributed at most N times, so it is safe to add noise proportional to O(N).

I am having trouble reconciling $\sigma_1$, $\sigma_2$, and $\sigma_3$ all being drawn from the same distribution with $\sigma_3$ needing to be drawn from something with $\mathcal{O}(2)$ noise.

csharrison commented 10 months ago

On mobile right now, so I will be brief (I can elaborate more later if needed). If we imagine $H^1$ and $H^2$ both have sensitivity 1, then $\sigma_1$ and $\sigma_2$ will both need noise scaled to O(2), because the overall L1 sensitivity across both histograms is 2, matching the sensitivity of $H^3$.

winstrom commented 10 months ago

@csharrison, I'm not sure I'm following. Why would $\sigma_1$ and $\sigma_2$ need noise scaled to $\mathcal{O}(2)$?

csharrison commented 10 months ago

@csharrison, I'm not sure I'm following. Why would σ1 and σ2 need noise scaled to O(2)?

E.g. in the current PAM design, $H^1$ and $H^2$ are both composed of ad events that are marked as contributing to two histograms (each event contributes one count to each of $H^1$ and $H^2$). To achieve $\epsilon$-DP per ad event, we will need to add something like $Lap(2/\epsilon)$ to each element in both $H^1$ and $H^2$. If this were not the case, you could adversarially generate $H^1 = H^2$ and average out the noise to get reduced variance.
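The averaging attack in that last sentence, sketched numerically (a toy example, not PAM code; it shows what goes wrong if the noise were mistakenly scaled to a sensitivity of 1):

```python
import numpy as np

rng = np.random.default_rng(2)
true_bin = 100.0
epsilon = 1.0
n = 100_000

# Suppose H1 and H2 were each noised as if their sensitivity were only 1.
# An adversary who arranges H1 == H2 can average the two releases:
release_1 = true_bin + rng.laplace(0, 1 / epsilon, n)
release_2 = true_bin + rng.laplace(0, 1 / epsilon, n)
averaged = (release_1 + release_2) / 2

print(np.var(release_1))  # ~2 / epsilon**2
print(np.var(averaged))   # ~1 / epsilon**2, i.e. half the variance and weaker protection
```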

winstrom commented 10 months ago

There is no guarantee that $H^1$ and $H^2$ are somehow linked together with an underlying ad event, because the device doesn't have that context. They could query different ad events that the advertiser has more knowledge about. So the combination of multiple histograms requires additional noise. You make this point above.

csharrison commented 10 months ago

Sorry, you're right; I was making a simplifying assumption (the "simple case" of https://github.com/patcg-individual-drafts/private-ad-measurement/issues/7#issuecomment-1777181801) where all events contribute to the same N histograms.

But consider the case where $H^1$ and $H^2$ are composed of distinct events and each is composed of events which can contribute to 2 histograms at most. In that case we also know $H^3$ has an impression-level sensitivity of 2 and $Lap(2/\epsilon)$ is sufficient. There is no ad event in $H^3$ that has participated more than 2 times.

This argument extends beyond the impression-level epsilon as I mentioned in https://github.com/patcg-individual-drafts/private-ad-measurement/issues/7#issuecomment-1777181801.

martinthomson commented 10 months ago

I'm still wondering why this needs to be complicated. If we start from the assumption that $H^1$ and $H^2$ use information that is completely disjoint, then the sensitivity will double from their addition. I would contend that if you allow an arbitrary mapping to $H^3$, then the same applies even if the input events are not disjoint.

I get that maybe there are composition methods that might allow us to apply less noise than double, but I'm struggling with why you wouldn't just leave that to the caller of the API to untangle with some post-processing.

csharrison commented 10 months ago

I'm still wondering why this needs to be complicated. If we start from the assumption that H1 and H2 use information that is completely disjoint, then the sensitivity will double from their addition.

This is not true: the impression-level sensitivity will not change. Imagine every user only interacts with a single ad event, and each event contributes to either $H^1$ or $H^2$. We know that $H^3$ will have sensitivity 1, just like $H^1$ and $H^2$.

martinthomson commented 10 months ago

But that is not how this system works: every histogram that is produced has a certain sensitivity. I guess that you can maybe say that $H^3$ can be constructed in a way that ensures that it doesn't have increased sensitivity, but that was not the setting that Luke or I was talking about.

If you do want to be able to ask for an aggregate histogram where only one of the aggregated histograms has a contribution, then yes, there is some value in doing this. I just didn't see how you could guarantee that.

csharrison commented 10 months ago

I think I may have misunderstood something, so maybe it would help if you could write out a counterexample where the technique I am proposing won't work? The goal of this issue was to allow combining histograms like $H^1$ and $H^2$ into $H^3$, where my understanding of PAM is that each histogram is marked with the effective impression-level sensitivity (i.e. we know $H^1$ ad events contribute $s_1$ counts across all histograms, so we need to add $O(s_1)$ noise to it).

I believe if $H^3 = H^1 + H^2$, then $s_3 = max(s_1, s_2)$ if we're using L1 sensitivity.
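A small sketch of how an aggregator could apply that rule, assuming each histogram arrives tagged with its impression-level sensitivity $s_i$ (names and the Laplace mechanism are illustrative):

```python
import numpy as np

def release_combined(h1, s1, h2, s2, epsilon, rng=np.random.default_rng()):
    """Combine two same-shaped histograms whose events contribute at most
    s_i counts across all histograms, then noise at the combined sensitivity.

    Under impression-level L1 accounting, no single impression can contribute
    more than max(s1, s2) counts to h1 + h2, so that is the Laplace scale.
    """
    s3 = max(s1, s2)
    return (h1 + h2) + rng.laplace(0, s3 / epsilon, size=h1.shape)
```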

winstrom commented 10 months ago

Here is my thinking:

The contract between the browser and the aggregator is that if I contribute to a particular histogram, an amount of noise deemed appropriate will be added to that histogram to satisfy the privacy requirements of the browser.

The aggregator does not know who has populated any histogram — in particular it does not have any guarantee that the users contributing to any two histograms are disjoint sets.

So, when two histograms are combined, the amount of signal any user may have contributed to the resulting concatenated histogram is the sum of the amounts that may have been contributed to the individual histograms.

In your original example

Here is an example: imagine I am tracking two types of conversions, purchases and “add to cart” events. Usually I query these independently to get breakdowns per type so I give them different Conversion IDs. However, during a particular week I may have a reduced volume of reports, and so I’d prefer to get a histogram combining the purchase and add to cart reports, which allows me to get aggregate data with less overall noise (one noise addition vs. two).

I would argue that the 'add to cart' and 'conversion' histograms are most likely not disjoint sets (in fact, it's closer to one being a proper subset of the other). Many users will therefore have contributed to both of these histograms. Combining the bins would include a double contribution from these users that needs to be accounted for.

Consider defining a single histogram (with twice as many bins) for both 'add to cart' and 'conversion' events. When a user adds an item to the cart, the website can create a first report histogram mapping ad events to the first set of bins and when the user completes a purchase the website can create a second report histogram mapping ad events to the second set of bins. If a website only wants to measure one or the other, the reports can be given twice the privacy budget from the browser since the design will guarantee only a single report.
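A toy sketch of that double-width layout (bin counts and names are illustrative only, not part of PAM):

```python
N_AD_BINS = 8  # hypothetical number of ad-event buckets per conversion type

def combined_bucket(event_type: str, ad_bucket: int) -> int:
    """Map an (event type, ad bucket) pair into the single wide histogram:
    add-to-cart reports fill buckets [0, N_AD_BINS) and purchase reports
    fill [N_AD_BINS, 2 * N_AD_BINS)."""
    offset = 0 if event_type == "add_to_cart" else N_AD_BINS
    return offset + ad_bucket
```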

Designing a system that guarantees somehow that the user population in different histograms is disjoint allows combinations without this worst case type of analysis. But that would require more coordination than is already involved in this proposal.

martinthomson commented 10 months ago

I can't think of a scenario in which $||H^1||_1 + ||H^2||_1 = ||H^3||_1$ either. Even if $H^1$ draws from a completely distinct set of impressions than $H^2$, the same user could have contributed to both, just with different impressions. You would need some sort of additional constraint on the construction of $H^1$ and $H^2$ to keep the sensitivity of $H^3$ from being greater. Maybe that's what @csharrison is looking for. Not a direct mapping, but mapping + capping.

csharrison commented 10 months ago

Thank you both for responding and for your patience. First I want to clarify that any kind of "disjointness" is not required for the scheme to work, I was only explaining that "disjointness" does not break the scheme as @martinthomson suggested in https://github.com/patcg-individual-drafts/private-ad-measurement/issues/7#issuecomment-1788253719. We don't need the system to enforce this kind of constraint (and I agree with @winstrom that for many use-cases these won't be disjoint).

Second, I want to clarify the measure of sensitivity I am primarily talking about, which is impression-level sensitivity, i.e. each histogram is marked with the maximum number of times its impressions can contribute across all histograms[^1]. As far as I understand, this is PAM's primary measure of sensitivity, and each query provides an impression-level $\epsilon$-DP guarantee because each histogram is tagged with its impression-level sensitivity. Any user-level bounds will be captured by something like capping the number of impressions, so your user-level epsilon is $k \cdot \epsilon_{\text{impression}}$ via composition (though there are other ways of doing this). It might help to first check whether we're aligned on just this point, namely that under an impression-level sensitivity measure $H^3 = H^1 + H^2$ will have $s_3 = \max(s_1, s_2)$, before we move on to user-level sensitivity.

The concern you have, @martinthomson, in https://github.com/patcg-individual-drafts/private-ad-measurement/issues/7#issuecomment-1788436863 comes from considering a user-level sensitivity analysis rather than just an impression-level one. If we consider impression-level DP plus composition, I think it will alleviate your concerns (LMK if it doesn't). However, the approach should work in general even with user-level sensitivity; it just requires the system to understand more about the user-level sensitivity (which, AFAIU, PAM doesn't have). That is, if you imagine that $H^1$ and $H^2$ are tagged with the "maximum times users can contribute across all histograms", then you should still be able to claim that $s_3 = \max(s_1, s_2)$. This would be possible if PAM provided something like a fixed/hardcoded abstract "contribution budget" per privacy unit, e.g. saying that all PAM users get 10 histogram contributions per day. In that case $s_1 = s_2 = s_3 = 10$ at the <user, day> level, because all histograms would implicitly be tagged with this hardcoded budget.

If this still isn't making sense, one question that might help get to the crux is in your example in https://github.com/patcg-individual-drafts/private-ad-measurement/issues/7#issuecomment-1788436863, how would you add noise to $H^1$ and $H^2$ to achieve user-level bounds?

If it is helpful, I can write something out that is more detailed in a document if this is still confusing, and again thanks for your patience.

[^1]: This is the "Maximum times the ad impression can be attributed" in PAM. It is important to note that this field operates across histograms, so if an impression is eligible for 2 attributions, both $H^1$ and $H^2$ that it contributes to will have $s_1$ and $s_2$ of at least 2.

winstrom commented 10 months ago

I think that we're coming at this from different angles and trying to guard against different things.

This proposal is not primarily interested in providing an impression-level epsilon DP guarantee.

We want to provide ways to account for the release of information across multiple privacy budgets (e.g. total user level, site level, impression level). They may cross boundaries (e.g. impressions may be used on multiple sites) and have different sizes (e.g. the user may have a larger budget than a single impression does). We try to ensure the information release respects those budgets.

The model of privacy accounting means that a single histogram release ($H^1$) may consume the entire budget of an advertisement. Another histogram ($H^2$) may consume the entire budget of another advertisement. This is still acceptable because the user privacy budget is not exhausted.

There is nothing stopping a bad actor from registering shadow ads and conversions that are not known to the browser or the aggregation system to be linked but are known to the bad actor to be identical.

Combining $H^1$ and $H^2$ with concatenation and allowing recombination of bins before noise addition then creates a vulnerability that violates the impression privacy budget.

winstrom commented 10 months ago

@csharrison, I've been thinking about this quite a bit. I believe that your mental model of having an on-device budget that limits individual contributions does support the claim that, within any combination of histograms, a particular user's contributions are bounded by a particular $L_1$ norm. And so it may be that combining histogram bins before aggregation in concatenated histograms doesn't change any of the privacy guarantees.

I do want to get more opinions on this, since I can imagine that allowing combinations across time windows, in combination with other proposed changes, may create a vulnerability (and I'd like the more paranoid security folks to weigh in). The more complexity gets added, the more difficult the reasoning about and explainability of any proposal becomes, and the more difficult the implementation of a backend aggregation system.

This particular question seems to me to be focused on the aggregation system rather than the interface between the device and the aggregation backend, and so I think it is orthogonal to the implementation of how the cross-site information is transmitted.

csharrison commented 9 months ago

Looking at this again, I think one of my comments got eaten. I thought I had responded to your post two weeks ago, Luke, but I don't see the comment now. Thank you for continuing to think about it, and I'm glad we seem to be coming into alignment.

I agree we will need to think carefully about aggregating across boundaries / privacy units etc., but we should be able to be rigorous about the boundaries we want to defend and prove that we do indeed defend them. If a somewhat more complicated privacy proof yields much better utility, I will tend to take that bargain, but it's certainly something we can debate on a case-by-case basis :)

martinthomson commented 9 months ago

I now follow the user- vs. impression-level distinction, so I'm happy with that analysis. I even follow the claims about why $\Delta_3 = \max(\Delta_1, \Delta_2)$ on that basis. It seems nice too, in that you can spend more budget to get less noise without changing the underlying guarantees.

The question of combinations over time seems like it has a workable solution. If we consider a budget that periodically refreshes, then we'll need to track multiple budgets over time. Any attempt to use an impression will spend budget from the corresponding time period.
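A minimal sketch of that kind of per-period tracking (entirely hypothetical; PAM does not specify this structure):

```python
from collections import defaultdict

class PeriodBudget:
    """Track a privacy budget that refreshes each period (e.g. each day).

    Spending an impression charges the budget of the period the impression
    belongs to, so combinations spanning periods draw from several budgets.
    """
    def __init__(self, budget_per_period: float):
        self.budget_per_period = budget_per_period
        self.spent = defaultdict(float)   # period id -> budget already spent

    def try_spend(self, period: str, epsilon: float) -> bool:
        if self.spent[period] + epsilon > self.budget_per_period:
            return False
        self.spent[period] += epsilon
        return True
```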