patcg / meetings

Meeting materials for the Private Advertising Technology Community Group

Agenda Request – On-device DP budgeting #166

Closed roxanageambasu closed 7 months ago

roxanageambasu commented 9 months ago

Agenda+: On-device DP budgeting

@bmcase and I have been working on a design for on-device differential privacy (DP) budgeting for private attribution measurement systems such as Apple’s PAM and potentially Google’s Summary ARA (if it were configured with a user-level guarantee). One key finding is that systems that do on-device budgeting cannot be properly cast under the traditional DP definition; their behavior is much better captured by another form of DP, called individual DP (IDP, often called personalized DP). We would like to introduce IDP to the community, justify why we think it is a good way to frame the DP desideratum for systems that do on-device privacy budgeting, and describe the implications of this framing. In particular, the IDP framing enables certain optimizations that would not be possible in a traditional DP setting, but it also comes with some negative consequences, such as the need to keep privacy budgets private. We think these preliminary findings would be useful to communicate within PATCG, and we would also like to request the community’s input on our framing and its implications.

Time

30 min (15 for the presentation, followed by discussion).

We would like to put this on the agenda for Thursday.

Links

We will share a slide deck prior to the presentation.

AramZS commented 9 months ago

Added you to day two (Thursday)

bmcase commented 9 months ago

Copy of Roxana's slides for the presentation: https://docs.google.com/presentation/d/1bkZdqEiwXTLCy0TfbeBgJL7mccKBGuy2ws_vwpdLlbY/edit#slide=id.p

roxanageambasu commented 8 months ago

Hi all,

Thank you for the discussion and for your feedback on the presentation about an individual DP (IDP) formulation for on-device budgeting systems. Your feedback will help us improve our communication of these concepts for future interactions.

I wanted to give a more in-depth response to some of the questions that arose in the Thursday session, which might help clarify some of the high-level points we were making there.

How Summary ARA fits with respect to IDP

From a cursory look at the documentation of Summary ARA (not the implementation!), it looks like Summary ARA may indeed be operating in an IDP setting and leveraging individual sensitivity to avoid deducting budget for triggers with no source. This is one of the optimizations I mentioned in my talk, which, in my opinion, can only be justified through an IDP formulation. This optimization is fine from a guarantee perspective, as long as the available privacy budgets are kept private. In one reporting strategy of ARA (trigger_context_id-based reporting), this seems to be the case, because reports are sent unconditionally (even when there is no attribution and even when out of budget). In a second reporting strategy of ARA, involving randomized reporting delays and dummy reports, further privacy analysis may provide some justification, but whether it does is unclear to me.
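To make the unconditional-reporting point concrete, here is a minimal sketch of the behavior I have in mind. It is my own illustration, not ARA's actual API: the `Device` class, the constants, and the on-device noise are all assumptions (in Summary ARA the noise would be added by the aggregation service, not the device). The point is that a report goes out in every branch, so an observer learns nothing about attribution status or remaining budget.

```python
import math
import random

# Hedged sketch of trigger_context_id-style unconditional reporting with
# on-device IDP budgeting. All names here are illustrative assumptions.

EPSILON_PER_QUERY = 0.5   # epsilon charged when the device actually contributes
VALUE_CAP = 1.0           # contribution cap = worst-case individual sensitivity

def laplace(scale: float) -> float:
    """Sample Laplace(0, scale) noise via the inverse CDF."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -sign * scale * math.log(1.0 - 2.0 * abs(u))

class Device:
    def __init__(self, global_budget: float):
        # Device-local global budget (the epsilon_G_i from the talk).
        # Its remaining value must be kept secret.
        self.remaining = global_budget

    def report(self, trigger_value: float, has_matched_source: bool) -> float:
        if not has_matched_source:
            # Individual sensitivity is 0: removing this device's (empty)
            # source data cannot change the answer, so no budget is deducted.
            value, spend = 0.0, 0.0
        elif self.remaining >= EPSILON_PER_QUERY:
            value, spend = min(trigger_value, VALUE_CAP), EPSILON_PER_QUERY
        else:
            # Out of budget: contribute the default value rather than refusing,
            # so the secret budget state leaks nothing.
            value, spend = 0.0, 0.0
        self.remaining -= spend
        # A report is sent in EVERY branch, with noise calibrated to the same
        # worst-case sensitivity, so reports are indistinguishable.
        return value + laplace(VALUE_CAP / EPSILON_PER_QUERY)

# Demo: all three branches produce a report that looks the same from outside.
d = Device(global_budget=4.0)
print(d.report(0.8, has_matched_source=True))    # deducts budget
print(d.report(0.8, has_matched_source=False))   # no deduction, still reports
```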

Regardless, in my opinion, having a proper theoretical formulation of the DP desideratum would be very useful, in order to obtain a more systematic view of which mechanisms are needed to achieve the desideratum, what their requirements are, and which results from the existing literature are good starting points for designing them. As I’ve argued, I think IDP is a great fit for formulating the DP desideratum for this kind of system. First, it tells you precisely what’s happening when you want to deduct less privacy budget on some devices based on the data they have (or don’t have): you are using individual sensitivity to compute a data-dependent privacy loss; that’s it, no "oddity" here, everything is well defined mathematically. Second, it tells you what the requirements are if you ever want to do that, including that you must hide the remaining privacy budgets. Third, it tells you that this mode of operation with "hidden privacy budgets" will be pervasive in your system, so there is a huge need for additional mechanisms to be built into the system so advertisers/publishers can deal with the pervasive default-value reporting. It also helps you identify mechanisms from prior literature as starting points for tackling this issue. Finally, it helps you tap into other opportunities for optimization, which you might not even think to incorporate offhand, because they are perhaps not as intuitive as the idea of not deducting budget when there is no relevant data. I described some other examples of optimization opportunities based on individual sensitivity in my talk.
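For the "data-dependent privacy loss" point, the bookkeeping can be written down directly. Here is a sketch using the Laplace mechanism as a running example; the notation is mine, not from any of the cited papers:

```latex
% Individual sensitivity of record i on dataset D, for a query f with
% global sensitivity \Delta f:
\Delta_i f(D) = \lvert f(D) - f(D \setminus \{i\}) \rvert .
% The Laplace mechanism M(D) = f(D) + \mathrm{Lap}(\Delta f / \varepsilon)
% then charges record i the data-dependent individual privacy loss
\varepsilon(i) = \varepsilon \cdot \frac{\Delta_i f(D)}{\Delta f} ,
% so a device with no relevant data has \Delta_i f(D) = 0 and is charged
% \varepsilon(i) = 0: exactly the "no budget deduction" optimization.
```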

Relationship between IDP and DP

One can always take the supremum of the IDP big-epsilon function to derive a DP guarantee from an IDP one (see my slides or Proposition 3.2 in the POPL’15 paper). So, in answer to @csharrison’s question on whether Summary ARA (at least the version that always sends a report) could still claim it satisfies traditional DP, the answer is YES, but the question is: for what epsilon? If you are willing to assume that there is a reasonable maximum across the individual global budgets enforced by all devices participating in ARA (those εGi's from my talk), then you can take that maximum (call it εG) and you’ve got yourself an εG-DP guarantee! The problem is that I don’t know how you can assume a reasonable maximum in an open system such as ARA (or PAM, for that matter). What happens if the user on device j configures ARA to, say, εGj = 100000? Does that mean that the ARA system is now 100000-DP? That doesn’t sound right: surely, devices with more reasonable settings of εGi are better protected than that! This is intuitive, but then again, how do you express it with a system-wide DP guarantee?

Maybe one option is to settle on formulating a guarantee only for those users who do not change the fixed εG value you encode in ARA. You exclude all the other users and refuse to make a privacy claim for them; but for the users with unchanged εG, you claim an εG-DP system-wide guarantee. That would be valid. But what if there is a different version of ARA running on some devices, with a slightly higher default value εG2 (maybe an older release of ARA)? Does this mean that those users don’t get any protection? Of course not! But they certainly don’t get the εG-DP protection you are now advertising for the system. If you go to finer and finer granularity, you realize that what you really want to be able to say is that each device i gets the DP protection that its own εGi configuration affords. That’s IDP.
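For reference, here are the definition and the translation I am invoking, in my own notation (a paraphrase of the POPL’15 formulation, not a quotation):

```latex
% Individual DP: a mechanism M satisfies \varepsilon(\cdot)-IDP if, for every
% dataset D, every record i, and every output set S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon(i)} \cdot \Pr[M(D_{-i}) \in S] ,
% where D_{-i} is D with record i removed (or replaced). Since any pair of
% neighboring datasets differs in some record i, taking the supremum recovers
% ordinary DP:
\varepsilon^{G} = \sup_i \varepsilon(i)
  \quad\Longrightarrow\quad
  M \text{ satisfies } \varepsilon^{G}\text{-DP} .
```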

IDP says that, from the perspective of device i, the individual DP guarantee is εGi, regardless of what the other devices’ guarantees are. This is what @bmcase referred to when he mentioned that IDP provides "isolation between users." And this is why I said that I don’t believe it’s meaningful in ARA-/PAM-type systems to only operate under IDP internally but then always translate to a single, system-wide DP claim externally. In this setting, where privacy budgets are managed by the user devices themselves, I really think you want the power to tell each user, individually, what guarantee they get, based on the configuration on their device (which they could even adjust to match their privacy comfort level!). There’s something beautiful and powerful in being able to say that to users, and IDP lets you do that cleanly, with a well-defined mathematical formalism backing your claims.

As an aside, I wanted to point out that IDP has recently become popular in purely centralized settings (e.g., NeurIPS’21), where both DP query execution and DP budget management happen at a centralized trusted party. There, you can reasonably think of IDP as just an internal mechanism (to optimize privacy budget consumption), but with a very clean translation from IDP to DP. Because the centralized party does all the budget management, it can reliably enforce a single, reasonably valued global budget (call it εG) on each individual record. This means that the system enforces ε(·)-IDP with ε(i) = εG for every record, which translates cleanly into an εG-DP guarantee. As an example, if IPA were interested in doing IDP internally (to optimize budgets, as in the above paper), it could do the translation to DP very cleanly! But for on-device systems, I am hard-pressed to see how we would make that translation to a system-wide εG with a reasonable value (and also why we would want to, since what we want seems to be exactly what IDP gives us).
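To illustrate the centralized case, here is a minimal sketch of a per-record budget filter in the spirit of that line of work. The class name, method, and cap value are my assumptions, not the NeurIPS’21 paper's API; the sketch only shows why the uniform cap makes the IDP-to-DP translation trivial.

```python
from collections import defaultdict

EPSILON_G = 1.0  # uniform per-record cap, reliably enforced by the curator

class CentralAccountant:
    """Trusted curator tracking per-record spent privacy budget."""

    def __init__(self):
        self.spent = defaultdict(float)  # record id -> epsilon spent so far

    def filter(self, record_ids, losses):
        """Admit each record only if its individual loss fits under the cap.

        losses[i] is the data-dependent individual privacy loss the next
        query would charge record i (0 for records the query cannot touch).
        """
        admitted = []
        for rid, loss in zip(record_ids, losses):
            if self.spent[rid] + loss <= EPSILON_G:
                self.spent[rid] += loss
                admitted.append(rid)
        return admitted

# Records with zero individual loss pass for free; every record's lifetime
# loss is capped at EPSILON_G, hence sup_i eps(i) = EPSILON_G and the whole
# system satisfies EPSILON_G-DP.
acct = CentralAccountant()
print(acct.filter(["u1", "u2"], [0.4, 0.0]))   # both admitted
print(acct.filter(["u1", "u2"], [0.8, 0.3]))   # u1 would exceed the cap
```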

Terminology and Communication of the Guarantee to the Users

DP is notoriously difficult to explain to regular users, but we’re starting to find ways to do so. IDP is much newer, so we really haven’t even started exploring how to communicate it to a wider audience. This is why, for example, the group privacy property of IDP is known to hold (it is implied by the much more general Lemma 3.3 of the POPL’15 paper), yet it’s hard for a non-expert to see that. Moreover, the fragmented and inconsistent naming only adds to the communication challenges right now. In my experience, this fragmentation is not unusual in academia while concepts are being developed. I’ve seen it happen before, including in the space of ML attacks a few years ago. Initially, there are lots of names, different definitions, different problem formulations, different goals, etc. Then, one or two papers come along with a more mature, synthesized look at the problem and set a more consistent framework upon which further literature builds. This convergence hasn’t happened yet for IDP, and the way I see it, this gives us an opportunity to establish that framework, provided we agree that, from a technical perspective, IDP is the right definitional framework for these on-device systems, both to offer users meaningful privacy guarantees and to enable systems to manage budgets efficiently.
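For concreteness, the group privacy property in question can be stated as follows, in the notation above. This is my paraphrase of what the lemma implies, not a quotation from the paper:

```latex
% Group privacy under IDP: changing a whole group G of records at once costs
% the sum of the group's individual epsilons,
\Pr[M(D) \in S] \;\le\;
  \exp\!\Big( \sum_{i \in G} \varepsilon(i) \Big) \cdot \Pr[M(D_G) \in S] ,
% where D_G differs from D exactly in the records of G. With a uniform
% \varepsilon(i) = \varepsilon, this collapses to the familiar
% |G|\varepsilon bound of traditional DP group privacy.
```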

@bmcase and I, along with a team of researchers at Columbia University and the University of British Columbia, are planning to formalize our proposal for IDP-based on-device budgeting in the coming months and to submit an academic paper for peer review. In that paper, we intend to take on the challenge of both articulating IDP and its properties for a wider audience and motivating why it is a good fit for on-device budgeting systems.