Early design review request: IPA

martinthomson commented 1 year ago

Hello, TAG!

I'm requesting a TAG review of Interoperable Private Attribution (IPA).

IPA proposes a system that enables cross-site attribution. The idea is to give businesses that advertise a way to measure how their advertising is performing without having to rely on tracking. To do this, IPA assigns each user an identifier - a match key - that cannot be used outside of a multi-party computation (MPC) system. The MPC system only executes a specific protocol that has been vetted to ensure that it only provides aggregated information.

Explainer (minimally containing user needs and example code): https://github.com/patcg-individual-drafts/ipa/blob/main/IPA-End-to-End.md
User research: none yet
Security and Privacy self-review: https://github.com/patcg-individual-drafts/ipa/blob/main/sec-priv-q.md
GitHub repo: https://github.com/patcg-individual-drafts/ipa
Primary contacts:
- Ben Savage (@benjaminsavage), Meta
- Erik Taubeneck (@eriktaubeneck), Meta
- Martin Thomson (@martinthomson), Mozilla
Organization/project driving the design: Meta
External status/issue trackers for this feature:
- https://github.com/WebKit/standards-positions/issues/142
- https://github.com/mozilla/standards-positions/issues/753

Further details:

[x] I have reviewed the TAG's Web Platform Design Principles
The group where the incubation/design work on this is being done (or is intended to be done in the future): PATCG
The group where standardization of this work is intended to be done ("unknown" if not known): PATWG (not approved, draft charter)
Existing major pieces of multi-stakeholder review or discussion of this design: Records of some discussion can be found in the project repository and PATCG minutes.
Major unresolved issues with or opposition to this design: The explainer includes sections that describe a number of open issues. We are planning trials that should help answer some of these.
This work is being funded by: Meta and Mozilla.

You should also know that...

The security and privacy questionnaire covers two key challenges that I will highlight again here:

1. This proposal uses information - match keys - that might be used to perform cross-site tracking if the protections in the proposal were to fail. The API allows any web site to request and receive this information from user agents. The proposal includes a number of measures that are designed to protect this information.
2. The aggregated information that is provided to sites is based on the use of match keys. The use of differential privacy ensures that there is some protection for the contribution of individual users. The design limits the rate at which sites gain this information, so while the amount of information released each week is strictly limited, the cumulative amount released over time grows without bound.
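
To illustrate the point about accumulation over time, here is a minimal sketch. It assumes basic sequential composition, where the privacy loss spent in each period simply adds up, and a hypothetical weekly budget. The function name and the numbers are illustrative and are not taken from the IPA proposal.

```python
def cumulative_epsilon(weekly_epsilon: float, weeks: int) -> float:
    """Worst-case total privacy loss after `weeks` budget periods.

    Assumes basic sequential composition: the epsilon spent in each
    period adds up. Tighter composition bounds exist.
    """
    if weekly_epsilon < 0 or weeks < 0:
        raise ValueError("weekly_epsilon and weeks must be non-negative")
    return weekly_epsilon * weeks


# Hypothetical weekly budget; not a value from the IPA proposal.
WEEKLY_EPSILON = 0.5

for weeks in (1, 4, 52):
    print(weeks, cumulative_epsilon(WEEKLY_EPSILON, weeks))
```

Each week's release stays within the fixed budget, but the total keeps growing for as long as a site keeps querying.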

Any conclusions about the privacy properties of the API will depend on an assessment of the adequacy of these protections.

We'd prefer the TAG provide feedback as 🐛 open issues in our GitHub repo for each point of feedback. We're happy to engage with general feedback, commentary, and questions in this thread; we expect some feedback to be very broad in nature.

hadleybeeman commented 1 year ago

Hi @benjaminsavage, @martinthomson and @eriktaubeneck. We are looking at this in our W3C TAG meeting today.

Two questions for you.

1. Use cases. Could you please list the use cases you are designing for, from an end-user's perspective? You mention several throughout the explainer (authentication, measuring the impact of advertising), but it's hard to tell which you are designing for.
2. How does this relate to other approaches? You mention them in the acknowledgements section, but they appear to be competing proposals. How does this differ, and how do you expect this all to make sense from a developer's point of view?

eriktaubeneck commented 1 year ago

Hi @hadleybeeman, glad to hear that you will be looking at it today! Let me do my best to answer your questions, but please follow up if you need any clarity.

1. We are designing for the use case of measuring the impact of advertising based on cross site behavior. Previously this was supported via shared cross-site context (3rd party cookies). We believe this is an important use case to support in a manner that doesn't enable tracking end users individually, so that end users have access to ad-supported sites that might otherwise require payment or require end users to provide PII that can be used for tracking.
2. The Private Advertising Technology Community Group (PATCG), where IPA has been proposed, has published a fairly comprehensive overview of alternative proposals. I'm not sure I understand the question about "mak[ing] sense from a developer's point of view", but the general goal in the PATCG is to recommend a common standard (IPA or otherwise) which can be standardized and implemented across browsers so that developers have only one common API that supports this use case.

ShivanKaul commented 1 year ago

Some general feedback and thoughts:

1. I’m not seeing the user benefit here: the use case pointed out (“measuring the impact of advertising based on cross site behavior”) is very explicitly a use case that website developers have, not end users (with reference to the W3C priority of constituencies). Users already have choices for preventing Web tracking; most user agents already block 3rd party cookies by default.
2. I also think this has the potential to be actively harmful for users. For example, this passage from the Explainer is concerning: “If a match key provider is able (and willing), they could extend this even further by performing user-level linkage to other contexts (e.g., email based matching with offline merchants), then distribute encrypted match keys, enabling businesses to bring offline user activity from these other contexts into the MPC. The impact this may have on the overall ecosystem is not obvious. On one hand, it may drive an increase in sharing of PII between parties in an effort to gain access to this new measurement capability.” Plus, the overall complexity of this, which is largely hidden from users, is problematic; it seems to me that when the system fails, user privacy is harmed in a way that is invisible to users. Similarly…
3. What can the user verify in this system regarding their privacy? Can they verify, for example, that the privacy budget is being respected?
4. This proposal will favour large browser vendors and lead to further consolidation of the Web. MPC-based systems tend to be expensive to operate, and if IPA is “standardized and implemented across browsers so that developers have only one common API that supports this use case” then my concern is that this would be effectively ruling out smaller user agents who can’t afford to pay for such a service.

martinthomson commented 1 year ago

I just want to pick up on this one point:

> effectively ruling out smaller user agents who can’t afford to pay for such a service

In IPA, websites (advertisers, publishers, etc...) pay, not user agents.

ShivanKaul commented 1 year ago

It is not clear what the relationship is between the User Agent Vendor and the three Helper Parties, except for the fact that the User Agent must trust the Helper Parties to not collude. Given that (by design) 2 parties colluding would be disastrous for user privacy, there's a strong incentive for the User Agent Vendor to operate one of the Helper Parties.

martinthomson commented 1 year ago

Operating a helper node is something Mozilla has considered, but we're not inclined to do so for a few reasons. Foremost of those is that we're looking to use something like the CA/Browser Forum as a reference model for governance. That is, we want to have a common set of helper party networks that are trusted and overseen by a group formed by multiple browsers. In other words, browsers would oversee the operators of those networks. Having the browser involved both in operation and oversight would introduce some fairly gnarly conflicts of interest that seemed best to avoid.

As for the other points you make @ShivanKaul:

Regarding users vs. sites (point 1). Yes, this is directly acknowledged in the explainer. It's come up a number of times in PATCG meetings specifically in the context of the priority of constituencies. This is something that I recognize that each of us weighs differently, but we'll note that the priority is necessarily loose, so there are a few ways that I think you can justify doing something like attribution.

The magnitude of the benefit here needs to be considered. The IPA design deliberately imposes a very low cost on users. Leaving aside trivial amounts of bandwidth and compute, the primary cost is the privacy loss (in the formal DP sense) that accrues through providing sites with the ability to perform aggregated attribution. Mozilla's position here is that - provided that we can find an acceptable set of parameters, especially for the $\epsilon$ and $\delta$ in $(\epsilon, \delta)$-differential privacy - this cost is acceptable for general web browsing cases. That is, acceptable if the corresponding advantage provided to web sites is significant. And for that I think that the case has been made by those in the advertising industry: measurement capabilities provide enormous benefit in terms of being able to profitably run an advertising business.
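
For reference, the standard textbook definition behind those two parameters is as follows. The symbols $M$, $D$, $D'$ and $S$ are introduced here for illustration and do not come from the IPA documents. A randomized mechanism $M$ satisfies $(\epsilon, \delta)$-differential privacy if, for every pair of datasets $D$ and $D'$ that differ in the contribution of one user, and for every set of outputs $S$:

$$
\Pr[M(D) \in S] \le e^{\epsilon} \cdot \Pr[M(D') \in S] + \delta
$$

Smaller values of $\epsilon$ and $\delta$ mean that the output reveals less about whether any one user contributed. What counts as a neighbouring dataset in IPA, and which parameter values would be acceptable, are questions this thread leaves open.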

Again, we acknowledge that benefits that users see are likely to be indirect, at best. Access to ad-supported content is not automatic here. The advertising industry has some pretty bad incentive structures and it might be that the current trend away from ad-supported content will continue, with the benefits to users not being realized. But we do believe that advertising has demonstrated an ability to provide support to sites that can be more equitable than other business models as it largely shifts the burden on to those who are more willing or able to support advertisers. A progressive taxation system, if you will.

Ultimately, there are a lot of things to consider here. It's understandable that you might distrust the advertising industry. There are a lot of shady practices that probably won't stop as a result of us building this stuff. Many actors are unhappy with the share of revenue taken by intermediaries (what's new). Hell, some of that will probably get worse despite our efforts, but that is a risk for a lot of the stuff we build. We could, as I think you are implying, refuse to do anything here, but there are those of us that think that leads to undesirable outcomes, like a far less equitable web. What we are trying to do here is to avoid the worst pitfalls and build safeguards for the rest, technical if possible, procedural and policy-based for the gaps.

You identify a few areas that are particularly challenging for IPA. Some of its flexibility comes with inherent trade-offs around things like user transparency. You also identified one area that continues to be challenging for us with the point you make about match key providers. All I can really say is that these represent some of the harder trade-offs we've made in the design. Having some more discussion about these choices relative to some of the alternatives might be the best way to proceed, because some of those choices can be hard to rationalize without putting them into the broader context. I also want to acknowledge explicitly that the context I'm talking about includes not only how sites receive support, but how browsers support themselves. (You might also add bad behaviour from information brokers and regulatory interventions into what is turning out to be pretty complicated.)

ShivanKaul commented 1 year ago

Sorry for the late response, you know how IETF weeks are...

> but we'll note that the priority is necessarily loose, so there are a few ways that I think you can justify doing something like attribution.

I don’t think the Priority of Constituencies is “loose”; it’s plain, and the exceptions that are listed in the “Web Platform Design Principles” document are unrelated to what’s being proposed here.

> The IPA design deliberately imposes a very low cost on users. Leaving aside trivial amounts of bandwidth and compute, the primary cost is the privacy loss (in the formal DP sense) that accrues through providing sites with the ability to perform aggregated attribution.

While the (formal, DP) privacy loss for users in IPA is definitely something we should reason about, I suspect that it is also the more attractive one to solve for us as engineers; the more important user concerns here are i. around transparency and trust, and ii. piercing the privacy boundary of the browser by intentionally linking events that happen outside the browser with events that happen within the browser.

The proposed governance model is especially concerning to me: it looks like we’re building complicated and expensive new Web infrastructure/governance structures here, similar to the CA/Browser Forum that you mentioned, except that with IPA there is not even a security benefit, or any similar benefit, to users. I really don’t think the CA/Browser Forum is the model to be emulating. This is the first W3C proposal (that we’re aware of) that requires the use of trusted centralized servers, which users cannot audit, for privacy protections. Beyond the clear risk of catastrophic privacy harm here (e.g., a misconfigured server), this approach seems incompatible with several TAG findings / W3C principles, including “enhancing individuals control and power”, “the web is transparent” and “the web must make it possible for people to verify the information they see”.

This proposal has the goal of intentionally linking behaviors in the browser with behaviors outside the browser. This is a new category of privacy harm that the proposal would enable, and the first time we’ve seen it as an explicit goal in a proposal. This has already resulted in attacks like https://github.com/patcg-individual-drafts/ipa/issues/57.

As best we can tell, this technology is being proposed to benefit sites and browser vendors, at a risk to users and to the openness and transparency of the platform as a whole.

martinthomson commented 1 year ago

Regarding priorities and "loose", I was loosely referring to this important qualification:

> Like all principles, this isn’t absolute. Ease of authoring affects how content reaches users. User agents have to prioritize finite engineering resources, which affects how features reach authors. Specification writers also have finite resources, and theoretical concerns reflect underlying needs of all of these groups.

That said, even a strict ordering justifies our conclusion, though it requires acknowledging that some benefits are indirect. That is, the indirect benefit to users as a result of serving the needs of authors (again, via an ability to more effectively support their work with advertising) outweighs or is neutral with the loss associated with those users participating in an aggregated measurement system. And the benefit to authors is potentially significant.

benjaminsavage commented 1 year ago

Thank you everyone for the feedback thus far.

I wanted to update the group about a change that we have recently made to the IPA proposal.

In light of both:

- The risk of events in the browser being linked to events outside of the browser (a risk called out by @csharrison, which we added to the IPA end-to-end doc)
- The attack that @bmcase discovered and posted about (https://github.com/patcg-individual-drafts/ipa/issues/57), which could be mounted by a malicious match key provider

We've opted to remove the setMatchKey API from this proposal. Perhaps, in future, we will find solutions to these problems, but until that time, we would like to explore a simpler proposal which only includes a getEncryptedMatchKey() API.

The underlying identifier being secret-shared in this case would just be a random number, generated by the user agent and stored only on the device, which would never be revealed to any party.
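
As a rough illustration of what secret-sharing a random match key could look like, here is a minimal sketch. It assumes plain additive sharing among three helpers over a hypothetical 64-bit modulus; the function names, the modulus and the sharing scheme are assumptions made for this example, not details from the IPA proposal. Since the thread notes that two colluding helpers would be disastrous, the scheme IPA actually uses evidently differs from this one, in which all three shares are needed to reconstruct the value.

```python
import secrets

# Hypothetical modulus for illustration; not taken from the IPA proposal.
MODULUS = 2 ** 64


def generate_match_key() -> int:
    """Pick a random match key on the device."""
    return secrets.randbelow(MODULUS)


def share(value: int, parties: int = 3) -> list[int]:
    """Split `value` into additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any proper subset looks uniformly random."""
    return sum(shares) % MODULUS


match_key = generate_match_key()
helper_shares = share(match_key)
```

In this sketch each helper would receive one element of `helper_shares`, and no single share reveals anything about `match_key`.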

We hope this simplification will address a number of the concerns listed above.

ShivanKaul commented 1 year ago

To clarify, Match Key Providers (and their associated API call, set_match_key()) are being removed from IPA, thus removing the cross-device measurement use-case. Is that correct? That would help with point ii. of:

> ... the more important user concerns here are i. around transparency and trust, and ii. piercing the privacy boundary of the browser by intentionally linking events that happen outside the browser with events that happen within the browser.

It would also be good to update the Explainer then.

rhiaro commented 7 months ago

We talked about this today during our call, and it's our understanding that there is a promising path forward to merge IPA, PAM and the relevant portions of ARA. Given that, we don't think it's prudent to review the details of IPA since this is subject to change.

We're happy to see these attempts to converge on a way of measuring advertising effectiveness that is more privacy preserving. We encourage you to keep fine-tuning the privacy properties of your proposals, and then to open a new design review request when it's ready and we'll take a look then. Thanks!

bmcase commented 7 months ago

That makes sense. More details on this hybrid proposal are forthcoming. Once there has been time for more discussion of it, we'll open a new design review.

w3ctag / design-reviews

Early design review request: IPA #823