Open bmcase opened 1 year ago
Thanks for filing this, @bmcase . One thing I want to caution is that your option (5) may be susceptible to "trivial" collusion attacks e.g. a site that merely always sets a "0" match key. This is an attack that comes up often when it comes to "intersecting" device graphs, where there is no way to ensure that a device graph is honestly constructed (and e.g. assumes all devices are shared by one user). When attacks are this easy and we don't have other restrictions on MKPs, the non-collusion assumption gets weaker.
Regarding the use of OPRF matching, the cardinality leak we're talking about still likely applies in some ways. If we are to consider the possibility of webviews and IPA aims to operate at the device level (not at the browser level), a single device manifests as multiple discrete contexts. You cannot use ciphertext stability across contexts, but cardinality might be used to link those contexts.
@csharrison good call out.
One thing I want to caution is that your option (5) may be susceptible to "trivial" collusion attacks e.g. a site that merely always sets a "0" match key. Good point, and it could really be any constant value.
This was ultimately just an early line of thinking around the idea that if the match key could mount attacks, we could possibly use the same principle of non-colluding match key providers. That said, we aren't proposing that any helper parties are appropriate, so we can't over extend the idea to match key providers.
Thanks for writing up this detailed issue @bmcase!
For the sake of making progress on a standard in the PATCG, option 1 seems like a good immediate step, but leaves open the ability to add in match key providers as the standard matures. If we take this approach, though, I do think it's critical to solve for the different variations of same device App/Web source event → App/Web trigger report, and this does add some challenges given the scope of the PATCG (and the W3C.)
Some proposals have also talked about using the platform's "Sign In" feature to sync events across a user's device (with the intent of on-device attribution.) If we attempted to use that same functionality to sync match keys, note that these devices only need to establish some form of a shared state, which can be used to generate consistent match keys. New devices could likely only join these shared state for new epochs, but that's likely a reasonable tradeoff.
This was ultimately just an early line of thinking around the idea that if the match key could mount attacks, we could possibly use the same principle of non-colluding match key providers.
Yes +1, I think the general idea of extending the privacy threat model to assume non-collusion across MKPs is interesting and worth further exploration. I guess my point is that the relaxation only really makes sense with a system that has some baseline trust for participating MKPs.
That said, we aren't proposing that any helper parties are appropriate, so we can't over extend the idea to match key providers.
I'm having trouble parsing this. Can you clarify what you meant here?
For the sake of making progress on a standard in the PATCG, option 1 seems like a good immediate step, but leaves open the ability to add in match key providers as the standard matures.
I generally agree with this. However, I do hope we can spend time on a long term solution which keeps the notion of MKPs in place. This is really the core "magic" of IPA and without it we constrain the possible supportable use-cases.
That said, we aren't proposing that any helper parties are appropriate, so we can't over extend the idea to match key providers.
I'm having trouble parsing this. Can you clarify what you meant here?
Sorry about that, I should probably not attempt to reply while jet-lagged 😅. What I was attempting to say was that, with Helper Party Networks, our idea is that the web platform vendors will be able to approve networks; we don't expect any set of three Helper Parties to meet the conditions required to be a Helper Party Network. Similarly, if we extend the idea non-colluding Match Key Providers, we similarly would need to think about which such groupings of Match Key Providers would reasonably meet the the non-collusion assumptions. I think this is aligned with your comment above:
I guess my point is that the relaxation only really makes sense with a system that has some baseline trust for participating MKPs.
On the other point:
For the sake of making progress on a standard in the PATCG, option 1 seems like a good immediate step, but leaves open the ability to add in match key providers as the standard matures.
I generally agree with this. However, I do hope we can spend time on a long term solution which keeps the notion of MKPs in place. This is really the core "magic" of IPA and without it we constrain the possible supportable use-cases.
+1. There are a whole host of use cases which can only be supported in this way (CTV, for example.) It's certainly our intension to continue working towards the goal of full interoperability, with cross device and cross platform support. I believe there's an argument to be made that one of the values of taking this approach is keeping open optionality for cross-device.
That said, I do still think there are meaningful differences between IPA and the on-device attribution proposals, even without the setMatchKey()
API, and these tradeoffs are still worth considering. For example, (as I mentioned above) it's likely easier to support cross-device within a platform, since there's isn't a need to continually sync state.
@martinthomson to your earlier comment
Regarding the use of OPRF matching, the cardinality leak we're talking about still likely applies in some ways. If we are to consider the possibility of webviews and IPA aims to operate at the device level (not at the browser level), a single device manifests as multiple discrete contexts. You cannot use ciphertext stability across contexts, but cardinality might be used to link those contexts.
That makes sense. Do you think there is a way to at least get a bound on this that we can have for the sensitivity of shard sizes?
@martinthomson to your earlier comment
Regarding the use of OPRF matching, the cardinality leak we're talking about still likely applies in some ways. If we are to consider the possibility of webviews and IPA aims to operate at the device level (not at the browser level), a single device manifests as multiple discrete contexts. You cannot use ciphertext stability across contexts, but cardinality might be used to link those contexts.
That makes sense. Do you think there is a way to at least get a bound on this that we can have for the sensitivity of shard sizes?
I want to make sure I am understanding this concern. Is it assuming a platform-level device ID rather than a browser provided device ID? If so, the platform could enforce a bound here, at the cost of fairness and coverage. (e.g. maintain a set of device IDs, sticky to a set of calling apps). The platform could also passively monitor this and provide aggregate metrics for us to calibrate epsilons.
One other possible solution here involves a stricter privacy budget scheme. Currently, source fanout queries only consume budget for the source site, and vice versa for trigger fanout queries. However, I believe this attack is stopped if source fanout queries consume budget from every impacted site in the query, including the source site and all advertiser sites.
Here's how this works:
While I think this solves the problem listed here, it introduces a lot of potential problems that may not be feasible to fix. For instance, if a report collector can consume budget from many sites all at once, we'd need to protect those sites against exhaustion attacks coming from malicious report collectors.
I wanted to mention another (partial, incomplete) solution to this problem. Imagine a reduced/parallel version of IPA that:
setMatchKey
functionalityIn this setting, I believe this issue is more or less moot.
(2) means that the only replay attacks in scope here are ones where the match key provider is colluding with the advertiser site (i.e. for every trigger event the MKP + advertiser know the cleartext match key).
(3) means that the API is not providing x-site match keys across sites for the purpose of source registration. That means all sources that could be registered (eligible for x-device) need to have the match key known "in the clear" / generated offline.
The combination of (2) and (3) means that if an adversary could issue a replay attack by colluding with an advertiser, they could do so without using the IPA API at all. The downside of course is that total cross-device coverage is naturally reduced.
Attack on privacy budgets using fake sites and malicious MKP
Summary:
I’ve been thinking about the overall threat model for managing the privacy budgets. Last summer for the End-to-End doc I was thinking about similar budget sharing attacks but those attacks at the time seemed preventable, but I was able to crystallize this attack into something we don’t prevent with what we are currently doing.
Background
We have recently been assuming that the associated data to encrypted matchkeys includes: (site, the trigger/source bit, requested MKP, epoch, and previously the RC intended to run the query). Also for source-fan-out queries we’ve been assuming that the Helper Party Network (HPN) will verify that all source events were created on the site that is submitting the query and spending the budget.
Attack
Steps for the attack using a source site:
If we also required encrypted matchkeys to include who would be the RC issuing the query (b.example deciding their events can be queries by a.example but not a1.example,...,a10.example), that would prevent this attack unless b.example is also colluding and requests copies for all the fake sites. But since we assume source and trigger sites will collude in our threat model, this would be an insufficient mitigation in the general case. A similar attack could be done with the MKP using a trigger-site
Potential Mitigations
The IPA team have all been thinking of some potential mitigations to the above attack
Likely we would want to use a combination of the above methods. We think device generated matchkeys, 1), would be a decent MVP and would be similar to the first in-market test we are planning, but we will continue to research ways of supporting a MKP with better security guarantees.
Even without a MKP involved, there is the possibility of more general privacy budget sharing attacks that try to induce a client to submit events for many sites. For example if a site can induce a client to submit events for many sites with the ability to know they all were by the same person, they could carry this attack out on their own. For example, sites might try to do some form of bounce tracking to try and get an event registered as if it happened on multiple sites. There are probably some easy mitigations to bounce tracking and similar abuses like having minimum delays from page load until release of the match key, which could be cut short after user engagement.
Implications for MPC – option for OPRF based matching
If we were to go with mitigation 1) or 2) above, that changes the trust model around what sort of attacks a malicious MKP can carry out. If we carry this same new threat model through the MPC we can get some optimizations. Note that from a privacy/security standpoint we don’t have to make this change to matching; it is just that this becomes an option we could more likely pursue as an optimization.
Overall threat model for privacy budgets
This attack is a particular example of the sort of attacks sites might try to carry out through sharing their budgets. We should expand the threat model document(link) to capture a more robust definition of a privacy budget threat model. Other APIs are susceptible to similar style budget sharing attacks. PCM is known to be vulnerable to a similar style of fakes sites attack, see 2.2.5 of https://mozilla.github.io/ppa-docs/pcm.pdf. ARA has been adjusting its privacy grain and limits to deal with similar style privacy budget sharing concerts, see https://github.com/WICG/attribution-reporting-api/issues/661 and https://github.com/WICG/attribution-reporting-api/issues/725