patcg / private-measurement

A place to discuss Private Measurement

Why are multi-party computation solutions the only ones that should be considered? #4

Open jwrosewell opened 2 years ago

alextcone commented 2 years ago

I don't think there was any indication anywhere that that is the case. We're at a stage of weighing different approaches of which MPC is one.

jwrosewell commented 2 years ago

Martin Thomson's presentation specifically steered the group to the use case of attribution and multi party computation. This issue was raised to find out why.

ekr commented 2 years ago

Well, that presentation is just Martin's opinion (though I happen to agree with it), and much of the presentation was concerned with his arguments for why, so I'm not really sure what the question is.

My expectation is that towards the end of tomorrow we'll discuss which use cases to start with first, so that will be the time to discuss that. I doubt we're ready to decide whether MPC is the right angle yet.

martinthomson commented 2 years ago

To be very clear, what I thought I said (and what I should have said if I didn't) is that systems based on multi-party computation seem the most likely to produce the privacy outcomes that I consider acceptable. They also exist (too many of them, sadly, but the problem of choice is better than the alternatives).

I have not seen any alternative proposal that produces acceptable properties. That doesn't mean that this is not possible, just that systems like PCM (Apple) or event-level reporting (Google) reveal unacceptable amounts of information about the browsing activities of individuals. Specifically, they allow details of the interactions of a user with one site to be made available to a different site. This is, of course, a property that is shared by many - if not all - current attribution systems, especially those that use cookies or link decoration.

michaelkleber commented 2 years ago

Martin, could you pull apart your thoughts about MPC, which is an internal implementation detail, from your thoughts about the information revealed, which is an essential part of the API shape? That is, could you be happy with a non-MPC-based system that only revealed aggregate outcomes?

csharrison commented 2 years ago

+1 to Michael about teasing apart reasons why we like MPC. It would be a good outcome of this group to come up with a trust model classification of various architectures / implementations of an aggregate system. In particular I would be interested in the group's views on non-malicious secure MPC systems which we didn't really have a lot of time to discuss today.

cc @marianapr

ekr commented 2 years ago

What do you mean by "non-malicious secure MPC systems"?

ekr commented 2 years ago

There's "only revealed" and "is never available to any single system". I think the latter is the requirement.

csharrison commented 2 years ago

> What do you mean by "non-malicious secure MPC systems"?

I meant something like an "honest but curious" / "semi-honest" security model, which as far as I know is the current security model of IPA.

kimlaine commented 2 years ago

There needs to be protection against malicious clients for sure. At least malicious behavior should be detectable.

chris-wood commented 2 years ago

@kimlaine I think @csharrison's commenting on the server threat model, which currently assumes HbC rather than malicious security, but is something that Ben said they're working to improve. (Sorry if that's pedantic!)
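To illustrate the gap between the two server threat models, here is a toy sketch. It is not IPA's protocol; it assumes plain two-helper additive secret sharing, and `MODULUS`, `share`, and `run_aggregation` are hypothetical names. It shows that when helpers are only assumed honest-but-curious, a helper that misreports its partial sum shifts the result without anything in the protocol detecting it.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus, chosen only for illustration


def share(value: int) -> tuple[int, int]:
    """Split a value into two additive shares (one per helper)."""
    share_a = secrets.randbelow(MODULUS)
    share_b = (value - share_a) % MODULUS
    return share_a, share_b


def run_aggregation(values: list[int], helper_b_offset: int = 0) -> int:
    """Sum the values via two helpers; helper B may add a dishonest offset."""
    shares_a, shares_b = zip(*(share(v) for v in values))
    partial_a = sum(shares_a) % MODULUS
    # An honest helper B uses offset 0; a malicious one can pick anything.
    partial_b = (sum(shares_b) + helper_b_offset) % MODULUS
    return (partial_a + partial_b) % MODULUS


values = [1, 0, 1, 1]
honest_total = run_aggregation(values)
tampered_total = run_aggregation(values, helper_b_offset=100)
print(honest_total, tampered_total)
```

The tampered run still produces a well-formed number, which is why moving from HbC to malicious security needs extra machinery (for example, checks that let the other parties verify each helper's work).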

michaelkleber commented 2 years ago

@ekr By "is never available to any single system" I think you mean "We want to design an aggregation system in which no single [malicious|compromised] party can get non-aggregated data". Is that right?

alextcone commented 2 years ago

I suggest we move this discussion to a better scoped Issue. The headline here is plain inaccurate. Continuing to discuss under it normalizes poor behavior.

ssanjay-saran commented 2 years ago

I acknowledge that a specific technology (in this case, MPC) has been proposed to support an advertising measurement use case. I don't think it is productive to question why this solution has been proposed and try to pick apart issues with it. Rather, we should focus on the other discussion around Privacy principles, ensure we've defined what problems we're trying to solve, what the threat model is, and then propose a suite of tech/solutions that could help us move forward. My understanding is that MPC would definitely be one of those.

AramZS commented 2 years ago

I think @ssanjay-saran has very well stated the core of this issue: it will be more useful to document what MPC intends to address, why, and why it might be a better approach than the alternatives. As @alextcone notes, it is not the formal position of this group (since we have authored no papers at this time) that MPC is the only solution.

jwrosewell commented 2 years ago

FYI I changed titles to better reflect the content as it developed. Will close after meeting later unless Chairs advise otherwise.

alextcone commented 2 years ago

> FYI I changed titles to better reflect the content as it developed. Will close after meeting later unless Chairs advise otherwise.

Changing the name makes the conversation no longer make sense. I would like to raise tonight that the topic of an issue thread not get changed by the author. If you haven't collected your thoughts enough to title an issue correctly out of the gate, that may be a sign to sit with your thoughts and reflect a bit longer.

eriktaubeneck commented 2 years ago

I will add that one of the nice features of MPC is that the IETF is working on the Privacy Preserving Measurement (PPM) spec, which essentially standardizes a subset of MPC solutions (Verifiable Distributed Aggregation Functions (VDAFs)). In my opinion, leveraging other standards like these will both help this group build consensus and help build confidence in the security and privacy properties of solutions developed by this group.

I agree with those above that PPM/VDAF/MPC aren't the only paths available to us, but they are useful work we can build upon.

jwrosewell commented 2 years ago

FWIW @alextcone I was very happy with the question as raised in the meeting but was advised to post offline. I did. I changed the title after reading your prior post to better reflect the content.

martinthomson commented 2 years ago

@alextcone, I just changed the name back (I agree that the discussion stopped making sense under James' new title).

To @michaelkleber's question, the logic is simple:

  1. Requirement: We want to design an aggregation system in which no single [malicious|compromised] party can get non-aggregated data
  2. (Unstated requirements): the system produces useful information; the system does not cost inordinately much; etc....
  3. Analysis: MPC is the most likely outcome.

As with @eriktaubeneck's comment, this isn't an absolute position; it's a prediction, or even a guess, about what is most likely to work. It's not saying that alternatives don't exist, but that they seem less likely to be able to address the requirement.
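To make requirement 1 concrete, here is a minimal sketch, assuming the simplest MPC building block (additive secret sharing) rather than any specific proposal discussed in this group. The names `MODULUS`, `split_into_shares`, `helper_sum`, and `combine` are hypothetical. Each browser splits its value into random-looking shares, each helper sees only one share per user, and only the combined total is recoverable.

```python
import secrets

# Assumed modulus for illustration; it only needs to exceed the true total.
MODULUS = 2**61 - 1


def split_into_shares(value: int, num_helpers: int = 2) -> list[int]:
    """Split value into additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_helpers - 1)]
    last_share = (value - sum(shares)) % MODULUS
    return shares + [last_share]


def helper_sum(received_shares: list[int]) -> int:
    """What one helper computes: the sum of the shares it was sent."""
    return sum(received_shares) % MODULUS


def combine(partial_sums: list[int]) -> int:
    """Adding the helpers' partial sums reveals only the aggregate."""
    return sum(partial_sums) % MODULUS


# Five users, each reporting 1 (converted) or 0 (did not convert).
conversions = [1, 0, 1, 1, 0]

# Column i holds every share that was sent to helper i.
per_helper = list(zip(*(split_into_shares(v) for v in conversions)))
partials = [helper_sum(list(column)) for column in per_helper]
total = combine(partials)
print(total)  # 3
```

Any single helper's shares are uniformly random on their own, so the privacy of individual values rests on the helpers not pooling what they received, which is the non-collusion assumption debated in this thread.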

palenica commented 2 years ago

For kicks, let me try the following perspective. Say I am from a smaller country somewhere that is not North America. I am told that data is being sent from my browser but not to worry -- my privacy is protected by MPC magic. Then I learn that "MPC" means that my data gets sent to two US-based megacorporations who promise not to collude with each other when executing the MPC protocol. Why should I trust this to be the case? Why can't the US government secretly compel them to collaborate under the pretext of chasing terrorists or tax cheats or whatever? Why would they not collude if they have business incentives to do so? What if the parties are just careless and both get hacked?

From this angle, a TEE (or even plain old hardware) run by people I trust and understand beats an MPC being run somewhere far away by people I have a reason to be suspicious of.

TL;DR: perhaps we should not get too hung up on the TEE vs MPC vs something else distinction -- the context matters.

npdoty commented 2 years ago

@palenica I think privacy concerns regarding collusion, and the question of which parties users trust, are totally relevant, and maybe that could be added to the web advertising privacy principles doc (which I believe @darobin volunteered to edit). Not sure we have a separate issue or repo for that yet, but that would be a good discussion to continue there.

michaelkleber commented 2 years ago

I've been trying to figure out how to think about MPC and non-MPC systems on an equal footing, and it seems to me that it's not as binary as our in-person discussion depicted it.

@ekr took the position that for a Trusted Execution Environment approach like Amazon Nitro, there is not robust protection against an attacker with physical or side-channel access, so "you need to trust Amazon" — i.e. we need to pessimistically act as if Amazon can observe all the data the TEE processes, can steal the crypto keys the TEE uses, etc.

To make a reasonable comparison, then, where do we expect the MPC helpers to be embodied in the physical world? In particular: if a system's privacy requires two non-colluding helpers, does that mean they must be running on two different cloud providers, and that those cloud providers must be trusted to be non-colluding as well?

michael-oneill commented 2 years ago

At minimum, 2 or more unconnected entities, in different legal jurisdictions where there are suitable and mutually recognised data protection and privacy laws in force.

michaelkleber commented 2 years ago

@michael-oneill That is a plausible interpretation of our discussion so far, but I'm not ready to believe it is an economically and structurally reasonable expectation for the API.

michael-oneill commented 2 years ago

Not for the API, but the browser providers implementing it could block reports unless that constraint is met. How the MPC helpers declare it is up for grabs, but probably a .well-known JSON doc on the domains, with eventual legal recognition.
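As a rough sketch of what such a browser-side check could look like: everything here is an assumption, since no declaration format exists. The JSON fields `operator` and `jurisdiction` and the function `helpers_meet_constraint` are hypothetical, and the documents are passed in as strings rather than fetched from a .well-known path.

```python
import json


def helpers_meet_constraint(declarations: list[str]) -> bool:
    """Return True if the helpers declare distinct operators and jurisdictions.

    Each item is the text of one helper's (hypothetical) declaration document.
    """
    docs = [json.loads(text) for text in declarations]
    if len(docs) < 2:
        return False  # the constraint needs at least two helpers
    operators = {doc["operator"] for doc in docs}
    jurisdictions = {doc["jurisdiction"] for doc in docs}
    # Every helper must have its own operator and its own jurisdiction.
    return len(operators) == len(docs) and len(jurisdictions) == len(docs)


helper_one = '{"operator": "Example Org A", "jurisdiction": "FR"}'
helper_two = '{"operator": "Example Org B", "jurisdiction": "JP"}'
helper_three = '{"operator": "Example Org A", "jurisdiction": "DE"}'

print(helpers_meet_constraint([helper_one, helper_two]))    # True
print(helpers_meet_constraint([helper_one, helper_three]))  # False
```

A self-declared document only shows what the helpers claim; whether the entities are really unconnected, and whether the jurisdictions' laws are mutually recognised, would still need the legal recognition mentioned above.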

betuldurak commented 2 years ago

@michaelkleber You are referring to a different level of economic expectations than the one you mentioned yesterday about "how many zeros", but I am curious how much more we want to pay for privacy (both with TEEs and MPCs). It is clear that even without any heavy crypto/computation, we will double or triple the price because MPC stores the same data with every helper. Is it clear what the upper bound is for MPC to be considered a scalable effort?

I don't have experience with TEEs myself, but I am aware of their difficulties with small enclaves and scalability from various groups in industry, which required significant effort working with Intel engineers etc. These are costs to me. Do you have experiments with TEEs for any of the use cases discussed over the last two days? If so, I would be very interested to see them. I am curious how much more we are willing to pay for TEEs.

darobin commented 2 years ago

There have been several bits and pieces of discussions that should get collected in the principles doc — might we impose on the chairs to set up the repo, since it's an accepted deliverable anyway? (cc @AramZS)

To add one more dimension to the MPC solution space, it might be too difficult or too costly to obtain some properties through purely technical means (collusion-resistance, adversaries tougher than HbC). Conversely, a pure governance-based model might offer too few guarantees (even if trusted, having one big DB of all browsing activity is never an acceptable level of risk). But a hybrid model could use a governance model to provide non-collusion and honesty in support of an MPC system.

@michael-oneill I wonder if a single entity in an EDPB-adequate jurisdiction could be enough to provide guarantees (assuming the entity were required to legally resist). But that's a thorny aspect that would require some pretty in-depth legal analysis.

rmirisola commented 2 years ago

I’d like to propose to break this apart a bit further, and suggest there is a privacy requirement, which calls upon a data security requirement. Here is the proposed template for the privacy standard:

Data can only be processed off-device if those mechanisms have (1) sufficient security guarantees to ensure that any query or access to the data can only result in outputs that are (2) sufficiently privacy preserving.

I believe most people in this group would agree with something similar to the above for some definition of (1) and (2) and that most of the debate we have right now is about what constitutes a sufficient bar for (1) and (2).

I propose that we try to get consensus on the above template, before we try to proceed with definitions of (1) and (2).

Mearca commented 2 years ago

Thanks, Alex & Ben.

  1. I am in agreement with Robin: a hybrid model of governance and tech guarantees with user controls makes sense, given that it may be unlikely that we reach a perfect tech solution. Not sure if this has already been discussed, but a governance/standards body which looks at setting up acceptable rules, acceptable privacy tech, penalties in case of deviant behavior, user compensations etc. could be a better option globally than having to deal with new laws and regulations every year or so.

  2. Does this group have to decide on one acceptable tech like MPC, or could there be more than one, such as MPC and TEEs?

kirangopinath71 commented 2 years ago

Sorry, signed in with the wrong account earlier!

p-j-l commented 2 years ago

This is a really interesting discussion!

I'm also wondering about how things might change over time to iteratively improve the guarantees that we can give. (For example, maybe we rely on governance for some things to start with that we can later include technical solutions for?)

I suppose this brings us back to @rmirisola's point about working out what's sufficient to begin with.

darobin commented 2 years ago

Yes, we've considered governance solutions before. The proposal on the table for that is known as Garuda. It was designed for a partly different use case, so don't worry too much about the details. The important parts are that:

I don't want to oversell it and I'm pretty convinced that if we head in that direction, whatever we build will be different from that first draft. But after a fair amount of research and having presented the system to a number of people who study this type of arrangement, I think it can work. There's more precedent for this kind of commons-based infrastructure management than people realise, too, though not necessarily in a transnational setting processing the data of 4bn people for a half-trillion dollar industry.

jaylett-annalect commented 2 years ago

@darobin while that might (subject to suitable review) be sufficient for people who understand how various risks are being mitigated by the design, I think there's an aspect of @palenica's point which is more about people who won't... who will they consider trusted (whether operating a TEE or relying exclusively on protocol guarantees built in to use of MPC)? Although eventually this reaches beyond the responsibility of this CG or even W3C, our architecture might be usefully influenced if there's any research (including if we could provoke someone to do new studies) on what trusted could look like. We don't want to design and deploy something that in some parts of the world spawns a successful grassroots campaign for everyone to opt out -- if a different approach could achieve a more accepted outcome.

(Also the annoying pedant in me wants to point out that adequacy isn't permanent, and that it is part of an increasing number of non-European laws which means it can't be guaranteed to be either reflexive or transient. I'll try to keep the pedant under control by pointing out that most of these countries are very likely to confer adequacy on the EU.)