csharrison closed this 1 year ago
Given that people using multiple devices/browsers will now not get the same match key, we can't guarantee the contribution limit per-user. Right?
Yeah, currently there aren't any measurement proposals that can enforce a sensitivity bound across devices, although it's something I think we should invest some time into (e.g. it's possible to some extent with vendor-provided match keys). I tried to keep this general so that mitigations like tightening budgets based on estimates of devices per user remain feasible.
That is, my contribution is, to some extent, hidden by the noise of others' contributions. We don't have a strong formalism for that, but it's a useful intuition that we might be able to rely on. My guess is that this is why aggregated values perform worse: the noise that the training system experiences is concretely higher when aggregated, even though the formal protections remain the same.
This is true for DP-SGD-style learning because the noise is applied to the entire gradient in order to keep both the features and labels private, and the gradient can be huge. There could very well be aggregate training techniques in the "label DP" setting that outperform single-event queries; we just don't know of such an algorithm yet. Generally speaking, aggregation performs better than applying noise to every input because others' contributions carry no noise: you apply a single O(1) noise share to the entire aggregate.
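To illustrate the "O(1) noise share" point, here is a minimal sketch (not from the proposal itself; the epsilon value and data are illustrative) comparing two ways of releasing a private count where each user contributes at most 1: adding a Laplace noise share to every contribution versus adding a single Laplace draw to the aggregate. The per-event error grows roughly with the square root of the number of contributions, while the aggregate error stays constant.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0            # illustrative privacy parameter
n = 10_000
values = rng.integers(0, 2, size=n).astype(float)  # each user contributes 0 or 1

# Per-event noise: every contribution gets its own Laplace noise share,
# so the error of the summed estimate grows with n.
per_event = np.sum(values + rng.laplace(scale=1.0 / eps, size=n))

# Aggregate noise: one O(1) Laplace noise share on the final sum
# (the sum's sensitivity is 1 when each user contributes at most 1).
aggregate = np.sum(values) + rng.laplace(scale=1.0 / eps)

true_sum = values.sum()
print(abs(per_event - true_sum), abs(aggregate - true_sum))
```

Both estimates satisfy the same formal guarantee for this query; the difference is purely in the error the consumer of the data observes.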
That sounds like the kernel of a solution to me. In the general case, maybe you can't ensure that the gradient (or even a given dimension of it) isn't exclusively the result of a single user's contributions. But I would be surprised if it weren't possible to bound the sensitivity to something less than the size of the entire gradient.
It looks like we've come to a settled place on this. I think it is reasonable to merge at the two-week point.
Fixes #41
Per the call on May 4, I'll leave this PR open for some time to gather feedback.