world-federation-of-advertisers / cross-media-measurement


This project uses insecure noise generation in critical DP primitives #1357

Open TedTed opened 9 months ago

TedTed commented 9 months ago

Describe the bug
The code in this folder implements common noise addition primitives necessary to achieve differential privacy. Both primitives, the Laplace and Gaussian noisers, simply use the relevant classes in org.apache.commons.math3.distribution to sample from each distribution.
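For readers who have not opened the linked folder, the pattern being described looks roughly like the following Kotlin sketch. The class and method names here are illustrative, not the repo's actual Noiser interface; the point is that the noise comes straight from commons-math3's continuous samplers and the raw Double is handed back to the caller.

```kotlin
import org.apache.commons.math3.distribution.LaplaceDistribution
import org.apache.commons.math3.distribution.NormalDistribution

// Illustrative sketch of the vulnerable pattern: noise is drawn directly from
// commons-math3's continuous distributions, and the raw Double (with all of its
// floating-point structure) is returned to the caller.
class NaiveLaplaceNoiser(scale: Double) {
  private val distribution = LaplaceDistribution(0.0, scale)

  fun addNoise(trueValue: Double): Double = trueValue + distribution.sample()
}

class NaiveGaussianNoiser(sigma: Double) {
  private val distribution = NormalDistribution(0.0, sigma)

  fun addNoise(trueValue: Double): Double = trueValue + distribution.sample()
}
```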

This almost certainly makes these primitives vulnerable to floating-point attacks, whether the original (10-year-old!) attack from Mironov or more recent variants like precision-based attacks.

Steps to reproduce
I have not gone through the trouble of understanding how to run this large piece of software to figure out how to exploit this in practice, so I don't know whether this can be used for full reconstruction or membership inference. At a minimum, though, you should assume that the floating-point numbers returned by this piece of the code are leaking much more information than they're supposed to, and that the overall system actually does not satisfy differential privacy.

Component(s) affected
This is in the "eventdataprovider" folder. I'm not sure what's there.

Version
The issue is present in the latest version of the code. Here's a permalink.

Environment
N/A

Additional context
For background on floating-point attacks that doesn't require reading a scientific paper, you can check out this blog post or this recorded talk.

kungfucraig commented 9 months ago

Thanks for reporting this. We're looking into it.

kungfucraig commented 9 months ago

@TedTed Curious: I was looking at your article. Did you test the Google DP library?

TedTed commented 9 months ago

Yes. This vulnerability isn't exploitable in the Google DP library, because they generate noise by discretizing the distribution to a granularity that is a power of 2 and depends on the noise scale. This discretization comes at a small privacy cost, in the form of an additional δ in the privacy guarantee. This is described in their white paper on the topic. To my knowledge, that work has never been peer-reviewed, and I have not myself double-checked all the proofs.
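To give a rough idea of what that kind of mitigation looks like, here is a deliberately simplified Kotlin sketch: it snaps the noisy output onto a grid whose spacing is a power of two derived from the noise scale, so the fine-grained floating-point structure of the sample no longer leaks. This is only the shape of the idea: the Google library samples a discretized distribution directly rather than rounding a continuous sample after the fact, the resolution constant below is made up for illustration, and rounding alone is not by itself a proof of the guarantee (hence the extra δ and the white paper).

```kotlin
import org.apache.commons.math3.distribution.LaplaceDistribution
import kotlin.math.ceil
import kotlin.math.log2
import kotlin.math.pow
import kotlin.math.roundToLong

// Simplified illustration only: snap the noisy value to a multiple of a
// power-of-two granularity tied to the noise scale. The resolution factor
// (2^-20 here) is a placeholder, not what the Google library actually uses,
// and the library samples a discretized distribution directly instead of
// rounding a continuous sample like this.
fun snappedLaplace(trueValue: Double, scale: Double): Double {
  // Smallest power of two >= scale, shifted down by 20 bits of resolution.
  val granularity = 2.0.pow(ceil(log2(scale)) - 20.0)
  val noisy = trueValue + LaplaceDistribution(0.0, scale).sample()
  // Round onto the power-of-two grid (assumes |noisy / granularity| fits in a Long).
  return (noisy / granularity).roundToLong() * granularity
}
```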

kungfucraig commented 6 months ago

To fix the issue we should do the following:

Another way we discussed fixing this was to take advantage of the fact that all usages of these Noiser classes round the outputs to integers, and to just have the Noisers do this rounding themselves (sketched below). This is probably fine from a privacy point of view, but we'd need a DP expert to sign off on it.
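As a concrete sketch of that alternative, assuming a hypothetical Noiser-like shape rather than the repo's actual interface:

```kotlin
import org.apache.commons.math3.distribution.LaplaceDistribution
import kotlin.math.roundToLong

// Hypothetical sketch: the noiser rounds to an integer itself, so callers never
// see the raw floating-point sample. Whether this rounding is sufficient to
// restore the DP guarantee is exactly the question a DP expert would need to
// answer.
class IntegerRoundingLaplaceNoiser(scale: Double) {
  private val distribution = LaplaceDistribution(0.0, scale)

  fun addNoise(trueValue: Long): Long = (trueValue + distribution.sample()).roundToLong()
}
```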

There are other drawbacks, though. One drawback is that the maintenance task is more complex, since all usages would need to be changed. What's more, there may be usages we do not know of.

It's also inelegant, because the Gaussian and Laplace distributions are real-valued, and a future developer could relatively easily reintroduce this issue.

Basically, we are taking the approach that DP primitives, like security primitives, are something we should avoid implementing ourselves where possible.