opendp / smartnoise-sdk

Tools and service for differentially private processing of tabular and relational data
MIT License

In the SQL rewriter, the implementation of VARIANCE breaks privacy guarantees #203

Closed: TedTed closed this issue 4 years ago

TedTed commented 4 years ago

This is a consequence of this other bug but should probably be tracked separately, in case the fix is applied earlier in the rewriting process.

The code that calculates VARIANCE in the SQL rewriter computes the square of each value, but nothing tells that aggregation that the clamping bounds should also be squared. So if your initial clamping is [0,50], you'll (I assume) scale the noise by 50 instead of by 50*50=2500.
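To make the arithmetic concrete, here is a minimal sketch of the two sensitivities under add-or-remove-one-row neighboring datasets. The helper names are hypothetical and are not part of the SmartNoise API; they only illustrate why the bound on a sum of squares is the squared clamping bound.

```python
def sum_sensitivity(lower, upper):
    """Max change in SUM(x) when one row clamped to [lower, upper] is added or removed."""
    return max(abs(lower), abs(upper))


def sum_of_squares_sensitivity(lower, upper):
    """Max change in SUM(x * x) when one row clamped to [lower, upper] is added or removed."""
    # Squaring a value in [lower, upper] yields at most max(lower^2, upper^2).
    return max(lower * lower, upper * upper)


# Hypothetical clamping bounds taken from the example above.
lower, upper = 0, 50
print(sum_sensitivity(lower, upper))             # 50
print(sum_of_squares_sensitivity(lower, upper))  # 2500
```

If noise for the sum of squares were scaled by 50 rather than 2500, it would be 50 times too small for the stated privacy budget.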

joshua-oss commented 4 years ago

The noise is added before the value is squared, so I believe this is working as expected. We will review and confirm.

Note that there is a sample run of the stochastic evaluator over variance at: https://github.com/opendifferentialprivacy/whitenoise-system/blob/master/evaluation/Differential%20Privacy%20Verification.ipynb

TedTed commented 4 years ago

How can the noise be added before the value is squared? The variance formula is a sum of squares, so either noise is added to each individual element before squaring (which would be valid, but extremely noisy, similar to local DP), or noise is added at the end (in which case the above bug happens).

joshua-oss commented 4 years ago

Oh, right, I was thinking of avg_squared rather than avg_of_square. For the sum of squares, we do indeed square the sensitivity that scales the noise:

    query = 'SELECT age, VAR(age) AS var_age FROM PUMS.PUMS'
    private = PrivateReader(meta, reader)
    sq, q = private.rewrite(query)
    syms = sq.all_symbols()
    assert syms[1][1].sensitivity() == 100  # sum
    assert syms[2][1].sensitivity() == 100 * 100  # sum_of_square

We will add some documentation explaining how the rewriter works.
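As a rough illustration of the "noise added at the end" approach discussed in this thread, the sketch below builds a variance from three separately noised aggregates (count, sum, and sum of squares). It is not the SmartNoise implementation: the function names, the even three-way budget split, the Laplace sampler, and the use of population variance are all assumptions made for illustration, and it ignores floating-point side channels.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_variance(values, lower, upper, epsilon, rng=None):
    """Hypothetical sketch: population variance from noisy count, sum, sum of squares."""
    if rng is None:
        rng = random.Random()
    clamped = [max(lower, min(upper, v)) for v in values]

    # Assumption: the budget is split evenly across the three aggregates.
    eps_each = epsilon / 3.0
    sens_sum = max(abs(lower), abs(upper))
    sens_sq = max(lower * lower, upper * upper)  # squared bound, per this issue

    noisy_n = len(clamped) + laplace(1.0 / eps_each, rng)
    noisy_sum = sum(clamped) + laplace(sens_sum / eps_each, rng)
    noisy_sq = sum(v * v for v in clamped) + laplace(sens_sq / eps_each, rng)

    n = max(noisy_n, 1.0)  # guard against a non-positive noisy count
    mean = noisy_sum / n
    # avg_of_square minus avg_squared, floored at zero.
    return max(0.0, noisy_sq / n - mean * mean)


print(dp_variance([10, 20, 30, 40], 0, 50, epsilon=1.0, rng=random.Random(0)))
```

The floor at zero and the guard on the noisy count are post-processing steps, so they do not consume additional privacy budget.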

joshua-oss commented 4 years ago

The details of the VAR computation are documented here: https://github.com/opendifferentialprivacy/smartnoise-sdk/blob/master/papers/DP_SQL_budget.pdf