TedTed closed this issue 4 years ago
The noise is added before the value is squared, so I believe this is working as expected. We will review and confirm.
Note that there is a sample stochastic evaluator run over variance at: https://github.com/opendifferentialprivacy/whitenoise-system/blob/master/evaluation/Differential%20Privacy%20Verification.ipynb
How can the noise be added before the value is squared? The variance formula is a sum of squares, so either noise is added to each individual element before squaring (which would be valid, but extremely noisy, similar to local DP), or noise is added at the end (in which case the above bug happens).
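To make the two options concrete, here is a minimal sketch, not the rewriter's actual code. It assumes the Laplace mechanism, clamping to [lower, upper], and neighboring datasets that differ by adding or removing one record; the function names are hypothetical. The sampler uses ordinary floating-point arithmetic and is for illustration only.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clamp(x, lower, upper):
    return max(lower, min(upper, x))


def sum_of_squares_noise_per_element(values, lower, upper, epsilon, rng):
    # Option 1: perturb every clamped value, then square and sum.
    # Any two clamped values differ by at most (upper - lower).
    scale = (upper - lower) / epsilon
    noisy = [clamp(v, lower, upper) + laplace(scale, rng) for v in values]
    return sum(x * x for x in noisy)


def sum_of_squares_noise_at_end(values, lower, upper, epsilon, rng):
    # Option 2: square and sum the clamped values, then add noise once.
    # One record contributes at most max(lower^2, upper^2) to the sum,
    # so the noise must be scaled by the squared bound.
    sensitivity = max(lower * lower, upper * upper)
    exact = sum(clamp(v, lower, upper) ** 2 for v in values)
    return exact + laplace(sensitivity / epsilon, rng)
```

In option 1 the noise variance grows with the number of rows, which is why it behaves like local DP. In option 2 a single noise draw is used, but only if it is scaled by the squared bound.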
Oh, right, I was thinking of avg_squared rather than avg_of_square. For the sum of squares we indeed square the noise.
query = 'SELECT age, VAR(age) AS var_age FROM PUMS.PUMS'
private = PrivateReader(meta, reader)
sq, q = private.rewrite(query)
syms = sq.all_symbols()
assert syms[1][1].sensitivity() == 100  # sum
assert syms[2][1].sensitivity() == 100 * 100  # sum_of_square
We will add some documentation explaining how the rewriter works.
The details of the VAR computation are documented here: https://github.com/opendifferentialprivacy/smartnoise-sdk/blob/master/papers/DP_SQL_budget.pdf
This is a consequence of this other bug but should probably be tracked separately, in case the fix is applied earlier in the rewriting process.
The code to calculate VARIANCE in the SQL rewriter computes the square of each value, but nothing is done to indicate to that aggregation that the clamping bounds should also be squared. So if your initial clamping is [0,50], you'll (I assume) scale the noise by 50 instead of by 50*50=2500.
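The arithmetic can be checked with a small sketch. These helpers are hypothetical and are not part of the SDK; they assume neighboring datasets that differ by adding or removing one record, and a Laplace mechanism whose scale is sensitivity divided by epsilon.

```python
def sum_sensitivity(lower, upper):
    # One record changes SUM(x) by at most the largest absolute bound.
    return max(abs(lower), abs(upper))


def sum_of_squares_sensitivity(lower, upper):
    # One record changes SUM(x * x) by at most the largest squared bound.
    return max(lower * lower, upper * upper)


def laplace_scale(sensitivity, epsilon):
    # Laplace mechanism: the noise scale is sensitivity / epsilon.
    return sensitivity / epsilon


# Clamping to [0, 50] with epsilon = 1.0 (an arbitrary value for illustration):
buggy_scale = laplace_scale(sum_sensitivity(0, 50), 1.0)
correct_scale = laplace_scale(sum_of_squares_sensitivity(0, 50), 1.0)
```

Under these assumptions, noise scaled by the unsquared bound is too small by a factor of 50 for the sum-of-squares term.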