opendp / smartnoise-core

Differential privacy validator and runtime
MIT License
290 stars 33 forks source link

underestimated sensitivity on binary histograms with known N #360

Closed Shoeboxam closed 2 years ago

Shoeboxam commented 3 years ago

It is possible to reduce the sensitivity of queries on binary histograms with known N. One estimate can be acquired for category 1, and then let category 2 be N - #cat1. In this case, the histogram becomes a postprocessed, filtered count query. The validator takes advantage of this by reducing the sensitivity to one.

Unfortunately, the runtime does not release 2-category histograms with known length via postprocessing a filtered count query. It releases two parallel queries that can together be postprocessed to reduce the noise-variance of the estimate below the acceptable epsilon-delta bounds.

In the AddRemove metric, the sensitivity happens to be the same without this optimization. In the Substitute metric, the wrongly-optimized sensitivity is 1, when it should be 2 in L1-space, and sqrt(2) in L2-space.