opendp / opendp

The core library of differential privacy algorithms powering the OpenDP Project.
https://opendp.org
MIT License
332 stars 55 forks source link

Count By (Categories) Known N -- Proof #158

Open andrewvyrros opened 3 years ago

Shoeboxam commented 3 years ago

I anticipate this proof will give the relation d_out >= d_in * 2^(1/p - 1).

If we know N, then there must be as many additions as deletions. Consider the two extreme edit distributions: A: When additions and deletions are all in separate buckets, sensitivity is d_in^(1/p). B: When all additions are made to one bucket, and all deletions to another, sensitivity is (2*(d_in/2)^p)^(1/p) = d_in * 2^(1/p - 1)

Only when d_in = 1, edit distribution A is greater than edit distribution B. But d_in = 1 is unreachable, because d_in is even when n is known. When d_in = 2, the two edit distributions have equal sensitivity for all p > 0. When d_in > 2, edit distribution B dominates edit distribution A because d/dx of d_in^(1/p) - d_in * 2^(1/p - 1) = d_in^(1/p - 1)/p - 2^(1/p - 1) is monotonically increasing for all p > 0. Therefore edit distribution B maximizes sensitivity, assuming the maximum sensitivity is represented by one of the two given edit distributions.

When p = 1, sensitivity is equivalent to unknown n. When p = 2, we have a global sensitivity of d_in / sqrt(2). When d_in = 2, sensitivity is sqrt(2), which is equivalent to the Smartnoise-Core substitution proof. When p = 3, then sensitivity is d_in/2^(2/3). Etc.

This approach is tighter than the previous group-privacy-based symmetric distance known-n sensitivity d_in / 2 * sqrt(2).