yuxiangw / autodp

autodp: A flexible and easy-to-use package for differential privacy
Apache License 2.0

Amplification by sampling without replacement throws the following error. #33

Closed SaitejaUtpala closed 2 years ago

SaitejaUtpala commented 2 years ago

Hi everyone,

When I amplify a Gaussian mechanism by sampling without replacement, autodp throws `AssertionError: mechanism's add-remove notion of DP is incompatible with Privacy Amplification by subsampling without replacements`. Here is a code snippet that reproduces the error (mechanism_zoo and transformer_zoo are imported from autodp). Is there anything I am doing wrong?

subsample = transformer_zoo.AmplificationBySampling(PoissonSampling=False)
mech = mechanism_zoo.GaussianMechanism(sigma=0.1)
prob = 0.1

SubsampledGaussian_mech = subsample(mech, prob, improved_bound_flag=True)

yuxiangw commented 2 years ago

It is exactly what the error message says. Sampling without replacement only works for the version of DP where the neighboring relationship is ReplaceOne. You can get it to work by adding `mech.neighboring = 'replace_one'` after Line 2 (the line that creates mech).

The above fix applies to the most recent version of autodp on the master branch. If you have to use an older version of autodp for some reason, add `mech.replace_one = 1` instead.

Please note that when you use the replace_one neighboring relationship to define DP, the sensitivity might not be the same as under add/remove. You should make sure that what you describe in autodp matches your actual implementation.

SaitejaUtpala commented 2 years ago

@yuxiangw Thanks for the quick reply. I always thought DP was defined only for fixed dataset sizes (replace one). Can you point me to any relevant papers that discuss these different kinds of neighboring relations?

yuxiangw commented 2 years ago

Standard DP assumes the add/remove neighboring relation (see the Dwork / Roth book). The replace_one version is called "Bounded DP" by Vadhan. It is convenient in some cases (e.g., releasing a mean) but slightly restrictive in others: it does not protect the dataset size n, and it doubles the sensitivity in some cases.
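To make the "doubles the sensitivity in some cases" point concrete, here is a minimal sketch in plain Python. It is not autodp code: the function name `brute_force_sensitivities` and the toy domains are made up for illustration. It enumerates small datasets and measures how much a sum query can change under each neighboring relation.

```python
from itertools import product


def brute_force_sensitivities(domain, n):
    """Return (add_remove, replace_one) sensitivity of the sum query.

    Hypothetical helper for illustration only: it enumerates every
    dataset of size n over a small finite domain of record values.
    """
    add_remove = 0.0
    replace_one = 0.0
    for data in product(domain, repeat=n):
        total = sum(data)
        # add/remove: a neighbor has one extra record x
        # (removing a record is the same change seen from the other side)
        for x in domain:
            add_remove = max(add_remove, abs((total + x) - total))
        # replace_one: a neighbor has record i swapped for some x
        for i in range(n):
            for x in domain:
                replaced = total - data[i] + x
                replace_one = max(replace_one, abs(replaced - total))
    return add_remove, replace_one


# Records in [-1, 1]: replacing -1 with 1 moves the sum by 2.
print(brute_force_sensitivities([-1.0, 0.0, 1.0], n=2))  # (1.0, 2.0)

# Records in [0, 1]: both relations give the same sensitivity.
print(brute_force_sensitivities([0.0, 1.0], n=2))  # (1.0, 1.0)
```

So if records are bounded in absolute value (for example, clipped to [-1, 1]) and you switch to replace_one, the noise scale should be calibrated to the larger sensitivity, whereas for records in [0, 1] nothing changes.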

SaitejaUtpala commented 2 years ago

Got it. Thanks!