google / differential-privacy

Google's differential privacy libraries.
Apache License 2.0
3.08k stars 353 forks source link

Maven artifact with LongBoundedSum #242

Closed sakkumar closed 11 months ago

sakkumar commented 1 year ago

Please release a new maven artifact with LongBoundedSum.

Is there any specific reason BoundedSum is supposed to not take maxContributionsPerPartition param?

monsieurmuffin commented 1 year ago

Hi,

Thank you for the question! We are currently discussing the timeline for the next release. I will share the details as soon as I get more information.

BoundedSum uses parameters lower and upper instead of maxContributionsPerPartition. lower and upper are the bounds on the total value contributed by a user to sum. For example, if you know that a user can contribute a value between 5 and 10 at most 3 times, then you should set lower to 5 (this is the minimal total value that can be contributed by a user) and upper to 30 (= 3 * 10).

sakkumar commented 1 year ago

Thanks for reply!! So in the above example, lower & upper bound is defined perPartition and the user can still have > 1 crossPartition contribution. is my understanding correct? If so, my question was why not allow bounding of (5, 10) and let BoundedSum do the clamping and addition and use sensitivity of (perPartition crossPartition 10) for noise addition. I am trying to understand if it would cause any issue.

monsieurmuffin commented 1 year ago

Hi, I am very sorry for the delayed response. We will be discussing the exact release timeline next week.

"So in the above example, lower & upper bound is defined perPartition and the user can still have > 1 crossPartition contribution. is my understanding correct?" - yes, this is correct.

Re. the per-partition contributions API design, we have 2 options:

  1. We could either set contribution bounds on the individual contributed values plus a bound on the number of values that can be contributed per partition. In our example, it would be contributionLower = 5, contributionUpper = 10, maxContributionsPerPartition = 3.
  2. Or we can set a bound on the lower and upper total contribution, let the API caller pre-aggregate the per-partition contributions and rely on the assumption that each user contributes to a partition once (meaning that it can be in fact an aggregate over multiple contributions). In our example, this would mean setting totalContributionLower = 5, totalContributionUpper = 30.

In both cases, the user can contribute to multiple partitions, this will just multiply the sensitivity by maxPartitionsContributed.

We opted for [2] because it is more flexible: in some cases we might know the approximate bounds on the individual contributions, but we might know that e.g., the user almost certainly won't contribute the max value each time. In this case, we would set totalContributionUpper to a value that's lower than contributionUpper * maxContributionsPerPartition.

If we know contributionLower, contributionUpper and maxContributionsPerPartition, we should be able to mapp them to totalContributionLower and totalContributionUpper.