opendp / dpcreator

Web application that makes data releases that satisfy differential privacy using the OpenDP Library
MIT License

Integrate: answer to question "can observations be made public" into preprocessors and release text #295

Open raprasad opened 3 years ago

raprasad commented 3 years ago

See google doc: https://docs.google.com/document/d/1xUihcjh4zmfnhG0-2EC-uG-qzpde8WXphRksB0NvHe8/edit#

(Redo steps below after doc discussion)

ecowan commented 2 years ago

There are two avenues here, each with its own set of logical steps:

Using DP Count:

  1. When the user selects private count = True, then the "create statistic" view should be pre-populated with a row for a DP count, the result of which will be passed into any other statistics that the user selects

  2. If the user selects private count = True and in "create statistic" selects a count, it should override the pre-populated one - we only need this to be calculated once.
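A minimal sketch of the two rules above, assuming statistics are held as plain dicts. The function name `statistics_rows` and the keys `statistic`, `variable`, and `prepopulated` are hypothetical and are not taken from the dpcreator code base.

```python
from __future__ import annotations


def statistics_rows(private_count: bool, user_rows: list[dict]) -> list[dict]:
    """Return the rows to show in the "create statistic" view.

    Hypothetical helper: a row is a dict such as
    {"statistic": "mean", "variable": "age"}.
    """
    rows = list(user_rows)  # copy so the caller's list is not mutated
    user_has_count = any(row.get("statistic") == "count" for row in rows)

    # Rule 1: with a private count, pre-populate a DP count row.
    # Rule 2: if the user already selected a count, that row is used instead,
    # so the count is only computed once.
    if private_count and not user_has_count:
        rows.insert(0, {"statistic": "count", "variable": None, "prepopulated": True})
    return rows
```

With `private_count=True` and only a mean selected, the result has a pre-populated count row in first position; if the user adds their own count, no extra row is created.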

Using User Estimation:

  1. One of the views (likely create statistic) needs a way for the user to specify their best estimate for the count, which is then passed to the backend and used in the computation chains.

  2. If a DP Count is also requested, then we would need to decide which takes precedence.
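Since the precedence is not decided yet, one option is to make it an explicit parameter rather than hard-coding it. This is only a sketch: `resolve_dataset_size` and its `prefer` argument are hypothetical names, and the default shown here is arbitrary.

```python
from typing import Optional


def resolve_dataset_size(
    user_estimate: Optional[int],
    dp_count: Optional[int],
    prefer: str = "dp_count",  # arbitrary default; the thread has not decided this
) -> int:
    """Pick the dataset size passed to the computation chains."""
    if prefer not in ("dp_count", "user_estimate"):
        raise ValueError(f"unknown preference: {prefer!r}")

    candidates = {"dp_count": dp_count, "user_estimate": user_estimate}
    other = "user_estimate" if prefer == "dp_count" else "dp_count"

    # Use the preferred source when available, otherwise fall back to the other.
    for source in (prefer, other):
        value = candidates[source]
        if value is not None:
            if value < 1:
                raise ValueError(f"{source} must be at least 1")
            return value
    raise ValueError("neither a user estimate nor a DP count is available")
```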

@raprasad @ekraffmiller

Thanks to @Shoeboxam for the discussion

ecowan commented 2 years ago

Needed for computing DP counts:

  1. Select any one of the columns in the data set
  2. Set a parameter (epsilon/10, etc.) that determines how much budget should be used to calculate the count estimate
  3. Construct a new class with similar functionality to ValidateReleaseUtil that can return a DP count only
  4. The result of this class needs to be passed into ValidateReleaseUtil to be used in the resize step of each statistic
  5. ValidateReleaseUtil also needs to lower the maximum_epsilon based on how much was used by the DP count
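
The budget bookkeeping in steps 2, 3 and 5 could look roughly like the sketch below. It is not the OpenDP Library API: the noise is a plain standard-library Laplace sample used as a stand-in (not safe for a real release), and `CountRelease`, `split_budget`, and `dp_count_only` are hypothetical names. The 1/10 fraction mirrors the "epsilon/10" example in step 2.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class CountRelease:
    dp_count: int             # noisy count, to be used in the resize step
    epsilon_used: float       # budget spent on the count
    remaining_epsilon: float  # lowered maximum_epsilon for the other statistics


def split_budget(maximum_epsilon: float, count_fraction: float = 0.1) -> Tuple[float, float]:
    """Reserve a fraction of the budget for the count (step 2)."""
    if maximum_epsilon <= 0:
        raise ValueError("maximum_epsilon must be positive")
    if not 0 < count_fraction < 1:
        raise ValueError("count_fraction must be between 0 and 1")
    count_epsilon = maximum_epsilon * count_fraction
    return count_epsilon, maximum_epsilon - count_epsilon


def dp_count_only(
    column: Sequence,
    maximum_epsilon: float,
    count_fraction: float = 0.1,
    rng: Optional[random.Random] = None,
) -> CountRelease:
    """Return a DP count of one column plus the remaining budget (steps 3 and 5)."""
    rng = rng or random.Random()
    count_epsilon, remaining = split_budget(maximum_epsilon, count_fraction)

    # A count changes by at most 1 when one row is added or removed,
    # so the Laplace scale is 1 / epsilon.
    scale = 1.0 / count_epsilon
    # The difference of two exponential samples is Laplace-distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    # Clamp to at least 1 so the value is usable as a dataset size.
    noisy_count = max(1, round(len(column) + noise))
    return CountRelease(noisy_count, count_epsilon, remaining)
```

In a real implementation the noisy count would come from the OpenDP Library, and only `remaining_epsilon` would be offered to the statistics that follow.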

raprasad commented 2 years ago

An old slide. We're not getting user input--yet.

This ticket is for implementing the green box labeled: "Use privacy budget to capture size"

[Image: 2022-0525-iqss-dataflow - Google Slides]

ecowan commented 2 years ago

@raprasad Why don't we approach this incrementally and first build a feature where the user has to answer "yes"? This way, we can first develop the part of the code that takes the estimate from the front end and passes it into the process. Once this is merged, we can add functionality for the case where they say "no".

ecowan commented 2 years ago

Another option is to create 2 analysis objects, one for the dp count and one for the rest, and split the budget between them. This way we could reuse the existing ValidateReleaseUtil class to compute what we need, rather than creating new classes to compute the dp count separately.

The workflow could look like this:

  1. User selects "count is private"
  2. Make two API calls to create new analyses, and link them to each other
  3. When dp count analysis completes, save the dp count to the analysis object
  4. When the second analysis runs, look to the linked analysis object and take the dp count from it
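
A rough sketch of that workflow, with in-memory objects standing in for the analysis records and API calls. The `Analysis` class, its fields, and the 10% budget split are all assumptions for illustration; the sketch also stores only the link direction needed in step 4.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Analysis:
    name: str
    epsilon: float
    dp_count: Optional[int] = None               # set when the count analysis completes
    count_source: Optional["Analysis"] = None    # link to the DP count analysis


def create_linked_analyses(total_epsilon: float, count_fraction: float = 0.1) -> Tuple[Analysis, Analysis]:
    """Step 2: create two analyses that split the budget, and link them."""
    count_epsilon = total_epsilon * count_fraction
    count_analysis = Analysis("dp_count", count_epsilon)
    stats_analysis = Analysis(
        "statistics", total_epsilon - count_epsilon, count_source=count_analysis
    )
    return count_analysis, stats_analysis


def complete_count_analysis(analysis: Analysis, dp_count: int) -> None:
    """Step 3: save the DP count on the analysis object."""
    analysis.dp_count = dp_count


def dataset_size_for(analysis: Analysis) -> int:
    """Step 4: read the DP count from the linked analysis."""
    source = analysis.count_source
    if source is None or source.dp_count is None:
        raise RuntimeError("linked DP count analysis has not completed yet")
    return source.dp_count
```

The second analysis fails loudly if it runs before the count is saved, which makes the ordering requirement between the two analyses explicit.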