Open raprasad opened 3 years ago
There are two avenues here, each with its own set of logical steps:
Using DP Count:
When the user selects private count = True, then the "create statistic" view should be pre-populated with a row for a DP count, the result of which will be passed into any other statistics that the user selects
If the user selects private count = True and in "create statistic" selects a count, it should override the pre-populated one - we only need this to be calculated once.
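The two DP Count rules above could be sketched roughly as follows. This is only an illustration: `StatRow`, `initial_rows`, and `add_row` are hypothetical names, not existing code in the project, and the real "create statistic" view may track rows differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatRow:
    """Hypothetical stand-in for one row in the "create statistic" view."""
    statistic: str
    variable: str = ""
    prepopulated: bool = False


def initial_rows(private_count: bool) -> List[StatRow]:
    """Rows shown when the view first opens."""
    if private_count:
        # Pre-populate a DP count row when the dataset size is private.
        return [StatRow(statistic="count", prepopulated=True)]
    return []


def add_row(rows: List[StatRow], new_row: StatRow) -> List[StatRow]:
    """Add a user-selected row; a user-selected count replaces the
    pre-populated one so the count is only computed once."""
    if new_row.statistic == "count":
        rows = [r for r in rows
                if not (r.statistic == "count" and r.prepopulated)]
    return rows + [new_row]
```

With this shape, the pre-populated row disappears as soon as the user adds their own count, so only one count is ever sent to the backend.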
Using User Estimation:
One of the views (likely create statistic) needs a way for the user to specify their best estimate for the count, which is then passed to the backend and used in the computation chains.
If a DP Count is also requested, then we would need to decide which takes precedence.
@raprasad @ekraffmiller
Thanks to @Shoeboxam for the discussion
Needed for computing DP counts:
An old slide. We're not getting user input--yet.
This ticket is for implementing the green box labeled: "Use privacy budget to capture size"
@raprasad Why don't we approach this incrementally, and first build a feature where the user has to answer yes. This way, we can first develop the part of the code that takes the estimate from the front end and passes it into the process. Once this is merged, we can add functionality for the case where they say "no".
Another option is to create 2 analysis objects, one for the dp count and one for the rest, and split the budget between them. This way we could reuse the existing ValidateReleaseUtil class to compute what we need, rather than creating new classes to compute the dp count separately.
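A minimal sketch of the budget split for the two-analysis option, assuming pure epsilon accounting where the two analyses compose sequentially (so their epsilons add up to the total). The function name `split_epsilon` and the default fraction of 0.1 are assumptions made for illustration; nothing in this thread fixes how the budget should be divided.

```python
from typing import Tuple


def split_epsilon(total_epsilon: float,
                  count_fraction: float = 0.1) -> Tuple[float, float]:
    """Split a total epsilon between a DP count analysis and the rest.

    count_fraction is a hypothetical knob; 0.1 is an arbitrary default.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not 0 < count_fraction < 1:
        raise ValueError("count_fraction must be strictly between 0 and 1")
    count_epsilon = total_epsilon * count_fraction
    # Whatever is not spent on the count goes to the remaining statistics.
    return count_epsilon, total_epsilon - count_epsilon
```

Each of the two epsilons could then be handed to its own analysis object and validated with the existing ValidateReleaseUtil flow.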
The workflow could look like this:
See google doc: https://docs.google.com/document/d/1xUihcjh4zmfnhG0-2EC-uG-qzpde8WXphRksB0NvHe8/edit#
(Redo steps below after doc discussion)
2. Update the StatSpec class (stat_spec.py) to include a variable indicating is_dataset_size_public.
3. Update the computation chains for existing stats appropriately, e.g. if is_dataset_size_public == True, update the chain, use a different chain, etc. Include tests for each stat. (Check that if the dataset size is private then more epsilon is used, etc.)
4. Integrate into the larger workflow, e.g. ValidateReleaseUtil.build_stat_specs()
   - ValidateReleaseUtil.__init__: add self.is_dataset_size_public = None
   - ValidateReleaseUtil.run_preliminary_steps: set self.is_dataset_size_public to True or False
   - Add function DatasetInfo.is_dataset_size_public(), similar to get_dataset_size() except that it finds the answer to the dataset size question within DepositorSetupInfo
   - ValidateReleaseUtil.build_stat_specs(): use self.is_dataset_size_public when building the StatSpec objects
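The steps above might fit together roughly as in the sketch below. `StatSpecSketch` and `ReleaseUtilSketch` are hypothetical stand-ins, not the project's StatSpec and ValidateReleaseUtil classes. The constructor argument, the request dictionaries, and `chain_kind()` are assumptions made only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StatSpecSketch:
    """Hypothetical stand-in for StatSpec with the new flag (step 2)."""
    statistic: str
    variable: str
    epsilon: float
    is_dataset_size_public: bool = True

    def chain_kind(self) -> str:
        # Step 3: pick a computation chain based on the flag.
        return "known_size" if self.is_dataset_size_public else "private_size"


class ReleaseUtilSketch:
    """Hypothetical stand-in for ValidateReleaseUtil (step 4)."""

    def __init__(self, depositor_answer: bool):
        # Assumed to come from the depositor setup information.
        self._depositor_answer = depositor_answer
        self.is_dataset_size_public: Optional[bool] = None

    def run_preliminary_steps(self) -> None:
        # Resolve the flag to True or False before any specs are built.
        self.is_dataset_size_public = bool(self._depositor_answer)

    def build_stat_specs(self, requests: List[dict]) -> List[StatSpecSketch]:
        if self.is_dataset_size_public is None:
            raise RuntimeError("run_preliminary_steps() must be called first")
        # Pass the flag into every spec that gets built.
        return [
            StatSpecSketch(r["statistic"], r["variable"], r["epsilon"],
                           self.is_dataset_size_public)
            for r in requests
        ]
```

Keeping the flag as None until the preliminary steps run makes it easy to catch a spec being built before the dataset size question has been answered.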