opendp / smartnoise-core

Differential privacy validator and runtime
MIT License

Feature: generic way for callers to request exact answers #330

Open raprasad opened 3 years ago

raprasad commented 3 years ago

From @joshua-oss :

Feature: Generic way for callers to request exact answers.

This is useful in the evaluator, where it allows testing of accuracy bounds. I’m not aware of any other scenarios where it would be useful, beyond tinkering and debugging, and it would obviously be a potential source of abuse. We currently have an internal API on the SQL layer that allows this, but there was a proposal to treat infinite epsilon as a request for exact values. Would need some thought about how to expose safely.

raprasad commented 3 years ago

from @Shoeboxam: There is an API for this already, but using it costs performance. wn.Analysis(filter_level='all') keeps all intermediate values in the graph, even non-privatized ones.

You can run, for example, sn.mean(data).value to get the exact statistic. Similar functions are available for all statistics. Another downside is that it is no longer the same graph: it's a similar graph, but without the mechanism.

Shoeboxam commented 3 years ago

Two potential routes here.

Route 1: Do special-casing in mechanisms. If privacy_definition.strict_parameter_checks is off, infinite epsilon disables noise addition.
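To make Route 1 concrete, here is a rough sketch in plain Python. It is not the library's mechanism code: laplace_release and its parameters are hypothetical names, and the only idea carried over from the proposal is that infinite epsilon skips noise when strict parameter checks are off. The Laplace scale is sensitivity / epsilon, so it shrinks toward zero as epsilon grows, which is why "no noise" is the natural limit.

```python
import math
import random


def laplace_release(value, sensitivity, epsilon, strict_parameter_checks=True):
    """Hypothetical Laplace mechanism with the Route 1 special case."""
    if math.isnan(epsilon) or epsilon <= 0:
        raise ValueError("epsilon must be positive")

    if math.isinf(epsilon):
        if strict_parameter_checks:
            # With strict checks on, an exact release is refused outright.
            raise ValueError("infinite epsilon requires strict_parameter_checks=False")
        # Special case: the noise scale would be zero, so return the exact value.
        return value

    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    noise = random.expovariate(1.0) - random.expovariate(1.0)
    return value + scale * noise
```

With this shape, the evaluator could call laplace_release(exact, 1.0, float("inf"), strict_parameter_checks=False) on the same graph it uses for noisy runs, while default callers still get an error instead of an exact answer.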

Route 2: Introduce an intermediate filter level with one of these behaviors:

  1. a filter level that retains graph sinks
  2. a filter level that retains aggregated data
  3. a filter level that retains certain components that the user flags

Currently, private data are purged from the graph as the computation runs, unless the filter level is set to retain all.
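The three Route 2 options can be compared on a toy graph. This is only an illustrative sketch: FilterLevel, Node, and retained are made-up names, the level names other than 'all' are assumptions, and the real graph and purge logic live in the core runtime, not in code like this.

```python
from dataclasses import dataclass
from enum import Enum


class FilterLevel(Enum):
    PUBLIC = "public"          # assumed default: keep only privatized values
    SINKS = "sinks"            # option 1: also keep graph sinks
    AGGREGATED = "aggregated"  # option 2: also keep aggregated data
    FLAGGED = "flagged"        # option 3: also keep user-flagged components
    ALL = "all"                # keep every intermediate value


@dataclass
class Node:
    name: str
    value: object
    is_public: bool = False     # value came out of a mechanism
    is_aggregate: bool = False  # value is a statistic, not row-level data
    is_flagged: bool = False    # user asked to retain this component
    children: tuple = ()        # names of nodes that consume this value


def retained(nodes, level):
    """Return the values that would survive purging at the given level."""
    def keep(node):
        if node.is_public or level is FilterLevel.ALL:
            return True
        if level is FilterLevel.SINKS:
            return not node.children
        if level is FilterLevel.AGGREGATED:
            return node.is_aggregate
        if level is FilterLevel.FLAGGED:
            return node.is_flagged
        return False

    return {n.name: n.value for n in nodes if keep(n)}


graph = [
    Node("data", [20.0, 40.0, 60.0], children=("clamped",)),
    Node("clamped", [20.0, 40.0, 60.0], is_flagged=True,
         children=("exact_mean", "dp_mean")),
    Node("exact_mean", 40.0, is_aggregate=True),
    Node("dp_mean", 41.2, is_public=True),
]
```

On this graph, the sink and aggregate levels both keep exact_mean next to dp_mean without holding on to row-level data, while the flagged level keeps whatever the user marked, here the clamped rows.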