opendp / smartnoise-core

Differential privacy validator and runtime

smartnoise core, v0.3 #342

Closed: raprasad closed this issue 3 years ago

raprasad commented 3 years ago

Unit tests


Reproduction:

    mkvirtualenv test-samples
    git clone [samples repo]
    pip install -i https://test.pypi.org/simple/ opendp-smartnoise-core
    pip install pandas seaborn z3
    pip install notebook
    jupyter notebook
raprasad commented 3 years ago

Errors:

- covariance, step 3
- histograms, step 2
- mental health, step 4, same as histogram

`TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'`

raprasad commented 3 years ago

Reconstruction attack:

    Error: node specification SnappingMechanism(SnappingMechanism { privacy_usage: [PrivacyUsage { distance: Some(Approximate(DistanceApproximate { epsilon: 0.1, delta: 0.0 })) }] }): Caused by: custom sensitivities may only be passed if protect_sensitivity is disabled

    Error: at node_id 15 Caused by: node specification LaplaceMechanism(LaplaceMechanism { privacy_usage: [PrivacyUsage { distance: Some(Approximate(DistanceApproximate { epsilon: 0.1, delta: 0.0 })) }] }): Caused by: Floating-point protections are enabled. The laplace mechanism is susceptible to floating-point attacks.

Shoeboxam commented 3 years ago

The covariance issue is because the DP estimate is noisy, which can make coefficients negative. Remember to update the text at the bottom of the notebook that interprets the DP estimate when you run the notebook again.
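As an illustration (the numbers below are made up, not from the notebook), a negative DP variance is impossible for the underlying quantity, so anything that consumes the release downstream, such as a correlation, should guard against it:

    import math

    # hypothetical DP releases; noise has pushed the variance of x below zero
    dp_var_x, dp_var_y, dp_cov_xy = -0.8, 12.4, 1.7

    # clamp impossible values before deriving a correlation from the DP covariance
    var_x = max(dp_var_x, 1e-6)
    var_y = max(dp_var_y, 1e-6)
    dp_corr = dp_cov_xy / math.sqrt(var_x * var_y)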

The histogram notebook gives the error "Caused by: Floating-point protections are enabled. The laplace mechanism is susceptible to floating-point attacks." Add protect_floating_point=False as an argument to sn.Analysis(...). The same issue applies to the mental health and reconstruction attack notebooks: they explicitly release DP estimates from a mechanism known to have floating-point vulnerabilities.
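For concreteness, a minimal sketch of where that flag goes; the statistic is just a stand-in borrowed from the snippet further down, and data_path / var_names are assumed to be defined as in the samples:

    import opendp.smartnoise.core as sn  # import path may differ between releases

    # opt out of floating-point protections so the Laplace-based releases run,
    # accepting the mechanism's known floating-point vulnerability
    with sn.Analysis(protect_floating_point=False) as analysis:
        data = sn.Dataset(path=data_path, column_names=var_names)
        dp_mean = sn.dp_mean(
            data=sn.to_float(data['age']),
            privacy_usage={"epsilon": 0.1},
            data_lower=0.,
            data_upper=100.)

    analysis.release()
    print(dp_mean.value)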

The resize notebook gives the error "Caused by: lower bound on the statistic is unknown for the snapping mechanism. Either pass lower as an argument or sufficiently preprocess the data to make a lower bound inferrable." This is because the snapping mechanism is not passed bounds on the Sum statistic. Even better, use this snippet:

    with sn.Analysis() as analysis_plug_in:
        data = sn.Dataset(path=data_path, column_names=var_names)
        age = sn.to_float(data['age'])

        # the "plug-in" implementation builds the DP mean from a DP sum and a
        # DP count internally, using the DP count to bound the snapping sum
        dp_mean = sn.dp_mean(
            data=age,
            privacy_usage={"epsilon": 1.0},
            implementation="plug-in",
            data_lower=0.,
            data_upper=100.)

    dp_plugin_mean = dp_mean.value

Instead of manually constructing the estimate as the notebook currently does, setting the implementation to "plug-in" does this for you. The plug-in implementation also postprocesses the DP count internally to estimate the missing bounds for the snapping sum.
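Conceptually (plain Python with made-up numbers, not the library API), the plug-in estimate amounts to:

    data_lower, data_upper = 0., 100.

    dp_count = 987.0   # hypothetical DP count release
    # the DP count supplies the statistic bounds the snapping sum was missing
    sum_lower = dp_count * data_lower
    sum_upper = dp_count * data_upper

    dp_sum = 44120.0   # hypothetical DP sum release, snapped within those bounds
    dp_plugin_mean = dp_sum / dp_count   # postprocessing, no extra privacy cost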

You may notice banding in the scatter plot and uneven density curves. These are artifacts of the snapping mechanism.

raprasad commented 3 years ago

Notebooks fixed