opendp / smartnoise-core-python

Python language bindings for smartnoise-core.
MIT License

DP quantiles fail with a RuntimeError #70

Open TedTed opened 3 years ago

TedTed commented 3 years ago

Hi folks,

I started from the basic data analysis notebook and wanted to try out quantile computation with the exponential mechanism.

I slightly modified the 10th code cell of the notebook to change:

sn.dp_mean(
    data = sn.to_float(data['age']),
    privacy_usage = {'epsilon': .65},
    data_lower = 0.,
    data_upper = 100.,
    data_rows = 1000
)

to:

sn.dp_median(
    data = sn.to_float(data['age']),
    candidates = [float(i) for i in range(100)],
    mechanism = "Exponential",
    privacy_usage = {'epsilon': .65},
    data_lower = 0.,
    data_upper = 100.,
    data_rows = 1000
)

Executing this cell raises the following error:

RuntimeError: Error: node specification ExponentialMechanism(ExponentialMechanism { privacy_usage: [PrivacyUsage { distance: Some(Approximate(DistanceApproximate { epsilon: 0.65, delta: 0.0 })) }] }):
Caused by: custom sensitivities may only be passed if protect_sensitivity is disabled

This probably shouldn't happen: presumably the quantile mechanism should work out the sensitivity to pass to the exponential mechanism itself. The error message is also misleading, since passing protect_sensitivity = False to sn.Analysis doesn't solve the issue and instead raises a different error:

RuntimeError: Error: node specification ExponentialMechanism(ExponentialMechanism { privacy_usage: [PrivacyUsage { distance: Some(Approximate(DistanceApproximate { epsilon: 0.65, delta: 0.0 })) }] }):
Caused by: sensitivity has 1 records, while the expected shape has 100 records.
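
For context on why a single sensitivity value and 100 candidates can legitimately coexist, here is a minimal standard-library sketch of a textbook exponential-mechanism median. It is not the smartnoise-core implementation: the function name exponential_median, the utility function, and the assumption that neighbouring datasets differ by substituting one record (so the row count is public) are all illustrative choices, and floating-point side channels are ignored.

```python
import math
import random


def exponential_median(data, candidates, epsilon, lower, upper, rng=None):
    """Illustrative exponential-mechanism median (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # Clamp records to the declared bounds, as data_lower / data_upper would.
    clamped = [min(max(float(v), lower), upper) for v in data]
    n = len(clamped)
    # One utility score per candidate: 0 is best, meaning exactly half of
    # the records fall strictly below the candidate.
    utilities = [
        -abs(sum(v < c for v in clamped) - n / 2.0) for c in candidates
    ]
    # Assumption: substituting one record moves each count by at most 1,
    # so one scalar sensitivity bounds every candidate's score.
    sensitivity = 1.0
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scores)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    # Sample one candidate with probability proportional to its weight.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Usage with synthetic ages (not the notebook's dataset).
ages = [random.Random(7).randint(18, 90) for _ in range(1000)]
release = exponential_median(
    ages, [float(i) for i in range(100)], epsilon=0.65, lower=0.0, upper=100.0
)
print("DP median of age: {0}".format(release))
```

In this sketch the utility vector has one entry per candidate while the sensitivity stays a scalar, which is the pairing that the second error message appears to reject.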

Shoeboxam commented 3 years ago

Thanks for raising this issue. Following up on this message, we've been moving away from this library, and I just can't get a deprecation notice on it soon enough. The notebook you're using hasn't been maintained and, as you've pointed out, a regression broke the median. I recommend using the OpenDP library instead. Admittedly, the OpenDP library doesn't have a quantiles implementation yet, but there are a couple of different algorithms in development.

Here's a modified version of the cell. The runtime error occurs when trying to compute the privacy budget, but you can still make a release.

with sn.Analysis() as analysis:
    # load data
    data = sn.Dataset(path = data_path, column_names = var_names)

    # get DP median of age
    age_median = sn.dp_median(
        data = sn.to_float(data['age']),
        candidates = [float(i) for i in range(100)],
        privacy_usage = {'epsilon': .65},
        data_lower = 0.,
        data_upper = 100.)

print("DP median of age: {0}".format(age_median.value))
# explodes:
# print("Privacy usage: {0}\n\n".format(analysis.privacy_usage))

In any other situation I would debug the issue and extend the test suite. But I think we'd be better off if I spent that time opening PRs for this algorithm in the OpenDP library instead.