opendp / smartnoise-sdk

Tools and service for differentially private processing of tabular and relational data
MIT License

approx_bounds is not robust to extremum floating-point values, leading to a DP vulnerability in SmartNoise-Synth #574

Closed. TedTed closed this issue 1 year ago

TedTed commented 1 year ago

The approx_bounds method in this file fails when infinity or NaN values are passed as input. The following code:

import snsql
from snsql.sql._mechanisms.approx_bounds import approx_bounds

data = list(range(0, 100)) + [float("42.17")]
l, u = approx_bounds(data, 1)
print(l, u)

works fine, but replacing "42.17" with "inf" or "NaN" raises ValueError: Value [value] is outside of the range we can use to infer bounds.

This bug propagates to SmartNoise-Synth functions exposed to the end users, because this line only checks for NaN values, but not infinity values. So running the following code snippet:

from snsynth import Synthesizer
import pandas as pd

pums = pd.read_csv('PUMS_null.csv', index_col=None) # in datasets/
pums.drop(['pid'], axis=1, inplace=True)
categorical_columns = list(pums.columns)
categorical_columns.remove('income')
categorical_columns.remove('age')
synth = Synthesizer.create('mst', epsilon=1.0, verbose=True)
sample = synth.fit_sample(
  pums,
  categorical_columns=categorical_columns,
  continuous_columns=['income', 'age'],
  preprocessor_eps=0.5,
  nullable=True
)
print(sample)

works perfectly fine on the PUMS_null.csv file in the datasets/ folder, but replacing a single income value in this file with inf raises the ValueError above.
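To spell out why an exception like this matters for differential privacy: if a run succeeds on one dataset and crashes on a neighboring dataset that differs in a single record, anyone who can see whether the run crashed learns something about that record with certainty. The sketch below illustrates this with a toy stand-in. toy_bounds and observer_sees_error are hypothetical names invented for the illustration; this is not the snsql implementation.

```python
import math


def toy_bounds(values):
    """Hypothetical stand-in for a bounds routine that rejects unusable values.

    Not the snsql approx_bounds code; it only mimics the 'raise on inf/NaN'
    behavior described in this issue.
    """
    for v in values:
        if math.isnan(v) or math.isinf(v):
            raise ValueError("value is outside of the range we can use to infer bounds")
    # The returned bounds are not private; only the error path matters here.
    return min(values), max(values)


def observer_sees_error(values):
    """Return True if the computation fails, which is all an observer needs to see."""
    try:
        toy_bounds(values)
        return False
    except ValueError:
        return True


# Two neighboring datasets: identical except for one record.
base = [float(x) for x in range(100)]
d1 = base + [42.17]
d2 = base + [float("inf")]

print(observer_sees_error(d1), observer_sees_error(d2))
```

The two neighboring datasets produce different observable outcomes with probability 1, which no finite epsilon allows.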

joshua-oss commented 1 year ago

Thanks for reporting this. It appears that this bug affects any float value with an absolute value larger than 2**64, which covers a wide range of the values representable in float32 or float64. Will fix approx_bounds to silently filter out any values that are outside the range we can measure. To minimize the risk of leaking values, we should also remove the assert on line 80.
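A minimal sketch of the "silently filter" idea, assuming the 2**64 magnitude cutoff mentioned above. filter_measurable and MAX_MAGNITUDE are hypothetical names, and this is not the code that shipped in the fix; it only shows dropping unusable values instead of raising.

```python
import math

# Assumed cutoff, taken from the 2**64 figure in the comment above.
MAX_MAGNITUDE = 2.0 ** 64


def filter_measurable(values, max_magnitude=MAX_MAGNITUDE):
    """Drop NaN, infinities, and values whose magnitude exceeds the cutoff.

    Values are dropped silently, so no error message depends on any
    individual record.
    """
    kept = []
    for v in values:
        v = float(v)
        if math.isfinite(v) and abs(v) <= max_magnitude:
            kept.append(v)
    return kept


data = list(range(0, 100)) + [float("inf"), float("nan"), 1e30, 42.17]
clean = filter_measurable(data)
print(len(data), len(clean))
```

Only the filtered list would then be passed on to bounds inference. Filtering changes how many values contribute, so any downstream step that relies on a count would need to account for that.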

joshua-oss commented 1 year ago

Fixed in 1.0.1