opendp / PSI

Private data Sharing Interface
Apache License 2.0

Large negative values released in histogram #48

Status: closed by tercer 5 years ago

tercer commented 6 years ago

A user reports large negative values inside a histogram release:

I'm playing around with the PSI tool using the online interface and the California Demographic Dataset. I requested a data release with the following settings:

[image-1]

I got the following answer:

Splash Result Page

Global Values

Epsilon: 0.5

Delta: 0.000001

Beta: 0.05

Data Size (n): 1000

Variable 1: age

Histogram Releases: 4, 0, 0, 591, 900, 1029, 1021, 1146, 1063, 937, 761, 611, 496, 407, 375, 316, 216, -8933, 60, 0
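
As a quick sanity check on the release above, the short Python sketch below sums the released bins and compares the total with the reported data size n. The variable names are illustrative and nothing here comes from the PSI code base. It only does arithmetic on the numbers shown in this report.

```python
# Released histogram counts for "age", copied from the result above.
released = [4, 0, 0, 591, 900, 1029, 1021, 1146, 1063, 937,
            761, 611, 496, 407, 375, 316, 216, -8933, 60, 0]
n = 1000  # "Data Size (n)" as reported by the tool

total = sum(released)
non_negative_total = sum(c for c in released if c >= 0)

print("sum of all released bins:   ", total)
print("sum of non-negative bins:   ", non_negative_total)
print("n minus non-negative total: ", n - non_negative_total)
```

The bins sum to exactly n, and the negative bin equals n minus the sum of the other bins. That may suggest the outlier comes from a step that forces the counts to add up to n rather than from the noise itself, but this report does not confirm that.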

The -8933 count surprises me. The way I read the parameters, Error=11.98 means that the probability of getting an absolute error greater than 11.98 is 5%. I would then assume that the probability of getting an error of at least 8933 is astronomically small. Am I misunderstanding something? Is there a bug?
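
To put a rough number on "astronomically small", here is a minimal Python sketch. It assumes the noise on each bin is Laplace-distributed, which this report does not confirm for PSI's histogram mechanism. Under that assumption, P(|noise| > t) = exp(-t / b) for scale b, so the reported error and beta determine b. The calculation works in log10 because the probability itself underflows a float.

```python
import math

error = 11.98    # accuracy reported by the tool at level beta
beta = 0.05      # P(|noise| > error) = beta
observed = 8933  # magnitude of the surprising bin value

# Assumption (not confirmed for PSI): noise ~ Laplace(0, b),
# so beta = exp(-error / b)  =>  b = error / ln(1 / beta).
b = error / math.log(1 / beta)

# log10 of the tail probability exp(-observed / b).
log10_tail = -(observed / b) / math.log(10)

print(f"implied Laplace scale b: {b:.3f}")
print(f"log10 P(|noise| >= {observed}): {log10_tail:.1f}")
```

The implied scale comes out close to 4, and the tail probability is on the order of 10^-970. So if the noise really is Laplace with this scale, noise alone would essentially never produce a value like -8933.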