Open raprasad opened 3 years ago
from slack:
@Shoeboxam:
You could insert the exact quantile statistic into the laplace quantile- but your utility will be so poor you might as well just sample some noise. Sensitivity is (M- m), independent of the record count. The most efficient algorithm will be based on the exponential mechanism, but the current formulation requires row-level access. Assuming your record counts are large, you would be better off postprocessing histograms. I think the histogram insertion is pretty well trodden ground at this point. I could add a postprocessing component if you'd like. There may be a middle ground with the exponential, where we can still give a bound on the sensitivity of utilities derived from partially aggregated data. After thinking about this some more, I have a utility function to evaluate bins from an exact histogram, along with a bounded sensitivity. Not sure how it compares to post-processing histograms at this point. Aiming for a middle ground between row-level exponential and postprocessed noisy histograms. The mechanism basically releases the bin index most likely to contain the quantile. (edited)
@joshua-oss:
That would work! I could pass in a vector of histogram bin sizes. I assume we wouldn’t be able to report accuracy
@Shoeboxam:
Yeah. The tricky part about the exponential mechanism accuracy is that it is dependent on the dataset itself. So the accuracy itself leaks information about the dataset
from @joshua-oss: Is there any way to get differentially private quantiles from a set of pre-computed aggregates? Access to underlying rows is not possible. The underlying SQL engines can return arbitrary exact quantiles, and also can return histograms with even or arbitrary bin width.