Retrieve differentially private quantiles from a set of pre-computed aggregates

from slack:

@Shoeboxam:

You could insert the exact quantile statistic into the laplace quantile- but your utility will be so poor you might as well just sample some noise. Sensitivity is (M- m), independent of the record count. The most efficient algorithm will be based on the exponential mechanism, but the current formulation requires row-level access. Assuming your record counts are large, you would be better off postprocessing histograms. I think the histogram insertion is pretty well trodden ground at this point. I could add a postprocessing component if you'd like. There may be a middle ground with the exponential, where we can still give a bound on the sensitivity of utilities derived from partially aggregated data. After thinking about this some more, I have a utility function to evaluate bins from an exact histogram, along with a bounded sensitivity. Not sure how it compares to post-processing histograms at this point. Aiming for a middle ground between row-level exponential and postprocessed noisy histograms. The mechanism basically releases the bin index most likely to contain the quantile. (edited)

@joshua-oss:

That would work! I could pass in a vector of histogram bin sizes. I assume we wouldn’t be able to report accuracy

@Shoeboxam:

Yeah. The tricky part about the exponential mechanism accuracy is that it is dependent on the dataset itself. So the accuracy itself leaks information about the dataset

opendp / smartnoise-core

Retrieve differentially private quantiles from a set of pre-computed aggregates #331