opendp / opendp

The core library of differential privacy algorithms powering the OpenDP Project.
https://opendp.org
MIT License

Geometric variates via binary search #457

Closed Shoeboxam closed 1 year ago

Shoeboxam commented 2 years ago

We should explore sampling geometric random variates via binary search so that the sampling runs in logarithmic time instead of linear time.

The linear-time algorithm we currently have runs very slowly when the scale parameter is large: we repeatedly flip a biased coin until we see a heads, and the number of failed flips is our geometric sample.

However, we can repeatedly split the support at the expectation, using unbiased coin flips to determine whether the sample lies in the lower or upper portion of the support. We continue sampling standard Bernoullis until one comes up false, which narrows the support to a band. I'm not sure how to sample from this band, though. One approach is to reuse our current implementation with rejection sampling, but this reintroduces the time-complexity issues. Another approach is to continue computing midpoints of the band. The geometric distribution is nice in that it is memoryless, so cutting it at any point still produces a geometric distribution; however, we are then working with a censored geometric distribution, and computing the midpoint of the censored geometric may introduce floating-point vulnerabilities.
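For context, the current linear-time approach can be sketched like this (a float-based illustration only, not the library's exact bit-level sampler; the names are mine):

```python
import random

def geometric_linear(p: float, rng: random.Random) -> int:
    """Count failed Bernoulli(p) flips before the first success.
    The expected number of flips is 1/p, i.e. linear in the scale."""
    failures = 0
    while rng.random() >= p:  # each iteration is one failed coin flip
        failures += 1
    return failures

rng = random.Random(42)
samples = [geometric_linear(0.25, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # expectation is (1 - p) / p = 3
```

With a large scale (small p) the loop runs for an expected 1/p iterations, which is exactly the slowness described above.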

peteroupc commented 1 year ago

For the geometric distribution with small values of p (I assume p is 1 divided by the scale parameter), there is an exact and efficient algorithm published in Bringmann, K. and Friedrich, T., July 2013, "Exact and efficient generation of geometric random variates and random graphs", International Colloquium on Automata, Languages, and Programming (pp. 267-278). However, the algorithm is not exactly trivial, and its description there is not necessarily programmer-friendly.

See my notes on a geometric sampler.

On the other hand, when p is at 1/3 or greater, the trivial algorithm of drawing Bernoulli(p) trials until a success happens "is probably difficult to beat in any programming environment" (Devroye, L., "Non-Uniform Random Variate Generation", 1986, p. 498).

See also rust-random/rand#1062.

Shoeboxam commented 1 year ago

@peteroupc Wow, I wasn't expecting to find you here! I'm really happy to get this message. I actually stumbled on this algorithm and your implementation, and wrote an early draft in rust in April: https://github.com/opendp/opendp/commit/6f816b37a34c98b390c4ca898f8336bb935cba8d

Once it reaches a PR it would include more thorough references. ;)

The main motivation I had for implementing this algorithm is to use it to generate discrete Laplace noise over floats. It is "discrete" because the geometric noise is added to a very fine granularization of a subset of the floats, mapped to rationals.

I haven't quite solved it yet, mainly because I think I need a better approach to rationalizing floats with smaller constants. My constants in the fraction representation are so large that the algorithm is very susceptible to overflow, so I think either a better (maybe lossy) rationalization or just larger data types will be necessary.
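To illustrate the overflow concern: every finite float is exactly a dyadic rational, but even innocuous-looking floats have huge exact constants (a Python sketch; the Rust draft's representation may differ):

```python
from fractions import Fraction

# The exact rational value of the float 0.1 has a 2**55 denominator.
num, den = (0.1).as_integer_ratio()
exact = Fraction(num, den)
# Arithmetic on fractions with constants this large quickly overflows
# fixed-width integers, motivating either lossy rationalization
# (smaller constants) or big-integer / wider data types.
```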

I was anticipating using the existing linear-time sampler when the success probability was high, as you mention, but wasn't sure where to set the cutoff. I figured I might do it with some empirical testing, but the 1/3 bound is useful. I'm very informally using an expectation of 16 trials as a cutoff at the moment.

Shoeboxam commented 1 year ago

@peteroupc As a follow-up to this, I ended up using this paper: https://arxiv.org/pdf/2004.00010.pdf. You might find the samplers in Section 5 interesting, as they accomplish geometric sampling both more efficiently and more simply than is done here.
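For readers following the thread, the paper's discrete Laplace construction can be sketched by symmetrizing a geometric magnitude (a float-based illustration only; the paper's samplers use exact arithmetic, and the function name is mine):

```python
import math
import random

def discrete_laplace(t: float, rng: random.Random) -> int:
    """Sample Z with P(Z = z) proportional to exp(-|z| / t), following the
    symmetrized-geometric construction in arXiv:2004.00010, Section 5."""
    p = -math.expm1(-1.0 / t)      # success probability 1 - exp(-1/t)
    while True:
        sign = 1 if rng.random() < 0.5 else -1
        magnitude = 0
        while rng.random() >= p:   # geometric: count failed flips
            magnitude += 1
        if sign == -1 and magnitude == 0:
            continue               # reject to avoid double-counting zero
        return sign * magnitude

rng = random.Random(7)
samples = [discrete_laplace(2.0, rng) for _ in range(20_000)]
```

Note the inner geometric loop here is still linear-time; Section 5 of the paper is what replaces it with an efficient exact sampler.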

Do you have any general guidance or thoughts on how one might partially sample from the Gumbel distribution, parameterized by both shift and scale? I have an algorithm that would benefit from this: the gist is to independently add Gumbel noise to a vector and release the indexes of the top k noisy values.
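For concreteness, the gist described above might look like the following (a float-based sketch with hypothetical names; a DP release would need an exact or carefully analyzed sampler rather than `math.log` on uniforms):

```python
import math
import random

def gumbel_top_k(scores, scale, k, rng):
    """Add independent Gumbel(0, scale) noise to each score and release
    the indexes of the k largest noisy values."""
    noisy = []
    for i, s in enumerate(scores):
        u = rng.random()
        # standard Gumbel via inverse CDF: -ln(-ln(U)), then scaled
        noisy.append((s + -scale * math.log(-math.log(u)), i))
    noisy.sort(reverse=True)
    return [i for _, i in noisy[:k]]

rng = random.Random(1)
top = gumbel_top_k([10.0, 0.0, 8.0, 0.0], scale=0.01, k=2, rng=rng)
```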

peteroupc commented 1 year ago

A "Gumbel" random variable has density $\exp(-x) \exp(-\exp(-x))$.

The positive half of the variable (without shift or scale parameters) can be generated by rejection from the exponential distribution, as follows:

  1. Generate an exponential random variate, which can be done using the von Neumann algorithm described in Karney (2016) or improved versions that cite it.
  2. Accept the variate with probability $\exp(-\exp(-x))$, or go to step 1 otherwise. There's a Bernoulli factory algorithm for $\exp(-x)$, with $x$ separated into its fractional and integer parts, as well as one for $\exp(-\lambda)$ from Canonne et al.; the former includes the generation of a Poisson(1) random variate, which can be done via Duchon and Duvignau's algorithm.
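As an illustration of the $\exp(-\lambda)$ factory mentioned in step 2, here is a sketch in the style of Canonne et al. (the inner Bernoulli($\gamma/k$) coin is drawn with a float here for brevity; an exact implementation compares random bits against the rational $\gamma/k$, and the names are mine):

```python
import random
from fractions import Fraction

def bernoulli_frac(q: Fraction, rng: random.Random) -> bool:
    """Bernoulli(q) coin -- float approximation of an exact rational coin."""
    return rng.random() < float(q)

def bernoulli_exp_minus(gamma: Fraction, rng: random.Random) -> int:
    """Sample Bernoulli(exp(-gamma)) for gamma >= 0, reducing gamma > 1 to
    repeated exp(-1) draws plus one draw on the fractional part."""
    if gamma > 1:
        for _ in range(int(gamma)):
            if not bernoulli_exp_minus(Fraction(1), rng):
                return 0
        return bernoulli_exp_minus(gamma - int(gamma), rng)
    # For gamma in [0, 1]: flip Bernoulli(gamma / k) for k = 1, 2, ... until
    # the first failure; accept iff the first failure occurs at an odd k.
    k = 1
    while bernoulli_frac(gamma / k, rng):
        k += 1
    return k % 2

rng = random.Random(3)
draws = [bernoulli_exp_minus(Fraction(1, 2), rng) for _ in range(20_000)]
rate = sum(draws) / len(draws)  # should be near exp(-1/2), about 0.6065
```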

References:

Shoeboxam commented 1 year ago

Thanks for the response @peteroupc! This is a promising approach, but I had trouble sampling $Bernoulli(p = \exp(-\exp(-x)))$ with the algorithm you linked. Specifically, I struggled to break $\exp(-x)$ exactly into $m$, $\lambda$, and $\mu$ in such a way that $m$ is integral and there exist factories for $\lambda$ and $\mu$ (as I understand it, $\lambda$ and $\mu$ need not necessarily be fractional). I don't see anything that excludes trivially letting $m = 0$, $\lambda = \exp(-x)$, $\mu = 1$ in your algorithm description, but clearly that doesn't lead to a sensible algorithm (it always returns 0). Is there a citation or proof available for this algorithm so I can dive deeper?

I was able to sample $Bernoulli(p = \exp(-\exp(-x)))$ by combining the Canonne et al. sampler for $\exp(-\lambda)$ with your rendition of the sampler from Łatuszyński et al. Unfortunately, this implementation of the Łatuszyński sampler becomes biased when $x \ge 1$ (when the success probability exceeds about $0.69$).

Sample data (x, target p, observed p):

```text
x, p, observed p
0.0891, 0.4006, 0.4025
0.1789, 0.4334, 0.4284
0.2701, 0.4661, 0.4650
0.3632, 0.4989, 0.5036
0.4591, 0.5316, 0.5381
0.5584, 0.5643, 0.5799
0.6622, 0.5971, 0.5929
0.7715, 0.6298, 0.6268
0.8876, 0.6626, 0.6594
1.0123, 0.6953, 0.5306
1.1477, 0.7281, 0.5785
1.2969, 0.7608, 0.6226
1.4643, 0.7935, 0.6741
1.6564, 0.8263, 0.7185
1.8842, 0.8590, 0.7718
2.1668, 0.8918, 0.7156
2.5448, 0.9245, 0.7901
3.1308, 0.9573, 0.8016
4.6001, 0.9900, 0.9133
```

![bias](https://user-images.githubusercontent.com/3630029/222939778-607e05a5-cdde-47cb-b055-c50d071f16c7.png)

Shoeboxam commented 1 year ago

@peteroupc Would you rather I move/continue this discussion in the feedback issue of your repo?

peteroupc commented 1 year ago

Yes, you may.