peteroupc / peteroupc.github.io

Pages about my projects
The Unlicense

Review of random variate generation articles for a programmer audience #18

Open peteroupc opened 1 year ago

peteroupc commented 1 year ago

The following pages of mine deal with random variate generation, Bernoulli factories, and exact sampling. They contain numerous algorithms for such sampling.

My audience for these articles is computer programmers with mathematics knowledge, but little or no familiarity with calculus.

I post this issue to seek comments on the following aspects:

Comments on other aspects of these articles are also welcome.

The articles follow:

Letting them know: @lmendo, @PaulSanchez, @maciej-bendkowski

peteroupc commented 1 year ago

Still seeking reviews on my articles on random variate generation.

Again letting them know about my articles in this issue on random variate generation: @lmendo, @PaulSanchez, @maciej-bendkowski, @lcrocker, @dj-on-github .

Shoeboxam commented 1 year ago

Transitioning from this conversation. I was hoping to get some guidance on sampling Gumbel RVs, and now have some questions/observations about specific algorithms on your site (from the linked issue). Thanks!

peteroupc commented 1 year ago

Sampling the probability $\exp(-\exp(-x))$ involves rewriting it as $\exp(-\exp(-(m+\lambda)\cdot\mu))$ where $\mu=1$, $m=\lfloor x\rfloor$, and $\lambda=x-\lfloor x\rfloor$. Then build a coin that simulates $\exp(-(m+\lambda)\cdot\mu)$ with those parameters, and use that coin (call it $\nu$) to simulate $\exp(-\nu)$. In the algorithm I give for $\exp(-(m+\lambda)\cdot\mu)$, $\lambda$ and $\mu$ both lie in $[0, 1]$.
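For concreteness, here is a worked instance of that rewriting (the value $x = 1.25$ is just an illustrative choice):

$$x = 1.25:\quad m = \lfloor x\rfloor = 1,\quad \lambda = x - m = 0.25,\quad \mu = 1,$$

$$\exp(-\exp(-1.25)) = \exp\bigl(-\exp(-(1+0.25)\cdot 1)\bigr) \approx \exp(-0.2865) \approx 0.7509.$$

So the inner coin $\nu$ returns 1 with probability $\exp(-1.25) \approx 0.2865$, and the outer factory turns $\nu$ into a coin that returns 1 with probability $\exp(-\nu) \approx 0.7509$.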

The following is a proof of the algorithm $\exp(-(m+\lambda)\cdot\mu)$.

First, suppose $\mu = 1$. Each iteration of the loop in the algorithm returns 0 if a Poisson random variate with mean $t$ is other than 0, where $t$ is $\lambda$ in the last iteration and 1 otherwise. Since a Poisson random variate is 0 with probability $\exp(-t)$, each iteration terminates the algorithm with probability $1-\exp(-t)$ and "succeeds" with probability $\exp(-t)$. If all the iterations "succeed", the algorithm returns 1, which happens with probability $\exp(-\lambda) \cdot (\exp(-1))^m = \exp(-(m+\lambda))$.

Now suppose $\mu$ lies in $[0, 1)$. Then the Poisson variate just mentioned has mean $t\mu$ rather than $t$, so that each iteration succeeds with probability $\exp(-t\mu)$ and the algorithm returns 1 with probability $\exp(-\lambda\mu) \cdot (\exp(-\mu))^m = \exp(-(m+\lambda)\mu)$.
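To make the proof concrete, here is a floating-point sketch of that loop (not an exact Bernoulli factory; the names `poisson` and `exp_minus_coin` and the parameter values are illustrative choices, not taken from the articles):

```python
import math
import random

def poisson(t):
    """Poisson(t) via Knuth's multiplication method: count how many
    uniform factors it takes for a running product to drop to exp(-t).
    Floating-point, for illustration only (not an exact sampler)."""
    limit = math.exp(-t)
    k, prod = 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

def exp_minus_coin(m, lam, mu):
    """Sketch of the loop proved above: returns 1 with probability
    exp(-(m + lam) * mu).  Each of the first m iterations draws a
    Poisson variate with mean mu, and the last iteration draws one
    with mean lam * mu; any nonzero draw makes the algorithm return 0."""
    for t in [mu] * m + [lam * mu]:
        if poisson(t) != 0:
            return 0
    return 1

# Empirical check against the closed form (parameters are arbitrary).
trials = 20000
m, lam, mu = 2, 0.5, 0.75
est = sum(exp_minus_coin(m, lam, mu) for _ in range(trials)) / trials
# est should be near exp(-(2 + 0.5) * 0.75) = exp(-1.875)
```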

Shoeboxam commented 1 year ago

Ah, it's basically the same as Canonne et al., but with a Poisson sampler instead. I was hesitant to choose those parameters because I was concerned about finding the PSRN of frac(exp(-x)). I did have success recreating the algorithm, but I'm finding the pseudocode for the poisson1 sampler in Duchon/Duvignau to be wrong: it's too concentrated to be Poisson. Interestingly, your summary of the algorithm here does work.

Duchon poisson1 test

```python
def poisson1():
    """Samples from Poisson(λ = 1)

    https://www.combinatorics.org/ojs/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=https%3A%2F%2Fwww.combinatorics.org%2Fojs%2Findex.php%2Feljc%2Farticle%2Fdownload%2Fv23i4p22%2Fpdf%2F#subsection.5.3
    """
    import random
    n = 1
    g = 0
    k = 1
    while True:
        i = random.randint(0, n + 1)
        if i == n + 1:
            k += 1
        elif i > g:
            k -= 1
            g = n + 1
        else:
            return k
        n += 1


def test_poisson1():
    from collections import Counter
    trials = 5000
    counts = Counter(poisson1() for _ in range(trials))
    from math import exp, factorial
    print("i: actual ideal")
    for i in range(5):
        print(f"{i}: {(counts.get(i) or 0) / trials:.4f} {exp(-1) / factorial(i):.4f}")


test_poisson1()
```

```text
i: actual ideal
0: 0.2716 0.3679
1: 0.5174 0.3679
2: 0.1632 0.1839
3: 0.0378 0.0613
4: 0.0088 0.0153
```
exact exp(-exp(-x)) sampler

```python
import random
from math import floor


def bernoulli(p):
    """Sample from Bernoulli(`p`) when `p` in [0, 1]"""
    assert 0 <= p <= 1
    return random.randrange(p.denominator) < p.numerator


def poisson1():
    """Samples from Poisson(λ = 1)

    Algorithm from:
    https://www.combinatorics.org/ojs/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=https%3A%2F%2Fwww.combinatorics.org%2Fojs%2Findex.php%2Feljc%2Farticle%2Fdownload%2Fv23i4p22%2Fpdf%2F#subsection.5.3
    Implementation from: https://stats.stackexchange.com/a/551576
    """
    n = 1
    g = 0
    k = 1
    while True:
        j = random.randint(0, n)
        if j < n and j < g:
            return k
        if j == n:
            k += 1
        else:
            k -= 1
            g = 1 + n
        n += 1


def bernoulli_exp_neg_01(f):
    """Sample from Bernoulli(p = exp(-E[f()])) for f() in [0, 1]"""
    return not any(f() for _ in range(poisson1()))


def bernoulli_exp_neg(x):
    """Sample from Bernoulli(p = exp(-(m + λ)))

    https://peteroupc.github.io/bernoulli.html#exp_minus__m____lambda_____mu
    """
    m = floor(x)
    if any(poisson1() for _ in range(m)):
        return False
    return bernoulli_exp_neg_01(lambda: bernoulli(x - m))


def bernoulli_exp_neg_exp_neg(x):
    """Sample from Bernoulli(p = exp(-exp(-x)))"""
    return bernoulli_exp_neg_01(lambda: bernoulli_exp_neg(x))
```

At this point I just need to figure out a Bernoulli factory for the fractional part of the exponential PSRN. Thanks again for all your help!

peteroupc commented 1 year ago

Your implementation of Duchon and Duvignau's algorithm (as given in their paper) is incorrect. random.randint(a, b) generates a uniform integer in $[a, b]$, while Random(m) in their paper "returns a uniform number in [m]", which the paper defines as the set of integers in $[1, m]$. Thus, Random(n+1) should be random.randint(1, n+1).
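With that one-line fix applied to the earlier test code (a sketch; `poisson1_fixed` is an illustrative name), the sampler's output matches Poisson(1):

```python
import random
from collections import Counter
from math import exp


def poisson1_fixed():
    """Duchon/Duvignau Poisson(1) sampler with Random(n+1) mapped to
    random.randint(1, n + 1), i.e., uniform on {1, ..., n+1}."""
    n, g, k = 1, 0, 1
    while True:
        i = random.randint(1, n + 1)  # was random.randint(0, n + 1)
        if i == n + 1:
            k += 1
        elif i > g:
            k -= 1
            g = n + 1
        else:
            return k
        n += 1


# Empirical check: P(0) and P(1) should both be near exp(-1) ≈ 0.3679.
trials = 20000
counts = Counter(poisson1_fixed() for _ in range(trials))
p0 = counts[0] / trials
p1 = counts[1] / trials
mean = sum(k * c for k, c in counts.items()) / trials
```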

peteroupc commented 1 year ago

@Shoeboxam : Do you have further comments?

By the way, you are encouraged to implement any of the algorithms in "Bernoulli Factory Algorithms" yourself and report on your implementation experience; that's pretty much one of the requests in the opening post. The same goes for the other articles in the opening post. Part of my goal in this issue is to correct any errors or unclear points in those articles.

Shoeboxam commented 1 year ago

@peteroupc Thanks for checking in. I used inverse transform sampling to address the original reason why I was trying to sample Gumbel variates: https://github.com/opendp/opendp/pull/653. I find this approach pretty appealing because it's simple and also gets me samples from distributions with closed-form inverse CDFs to within an arbitrarily small interval.
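For reference, the closed-form inverse CDF that makes the Gumbel case easy looks like this (a floating-point sketch, not the interval-arithmetic version in the linked PR; `sample_gumbel` is an illustrative name):

```python
import math
import random


def sample_gumbel(mu=0.0, beta=1.0, u=None):
    """Inverse-transform sampling for Gumbel(mu, beta).
    CDF: F(x) = exp(-exp(-(x - mu) / beta)); solving F(x) = u for x
    gives the quantile function below."""
    if u is None:
        # Note: random.random() can return 0.0 (making log fail);
        # a robust version would guard against that.
        u = random.random()
    return mu - beta * math.log(-math.log(u))


# The median: F(x) = 1/2 at x = mu - beta * ln(ln 2) ≈ 0.3665 for mu=0, beta=1
median = sample_gumbel(u=0.5)
```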

You've amassed an incredible collection of very cool algorithms and it's taking some time to take it in. The PSRN Tulap sampler, for example, is neat.

Feedback:

It is easy to get lost in an algorithm description if it isn't accompanied by a high-level intuition. For my uses, I can write my own proofs, so long as I understand the gist of how it works. Something as simple as pointing out when an intermediate variable is distributed in a certain way is really helpful, or just outlining the overall strategy. Granted, many of your existing algorithms already do this, or have links to papers. I expect I'll be able to see through them quicker with a bit more experience, though.

It is also easy to get lost in the site. I found it surprising there are more Bernoulli factories and PSRNs in "More Algorithms", when you already have pages for them. In my experience so far, the best way to not get lost on the site is to just read the entire site! ;)

Nits:

peteroupc commented 1 year ago

I appreciate your comments. In response I have addressed your "nits" and rearranged the content. In particular, the "More Arbitrary-Precision Samplers" page is now obsolete and its content was moved to other pages in the site.

peteroupc commented 1 year ago

@shoeboxam : For your information, issue #17 lists open questions I have on the articles in this repository, including the pages listed in this issue. One question I've just added that may interest you is: Find additional algorithms to sample continuous distributions with PSRNs that meet the properties desired for such algorithms. Other open questions relate to Bernoulli factories.

peteroupc commented 1 year ago

@Shoeboxam : For your information, the following paper on a security consideration when sampling for information security (and differential privacy) purposes came to my attention. I have updated the security considerations on "Randomization and Sampling Methods" to take it into account.

Ben Dov, Y., David, L., et al., "Resistance to Timing Attacks for Sampling and Privacy Preserving Schemes", FORC 2023.