nature-of-code / noc-book-2

The 2024 edition of The Nature of Code with p5.js. Includes Notion workflow and build system.
https://natureofcode.com
1.19k stars 82 forks source link

Efficient alternative for Example 0.5: An Accept-Reject Distribution #982

Open Banyc opened 5 months ago

Banyc commented 5 months ago

I am still learning math so there could be factual errors!

Let suppose the range of the output is in $[0, 1]$. We could just multiply it with $a$ if we want to extend its range to $[0, a]$.

The PDF of the distribution implied by the example is

f(x) = \begin{cases}
  2x  & 0 \leq x < 1 \\
  0 & \text{otherwise}
\end{cases}

Its CDF is

P(X < x) = \begin{cases}
  x^2  & 0 \leq x < 1 \\
  0 & \text{otherwise}
\end{cases}

According to the universality of uniform distribution

Let $F$ be a CDF which is a continuous function and strictly increasing on the support of the distribution. This ensures that the inverse function $F^{-1}$ exists, as a function from $(0, 1)$ to $\mathbb{R}$. We then have the following results

  • Let $U \sim \text{Unif}(0, 1)$ and $X = F^{-1}(U)$. Then $X$ is an r.v. with CDF $F$.
  • Let $X$ be an r.v. with CDF $F$. Then $F(X) \sim \text{Unif}(0, 1)$.

In this case, $F(x) = P(X < x)$, which is indeed a continuous function that strictly increases.

The inverse of $F$ is

F^{-1}(x) = \sqrt{x}, \quad 0 \leq x < 1

Apply the $U$ to $F^{-1}$ and we have

X = F^{-1}(U) = \sqrt{U}

The resulting code is

Math.sqrt(random(1))
sidwellr commented 1 month ago

For "still learning math", you know a lot more than a typical Nature of Code student! The accept-reject algorithm may not be efficient, but it is easy to understand and to customize without having to understand probability theory. I think it is the right level for the Nature of Code.

That said, while reading the custom random distribution section, I also wondered how to do it more efficiently, and came up with the same result. So at least two people are interested in more advanced Nature of Code topics! Probably more. It might be useful to have a forum where Nature of Code students could share insights like this, answers to exercises, and ecosystem projects, as well as help each other and generally socialize.

dipamsen commented 1 month ago

Very relevant video from Stand-up Maths from 2 weeks ago on the topic: https://www.youtube.com/watch?v=ga9Qk38FaHM

shiffman commented 1 month ago

Ahhh, this is so great!!! Thank you @dipamsen for referencing the Matt Parker video, fantastic! Right now the coding train Discord (https://thecodingtrain.com/discord) is probably the best place for this kind of discussion! But I'm open to other ideas and platforms! I know there are many readers who might appreciate learning more about this probability distribution! I'll leave this issue open for further discussion!

sidwellr commented 1 month ago

Since this is open for further discussion, I made a p5.js sketch to compare the three methods (accept-reject method described in NoC, sqrt(r) described above, and max(r,r) from Stand-up Maths). The results are, of course, the same, but it shows the times, which are not. (When making it, I discovered that the p5 max function is very slow for some reason, so I used Math.max instead.)

https://editor.p5js.org/rsidwell/sketches/voUAa3iOo

Banyc commented 1 month ago

@sidwellr Thank you for I have learned the idea from your demo to build a real-time statistics! I think "speed" is a bit misleading; latency per operation sounds more accurate.

sidwellr commented 1 month ago

@Banyc You are correct; speed is something (usually distance) measured over time. I changed the label from "speed" to "time", which is a more accurate description. Thanks.