Open monga opened 2 years ago
Imagine a rectangle with width 1 and height 1. The area is 1x1=1. That is the uniform density from p=0 to p=1.
When you do the grid approximation, you turn the continuous density into a discrete probability mass distribution. That is why you are finding the normalization step necessary to get it to sum to 1. But it is still true that Pr(p)=1 for all values of p for p ~ uniform(0,1).
That's fine, thank you. What I want to say, however, is that the code uses two different approaches for the prior
and posterior
arrays. While for posterior ones we can assume the sum(posterior) == 1
invariant, this is not true for prior arrays.
Indeed the invariant is even required by some sampling functions. In python, for example, you could write:
numpy.random.choice(p_grid, size=int(1e4), replace=True, p=posterior)
but
numpy.random.choice(p_grid, size=int(1e4), replace=True, p=prior)
would raise an exception since sum(prior) != 1
. (The R sample
function seems more tolerant)
The last line of slide 61 (https://speakerdeck.com/rmcelreath/statistical-rethinking-2022-lecture-02?slide=61) and in the book R code 3.2 (and R code 2.3) uses a standardization rule different from the one used for prior probability.
As explained by the Overthinking box at page 35 of the book,
prior
is an array of ones, since the important property is that it integrates to one overp_grid
. The sum of the values ofprior
is indeed much greater than 1 (20 in code 2.3, 1000 in code 3.2).The standardization used for
posterior
instead guarantees thatsum(posterior) == 1
, while the integral overp_grid
is less than one.This is not relevant for the shape of the posterior curve, but the asymmetry bothers me. I believe the right statement to use in 3.2 is
then
sum(posterior) == sum(prior)
and both their integrals overp_grid
should be 1.