tom-hc-park / MSc-RA-Bayesian-evidence-synthesis

Research Project at M.Sc. Statistics at University of British Columbia

Exchangeability and sampling #1

Closed tom-hc-park closed 4 years ago

tom-hc-park commented 4 years ago

This is just to clear my head on Ch. 5 of the Bayesian book, since I find it a little bit confusing. Please let me know if I'm understanding this the wrong way.

Suppose we have 8 divorce rates from 8 US states, y1 to y8.

1. The state names corresponding to each y_i are not given.

We will be given the values of y1, y2, ..., y7, and y8 is the quantity we want to estimate.

The y_i can take any value between 0 and 1, and we have no further information to differentiate them. So we can claim that they are exchangeable, i.e., p(y8) = p(yi) for i = 1, 2, ..., 7, where p( ) denotes a probability density function.

We can assign to p(y_i) a distribution whose support is [0, 1], such as a Beta distribution.

Now we are given the 7 values for y1, ..., y7.

Then the posterior predictive density of y8, p(y8 | y1, ..., y7), will be affected by the range and mean of y1, ..., y7.

But the posterior predictive density of y8, p(y8 | y1, ..., y7), is still exchangeable. That is, if we randomly permute and reassign the 7 given values, the posterior distribution stays the same. E.g., if we are given the same 7 known values for y2, ..., y8 and want to estimate y1, the prior density p(y1) is still the same as the p(y8) above, and p(y1 | y2, ..., y8) is the same as p(y8 | y1, ..., y7) from the previous example.

The y_i are exchangeable both a priori and a posteriori. But they are not independent, since p(y_i) != p(y_i | the other 7 y values).
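A quick numerical sketch of the permutation-invariance claim, under a toy model I'm making up for illustration (flat prior on a common mean theta over a grid, Normal(theta, sigma) likelihood standing in for a rate model; the 7 rates are invented numbers, not real divorce data):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def predictive(y_obs, y_new, sigma=0.03, grid_n=500):
    """Grid approximation to p(y_new | y_obs) with a flat prior on theta in (0,1)."""
    num = 0.0
    den = 0.0
    for k in range(1, grid_n):
        theta = k / grid_n
        lik = 1.0
        for yi in y_obs:           # likelihood is a product over the observations,
            lik *= normal_pdf(yi, theta, sigma)  # so it only depends on the multiset
        num += lik * normal_pdf(y_new, theta, sigma)
        den += lik
    return num / den

y_case1 = [0.11, 0.13, 0.12, 0.15, 0.10, 0.14, 0.12]   # hypothetical rates
y_permuted = [0.12, 0.15, 0.11, 0.10, 0.14, 0.12, 0.13]  # same values, reshuffled

p1 = predictive(y_case1, 0.12)
p2 = predictive(y_permuted, 0.12)
# p1 == p2: permuting the 7 observed values leaves p(y8 | y1, ..., y7) unchanged.
```

Because the likelihood is a product of identical terms, reassigning which value is called y1 vs y7 cannot change the predictive density, which is exactly the posterior-side exchangeability described above.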

2. The names of the 8 states are given, but not assigned to specific y_i's.

Among them are Utah and Nevada; the former has a lower rate and the latter a higher rate.

Before the 7 values for y1, ..., y7 are given, we want to estimate p(y8). p(y_i) is still the same for all i = 1, ..., 8, so we can say (y1, ..., y8) are exchangeable.

But the given information (the names of the 8 states, with Utah and Nevada among them) changes the prior: it will now have heavier tails, due to one lower and one higher rate.

Now you are given 7 values for y1, ..., y7, the same values as in case 1. They are similar to one another, which suggests that y8 could be either Utah or Nevada.

p(y8 | y1, ..., y7) may be multimodal: if y8 is Utah, it may have a lower rate, and if y8 is Nevada, it may have a higher rate.

Is (y1, ..., y8) still exchangeable? Is p(y8) = p(y1) = ... = p(y7)? Probably yes, because we are still not given any matching between names and y_i's: y8 could be any of the states, as could any other y_i.
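The bimodal predictive in this case can be sketched as a two-component mixture. All numbers here are invented for illustration: suppose that, after seeing 7 similar mid-range values, the remaining state is Utah or Nevada with equal probability, with low and high rate centers:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def predictive_y8(x, mu_utah=0.08, mu_nevada=0.30, sigma=0.02):
    """Hypothetical p(y8 | y1, ..., y7): equal-weight mixture of a 'Utah-like'
    low-rate component and a 'Nevada-like' high-rate component."""
    return 0.5 * normal_pdf(x, mu_utah, sigma) + 0.5 * normal_pdf(x, mu_nevada, sigma)

# The density has two modes, near 0.08 and 0.30, and is low in between:
low_mode = predictive_y8(0.08)
high_mode = predictive_y8(0.30)
valley = predictive_y8(0.19)
```

The mixture form makes the verbal argument concrete: each mode corresponds to one of the two name assignments y8 could still receive.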

3. We are given that y8 is the divorce rate for Nevada.

Even before we see the 7 observations y1, ..., y7, we know that the prior cannot treat y8 exchangeably, since p(y8) will place higher density on large values than the priors for the other states. The posterior predictive distribution p(y8 | y1, ..., y7) is going to put heavy mass above max(y1, ..., y7).

c.f.

De Finetti's theorem: given that (y1, ..., y8) are exchangeable (as part of an infinitely exchangeable sequence), the theorem says the joint density is a mixture of i.i.d. distributions, i.e., p(y1, ..., y8) = \int \prod_{i=1}^{8} p(y_i | \theta) p(\theta) d\theta.
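A small simulation of the mixture representation, under arbitrary illustrative choices (Uniform(0,1) prior for theta, Normal noise clipped to [0, 1] as the conditional distribution): draw theta once, then draw the y_i i.i.d. given theta. Marginally the components are exchangeable but strongly dependent, matching the earlier point that exchangeable is not the same as independent.

```python
import random

random.seed(0)

def draw_sequence(n=8):
    # Step 1: draw theta from its prior (Uniform(0,1) here, an arbitrary choice).
    theta = random.random()
    # Step 2: given theta, the y_i are i.i.d. (Normal noise clipped to [0, 1]).
    return [min(max(random.gauss(theta, 0.05), 0.0), 1.0) for _ in range(n)]

# Estimate the marginal correlation between the first two components.
pairs = [draw_sequence(2) for _ in range(20000)]
a = [p[0] for p in pairs]
b = [p[1] for p in pairs]
ma = sum(a) / len(a)
mb = sum(b) / len(b)
cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
sa = (sum((x - ma) ** 2 for x in a) / len(a)) ** 0.5
sb = (sum((y - mb) ** 2 for y in b) / len(b)) ** 0.5
corr = cov / (sa * sb)
# corr is strongly positive: marginally the y_i are dependent,
# even though they are i.i.d. conditional on theta.
```

The shared draw of theta is what induces the marginal dependence; integrating theta out gives exactly the mixture density in the theorem statement above.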

paulgstf commented 4 years ago

Thanks @aiod01 , I need to give this a little thought and then I'll give you my take on this when we meet tomorrow morning.