In Stan, we can circumvent this issue by marginalizing out the indicator variable z.
This sentence compresses a lot of steps, and the text does not explain them further. Could you say more about what marginalizing out z means in this context, and why it solves the problem of not being able to use a categorical z? Later you write that the math behind this is more complicated, but even a paraphrase would help build an intuition for how or why it works.
It is unclear to me whether z will take real values other than 0 or 1 in Stan, and whether this is why marginalization matters.
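My current reading, which may well be wrong, is that marginalizing means summing the likelihood over the two values z can take, so that z never has to be sampled at all. Writing p_1 and p_2 for the two component densities and letting z = 1 select p_1 (my own notation and convention, not necessarily the book's), I would expect something like:

```latex
\begin{aligned}
p(y_n \mid \theta)
  &= \sum_{z_n \in \{0, 1\}} p(z_n \mid \theta)\, p(y_n \mid z_n) \\
  &= \theta \, p_1(y_n) + (1 - \theta)\, p_2(y_n)
\end{aligned}
```

If that is right, z stays 0 or 1 throughout and simply drops out of the likelihood, rather than taking real values in between. A sentence along these lines in the text would already help.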
Is lowercase theta a parameter in the model? Will it be estimated by the model?
How are capital theta, capital theta1, and capital theta2 related?
You mentioned this in class, but it might still be worth explaining exactly why the formulation in (20.1) has more parameters (n parameters) than the marginalized formulation. It would be useful to go step by step from the Bernoulli distribution, perhaps with an example, showing that z is a parameter that disappears in the marginalized version.
I guess you didn't want to go into such details because it is just a technical detail, but I find it difficult to understand without these additional explanations.
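To make the request concrete, here is the kind of small example I have in mind. It is only a sketch under my own assumptions: I use two normal components with made-up means (`mu1`, `mu2`) and a shared `sigma`, which are not taken from the chapter, and I assume z = 1 picks the first component with probability theta. It checks numerically that summing the joint density over z = 0 and z = 1 gives the same number as the mixture density that does not mention z.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of Normal(mu, sigma) evaluated at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def joint(y, z, theta, mu1, mu2, sigma):
    """p(y, z | theta): z = 1 selects component 1 with probability theta (assumed convention)."""
    if z == 1:
        return theta * normal_pdf(y, mu1, sigma)
    return (1 - theta) * normal_pdf(y, mu2, sigma)


def marginal(y, theta, mu1, mu2, sigma):
    """p(y | theta): the same model with z summed out, so z is no longer an argument."""
    return theta * normal_pdf(y, mu1, sigma) + (1 - theta) * normal_pdf(y, mu2, sigma)


# Hypothetical values, chosen only for illustration.
y, theta = 0.3, 0.7
mu1, mu2, sigma = 0.0, 2.0, 1.0

summed = sum(joint(y, z, theta, mu1, mu2, sigma) for z in (0, 1))
direct = marginal(y, theta, mu1, mu2, sigma)
print(summed, direct)
```

As far as I understand, the first version needs one z for every observation, while the second only needs theta and the component parameters, which would explain the difference in the number of parameters. Seeing something like this spelled out in the text would make the step much easier to follow.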