jtassarotti opened this issue 2 years ago
Hi, Joe! The point you make is a good one, but I think this is one of those cases where adding a clarification could risk confusing the majority of users who aren't in this situation, rather than helping the few who may be. I think this is the kind of thing which would be best handled as a footnote rather than by editing the main text, to keep users on the "happy path" as much as possible.
For what it is worth, I read this section as essentially saying "If you find-and-replace all instances of one with the other the estimated parameters will be the same", but you're right that there are cases where mixing the two makes this statement untrue.
If you have a specific recommendation for wording, it's easy to submit a PR by editing the source file here on GitHub.
I agree it may be hard to explain this precisely without confusing people, so I'm not sure what is best to say. Perhaps the simplest thing would be to just remove the clause "Although both lead to the same sampling behavior in Stan,". I can make a PR to that effect if you think it is worth considering.
Side remark: note that the issue arises even without a mixture of the two usages, as in the following example from the discourse thread:
```stan
id[i] ~ mixture_distribution();
if (id[i] == 1) {
  x[i] ~ normal(mu1, sigma1);
} else {
  x[i] ~ cauchy(mu2, sigma2);
}
```
Using `~` here (dropping constants) instead of `target +=` (keeping constants) means that, effectively, the prior on `id[i]` is not really `mixture_distribution()`. Instead, it's a slight perturbation of it, because the constants that are dropped from `normal` and `cauchy` are different. Though again, this is a model that is not presently supported.
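For concreteness, here is a small numeric sketch in Python (illustrative only; it assumes `mu1`, `sigma1`, `mu2`, `sigma2` are all parameters, so only the parameter-free terms are dropped) of how the differing dropped constants reweight the two branches:

```python
import math

# Constant terms Stan's `~` drops because they do not depend on
# any parameters:
normal_const = -0.5 * math.log(2 * math.pi)  # dropped from normal_lpdf
cauchy_const = -math.log(math.pi)            # dropped from cauchy_lpdf

# Dropping a constant c from a branch multiplies that branch's
# unnormalized density by exp(-c). The branch taken depends on id[i],
# so the implied log-odds of the normal branch shift by:
shift = cauchy_const - normal_const
print(f"log-odds shift: {shift:.4f}")             # about -0.2258
print(f"odds multiplier: {math.exp(shift):.4f}")  # about 0.7979
```

So under `~`, the effective prior on `id[i]` is `mixture_distribution()` with the normal branch reweighted by roughly a factor of 0.8 relative to the Cauchy branch.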
Good point, and thanks for bringing this up. The gradients don't change, but the equivalent of the Metropolis accept step does.
I think it would help to emphasize that:

```stan
target += normal_lupdf(y | mu, sigma);
```

is the same as

```stan
y ~ normal(mu, sigma);
```

So I think we can just drop saying that the `normal_lpdf` version is the same, as you've shown why it's not the same.
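To make the `_lpdf` / `_lupdf` relationship concrete, here is a small Python sketch (illustrative, not Stan's actual implementation), treating `mu` and `sigma` as parameters so the only dropped term is the parameter-free `-0.5 * log(2 * pi)`:

```python
import math

def normal_lpdf(y, mu, sigma):
    """Full log density, as in `target += normal_lpdf(y | mu, sigma)`."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def normal_lupdf(y, mu, sigma):
    """Unnormalized log density: drops -0.5*log(2*pi), which does not
    depend on the parameters. This is what `y ~ normal(mu, sigma)` adds."""
    return -math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

# The two differ by the same constant at every point, so gradients
# (and hence HMC trajectories) are identical:
d1 = normal_lpdf(1.0, 0.0, 1.0) - normal_lupdf(1.0, 0.0, 1.0)
d2 = normal_lpdf(-3.0, 2.0, 0.5) - normal_lupdf(-3.0, 2.0, 0.5)
print(d1, d2)  # both equal -0.5*log(2*pi), about -0.9189
```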
@mitzimorris this was the docs issue we mentioned yesterday that should be looked at alongside better documentation on the subtleties of writing your own `_lpdf` which calls other `_lpdf`s/`_lupdf`s.
Maybe also add a comment to the Finite Mixtures chapter (https://mc-stan.org/docs/stan-users-guide/finite-mixtures.html) not to mix `_lpdf` and `_lupdf`?
Section 7.4 of the Stan reference manual says that using the sampling statement

```stan
y ~ normal(mu, sigma);
```

and incrementing the target log probability directly with

```stan
target += normal_lpdf(y | mu, sigma);
```

will "both lead to the same sampling behavior", even though the sampling statement drops constant terms from the log probability function when adding to the target. But I believe that it is not necessarily true that you get the same sampling behavior if the sampling statement's execution is conditional on parameter values. E.g., consider the snippet

```stan
if (b) {
  y ~ normal(mu, sigma);
}
```

If `b` depends upon parameter values, then this does not give the same sampling behavior as

```stan
if (b) {
  target += normal_lpdf(y | mu, sigma);
}
```

because in the first snippet we drop the extra additive constant terms only on the subset of parameter values which cause `b` to evaluate to true.

Of course, if `b` depends upon parameter values, there are likely other practical issues with inference for such a model. Nevertheless, it might be good to add a caveat or re-word this statement in the documentation.

(See also the discussion here: https://discourse.mc-stan.org/t/target-vs-sampling-does-dropping-constants-preserve-sampling-behavior/29118/)
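To illustrate the failure mode numerically, here is a Python sketch of a hypothetical one-parameter model (not from the original post) where the branch taken depends on the sign of the parameter `theta`, with a unit-scale normal kernel on one side and a unit-scale Cauchy kernel on the other:

```python
import math

def lp_full(theta, y=1.0):
    # target += ...: normalizing constants kept on both branches
    if theta > 0:  # the condition `b` depends on the parameter
        return -0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2
    return -math.log(math.pi) - math.log(1 + (y - theta) ** 2)

def lp_dropped(theta, y=1.0):
    # `~` statements: constants dropped, but only on the branch taken
    if theta > 0:
        return -0.5 * (y - theta) ** 2
    return -math.log(1 + (y - theta) ** 2)

# A Metropolis-style proposal within one branch sees the same
# log acceptance ratio under both versions:
same = (lp_full(0.5) - lp_full(1.5)) - (lp_dropped(0.5) - lp_dropped(1.5))
# ...but a proposal crossing theta = 0 does not:
diff = (lp_full(0.5) - lp_full(-0.5)) - (lp_dropped(0.5) - lp_dropped(-0.5))
print(same)  # 0.0
print(diff)  # about 0.2258: the two dropped constants differ
```

Within a branch the dropped constant cancels from the acceptance ratio, so sampling behavior is unchanged there; across the boundary it does not cancel, so the two versions target slightly different distributions.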