stan-dev / docs

Documentation for the Stan language and CmdStan
https://mc-stan.org/docs/
Other
38 stars 112 forks source link

Clarify that sampling statements vs. target += may not always preserve sampling behavior #588

Open jtassarotti opened 2 years ago

jtassarotti commented 2 years ago

Section 7.4 of the Stan reference manual says that using the sampling statement y ~ normal(mu, sigma) and incrementing the target probability directly with target += normal_lpdf(y | mu,sigma) will “both lead to the same sampling behavior” but the sampling statement drops constant terms from the log probability function when adding to the target.

But I believe that it is not necessarily true that you get the same sampling behavior if the sampling statement's execution is conditional on parameter values. E.g. consider the snippet

if(b) {
  y ~ normal(mu, sigma)
}
else {
  target += normal_lpdf(y | mu,sigma)
}

if b depends upon parameter values, then this does not give the same sampling behavior as

target += normal_lpdf(y | mu,sigma)

because in the first snippet we drop the extra additive constant terms only on the subsets of parameter values which cause b to evaluate to true.

Of course, if b depends upon parameter values, there are likely other practical issues with inference for such a model. Nevertheless, it might be good to add a caveat or re-word this statement in the documentation.

(See also discussion here: https://discourse.mc-stan.org/t/target-vs-sampling-does-dropping-constants-preserve-sampling-behavior/29118/)

WardBrian commented 2 years ago

Hi, Joe! The point you make is a good one, but I think this is one of those cases where adding a clarification could risk confusing the majority of users who aren't in this situation, rather than helping the few who may be. I think this is the kind of thing which would be best off as a footnote rather than editing the main text, to keep users on the "happy path" as much as possible.

For what it is worth, I read this section as essentially saying "If you find-and-replace all instances of one with the other the estimated parameters will be the same", but you're right that there are cases where mixing the two makes this statement untrue.

If you have a specific recommendation for wording it's easy to submit a PR by editing the source file here on Github

jtassarotti commented 2 years ago

I agree it may be hard to explain this precisely without confusing people, so I'm not sure what is best to say. Perhaps the simplest thing would be to just remove the clause "Although both lead to the same sampling behavior in Stan,". I can make a PR to that effect if you think it is worth considering.

Side remark: note that the issue arises even without a mixture of the two usages, as in the following example from the discourse thread:

id[i] ~ mixture_distribution();

if (id[i] == 1) {
  x[i] ~ normal(mu1, sigma1);
}
else {
  x[i] ~ cauchy(mu2, sigma2);
}

using ~ here and dropping constants instead of += with constants means that effectively, the prior on id[i] is not really mixture_distribution(). Instead, it's some slight perturbation because the constants that are dropped from normal and cauchy are different. Though again, this is a model that is not presently supported.

bob-carpenter commented 2 years ago

Good point and thanks for bringing this up. The gradients don't change, but the equivalent of the Metropolis accept step will change.

I think it would help to emphasize that:

target += normal_lupdf(y | mu, sigma);

is the same as

y ~ normal(mu, sigma);

So I think we can just drop saying that that the normal_lpdf is the same, as you've shown why it's not the same.

WardBrian commented 9 months ago

@mitzimorris this was the docs issue we mentioned yesterday that should be looked at alongside better doc on the subtleties of writing your own lpdf which calls other lpdfs/lupdfs

avehtari commented 5 months ago

Maybe also add comment to the Finite Mixtures chapter https://mc-stan.org/docs/stan-users-guide/finite-mixtures.html not to mix _lpdf and _lupdf?