Open jgabry opened 3 years ago
Hi Jonah,
I agree the warning doesn't make much sense for this model.
It's happening because x ~ normal(0, sigma);
is identified as a prior on sigma
. That's happening because:
x ~ normal(0, sigma);
may as well be sigma ~ normal(0, x);
.data
variables. Unfortunately, data
variables aren't always what the user considers the "data". In this case, you might be considering x
the data - if it could be identified as data, then x ~ normal(0, sigma);
would indeed be marked as a "likelihood".This is clearly a confusing warning sometimes. Maybe we could:
Interesting, thanks for the explanation! That makes sense. And indeed you're right that if I move x
to the data block I don't get that warning anymore.
What about only considering something as "having a prior" if it's in the parameters block and (appears on the LHS of a statement with ~
OR appears to the left of |
in a target += statement)? In my example that would only count 1 prior for sigma
but I'm probably overlooking a reason why that wouldn't be sufficient (I haven't really thought it through) or that might be too different than how the analysis currently works.
Also, I definitely picked a strange model in that it has no data, so I'm not super concerned about this, I just thought it was strange. But your explanation makes sense.
I see what you mean. That's actually already implemented in a separate warning. If you run:
parameters {
real sigma;
}
model {
sigma ~ exponential(1);
sigma ~ exponential(1);
}
you get:
Warning at 'twiddles.stan', line 5, column 2 to column 25:
The parameter sigma is on the left-hand side of more than one twiddle
statement.
(Looking back, I don't like that language much!) The reason why I hesitate to use only that one is that Stan is so much more flexible than its builtin distributions.
One improvement would be to rule out a factor as a prior for a parameter if it's a builtin distribution and the parameter is the right of the |. That would solve this scenario without limiting scenarios with non-builtin distributions.
Another improvement might be to consider the twiddle analysis when writing the error messages, e.g.: if there were multiple "priors" but not multiple LHS "twiddles", add some language about how the analysis might be counter-intuitive when the data is misidentified.
The reason why I hesitate to use only that one is that Stan is so much more flexible than its builtin distributions.
Yeah that's a good point. That said, I would bet that most users turning on pedantic mode will be using builtin distributions most of the time. But you're totally right that Stan is much more flexible than that and many users don't limit themselves to just the builtin distributions.
One improvement would be to rule out a factor as a prior for a parameter if it's a builtin distribution and the parameter is the right of the |. That would solve this scenario without limiting scenarios with non-builtin distributions.
Another improvement might be to consider the twiddle analysis when writing the error messages, e.g.: if there were multiple "priors" but not multiple LHS "twiddles", add some language about how the analysis might be counter-intuitive when the data is misidentified.
Good ideas. Not super urgent or anything like that, but if those aren't too hard to implement then I think they would be worth it.
@rybern Using the latest CmdStanR from github I was trying out pedantic mode (which is really cool!) and I got an unexpected warning:
The second two warnings make sense but the first one about sigma having 2 priors isn't correct. There's only one prior on sigma. Is this a bug?