stan-dev / loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
https://mc-stan.org/loo

p_waic warnings in hierarchical models #32

Open davharris opened 8 years ago

davharris commented 8 years ago

As I understand it, the warning about p_waic values greater than 0.4 is there to highlight cases where the model is over-parameterized, in the sense of approaching one effective degree of freedom per data point.

My question is whether this is still a concern when each column of the log_lik matrix corresponds to many data points (e.g., if the information criterion is "focused" on predicting new blocks rather than new individual data points).

If my understanding above is correct, then it seems like it wouldn't be a problem, but I was hoping for confirmation. I checked Vehtari, Gelman, and Gabry 2016 as well as Gelman, Hwang and Vehtari 2013 and didn't find any specific discussion of this issue, so I just wanted to make sure I wasn't missing anything. Thanks so much.
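For concreteness, here is roughly the setup I have in mind (a minimal sketch; `log_lik_pointwise` and `group` are hypothetical names):

```r
library(loo)

# log_lik_pointwise: S x N matrix of pointwise log-likelihoods
# (S posterior draws, N data points); group: length-N vector assigning
# each data point to one of K blocks. Both objects are hypothetical here.
log_lik_blocked <- sapply(unique(group), function(g) {
  rowSums(log_lik_pointwise[, group == g, drop = FALSE])
})

# Each column of the resulting S x K matrix is the joint log-likelihood of a
# whole block, so waic() sees one "observation" per block.
waic(log_lik_blocked)
```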

jgabry commented 8 years ago

Hi David, yeah, probably less concerning, but hopefully @avehtari can chime in here. Aki did the experiments/simulations that motivated these p_waic warnings, and I'm not sure whether they included scenarios like the one you're describing.

avehtari commented 8 years ago

@davharris: That's a good question. Unfortunately, I don't know the answer and have to think about it. The problem is that when you predict in blocks, you are making a joint prediction instead of several univariate marginal predictions. Just as the performance of importance sampling gets worse in higher dimensions, the performance of the truncated Taylor series approximation gets worse, and WAIC works well only if the higher-order terms in the Taylor series are negligible. That's why we probably can't simply replace 0.4 with 0.4 times the number of observations in the block. To avoid this problem, I recommend using PSIS-LOO and checking the khat values, which remain meaningful when predicting blocks of data.
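A sketch of what that check could look like, reusing the hypothetical blocked matrix from the comment above:

```r
# Run PSIS-LOO on the blocked log-likelihood matrix and inspect khat.
loo_blocked <- loo(log_lik_blocked)
print(loo_blocked)           # elpd_loo, p_loo, and a khat summary
pareto_k_table(loo_blocked)  # counts of khat values by diagnostic range
plot(loo_blocked)            # khat for each block; high values flag problems
```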

davharris commented 8 years ago

Okay, I think I understand. Thinking about it from the perspective of importance sampling in high dimensions helped. Some of the khat values were also on the high side, which makes sense from that perspective.

Too bad; I was hoping that summing over multiple data points would help, but you're right that the variance can get big when we logSumExp over anything noisy. Oh well.
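A tiny made-up simulation of that intuition from the importance-sampling side: summing independent per-point log-ratios over a block makes the weights heavier-tailed, so the effective sample size collapses (all numbers here are invented).

```r
set.seed(1)
S <- 4000                                    # posterior draws
log_r <- matrix(rnorm(S * 10, sd = 0.5), S)  # per-point log importance ratios
block_log_r <- rowSums(log_r)                # joint log-ratio for a 10-point block

# Effective sample size of self-normalized importance weights
ess <- function(lw) { w <- exp(lw - max(lw)); sum(w)^2 / sum(w^2) }
ess(log_r[, 1])    # close to S for a single point
ess(block_log_r)   # much smaller for the block
```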

Thanks @avehtari!