rmcelreath / rethinking

Statistical Rethinking course and book package
2.1k stars 596 forks source link

Eliminating variable breaks ulam model #399

Open hillegass opened 1 year ago

hillegass commented 1 year ago

If I create a model with a sigma variable all is good:

m2 <- ulam(
  alist(
    sales ~ normal(mu, sigma),
    sigma <- 0.9+0.08*TV^0.7,
    mu <- a + b*TV^0.4,
    a ~ uniform(-2,3),
    b ~ normal(1.4,0.2)
  ) ,
  data=tv_data,
  chains=4, log_lik=TRUE
)
print(WAIC(m2))

Gets me reasonable values for WAIC:

      WAIC     lppd penalty  std_err
1 975.5667 -486.389 1.39432 16.27619

Exact same thing, but with no sigma variable is bad:

m2 <- ulam(
  alist(
    sales ~ normal(mu, 0.9+0.08*TV^0.7),
    mu <- a + b*TV^0.4,
    a ~ uniform(-2,3),
    b ~ normal(1.4,0.2)
  ) ,
  data=tv_data,
  chains=4, log_lik=TRUE
)
print(WAIC(m2))

gets terrible WAIC scores:

      WAIC      lppd  penalty  std_err
1 602747.6 -102496.4 198877.3 50347.07

Directory for reproduction is attached:

cd Bugreport
R -f good.R
R -f bad.R

Bugreport.zip

hillegass commented 1 year ago

I should have mentioned the versions: I am using rethinking 2.21, rstan 2.26.13, and cmdstanr 0.5.3

rmcelreath commented 1 year ago

Thanks for the very clear report! The bad model isn't subsetting the TV variable inside the normal() with [i], but only in the generated quantities. If you inspect the stancode() for both models, you will see what I mean. So the estimates are the same I guess? But it's just the WAIC calc that goes wrong.

I've been thinking that what ulam() should do in these cases is make a symbol for the calculation of sigma. i.e. automatically convert the "bad" model to the "good" one. That would solve a lot of parsing issues.