rmcelreath / rethinking

Statistical Rethinking course and book package
2.14k stars 603 forks source link

log_lik fails with matrices in models fit by ulam #432

Open jebyrnes opened 5 months ago

jebyrnes commented 5 months ago

I was trying to compare the no correlation, brownian, and OU models in the phlogenetic section, but, ran into issues with the log_lik argument

library(rethinking)
library(dplyr)

data(Primates301)

dstan <- Primates301 |>
  mutate(name = as.character(name)) |>
  filter(!is.na(group_size),
         !is.na(body),
         !is.na(brain)
  )

# for matrices
spp_obs <- dstan$name

# A list of data
dat_list <- list(
  N_spp = nrow(dstan),
  M = standardize(log(dstan$body)),
  B = standardize(log(dstan$brain)),
  G = standardize(log(dstan$group_size)),
  Imat = diag(nrow(dstan)) )

no_phylo <- ulam(
  alist(
    #likelihood
    B ~ multi_normal( mu , SIGMA ),

    # DGP
    mu <- a + bM*M + bG*G,

    # Autocorr
    matrix[N_spp,N_spp]: SIGMA <- Imat * sigma_sq,

    #priors
    a ~ normal( 0 , 1 ),
    c(bM,bG) ~ normal( 0 , 0.5 ),
    sigma_sq ~ exponential( 1 )
  ), data=dat_list , chains=4 , cores=4,
  log_lik = TRUE)

Yields

Compiling Stan program...
Semantic error in '/var/folders/qs/k7z4lcnx7rdg5hbhp2jr7rs00000gn/T/Rtmpr1OeQW/model-f57a72635ad9.stan', line 35, column 4 to column 11:
   -------------------------------------------------
    33:          mu[i] = a + bM * M[i] + bG * G[i];
    34:      }
    35:      log_lik = multi_normal_lpdf( B | mu , SIGMA );
             ^
    36:  }
    37:  
   -------------------------------------------------

Ill-typed arguments supplied to assignment operator =:
The left hand side has type
  vector
and the right hand side has type
  real

make: *** [/var/folders/qs/k7z4lcnx7rdg5hbhp2jr7rs00000gn/T/Rtmpr1OeQW/model-f57a72635ad9.hpp] Error 1

Error: An error occured during compilation! See the message above for more information.

And, correct me if I'm wrong, but, we need log_lik = TRUE. Otherwise, we get issues like

# from model fit where log_lik = FALSE
WAIC(no_phylo)

Error in rep(0, n_obs) : invalid 'times' argument

rmcelreath commented 5 months ago

This is tricky, because WAIC will need the multi_normal likelihood decomposed, not summed up as Stan does it. So the code in the gq will need to be completely different than the code in the model block. I don't believe there is a way to get multi_normal_lpdf to return a vector, which is what we need.

jebyrnes commented 5 months ago

I'm always here to find fun bugs for you!

Alternately, if lppd could calculate the Log Likelihood matrix directly from a model (extract data and likelihood function) that could solve it.... but, I can see from playing with that, it's a PITA. But, a thought.

jebyrnes commented 5 months ago

Ah, yes, I see even using sim() on the above with log_lik = FALSE fails. Interesting.

Error in Imat * sigma_sq : non-conformable arrays
jebyrnes commented 5 months ago

BTW: you get a much more workable model if you move where you insert the covariance matrix and make it more similar to the GP models earlier in the chapter. Then again, I don't love using variance again - but, the coefs are the same otherwise.

AND now you can get simulated values and compare models. Haven't gone much further, but, perhaps worth looking at?

dat_list <- list(
  N_spp = nrow(dstan),
  sp_id = 1:nrow(dstan),
  M = standardize(log(dstan$body)),
  B = standardize(log(dstan$brain)),
  G = standardize(log(dstan$group_size)),
  Imat = diag(nrow(dstan)) )

no_phylo_withlik <- ulam(
  alist(
    #likelihood
    B ~ dnorm( mu , sigma ),

    # DGP
    mu <- a[sp_id] + bM*M + bG*G,

    # Autocorr
    vector[N_spp]:a ~ multi_normal(0, K),
    matrix[N_spp,N_spp]: K <- Imat * sigma_sq,

    #priors
    a ~ normal( 0 , 1 ),
    c(bM,bG) ~ normal( 0 , 0.5 ),
    sigma_sq ~ exponential( 1 ),
    sigma ~ exponential (1)
  ), data=dat_list , chains=4 , cores=4, log_lik = TRUE)
rmcelreath commented 5 months ago

Yeah that form is great for computation and extension to GLMs. But less transparent for teaching, because it has an extra distribution. Tradeoffs.

jebyrnes commented 5 months ago

I mean, yes and no. It's the same form as the GP stuff above (i.e., like a random intercept), so, if anything, I think there is some good scaffolding there. Just have to explain the extra distribution. Potato Potahto. I'll try it with my students today and see if it works!