rmcelreath / rethinking

Statistical Rethinking course and book package
2.16k stars 605 forks source link

Wrong (undefined) array size constant in map2stan model with missing values #54

Open drpetersen opened 8 years ago

drpetersen commented 8 years ago

First of all, thank you very much for your extremely well written and insightful book and the accompanying software package, which has already changed my 'statistical life'.

I was trying to use map2stan for solving a multilevel CFA, and in the course of that endeavour, came across a problem which I will try to demonstrate with the following minimal working example (the original CFA model is no longer recognizable from this, of course):

library(rethinking)
## package rethinking_1.59, installed from github

d <- data.frame( ## Never mind, just a small data set …
    x = c(1, 2, 3, 4),
    y = c(3, 4, 1, 2),
    z = c(1, 3, 2, 4),
    grp = c(1L, 1L, 2L, 2L) ## … with a group indicator
)

model <- alist(
    y ~ dnorm(mu_y, sd_y),
    z ~ dnorm(mu_z, sd_z),
    mu_y <- b_y * x + a[grp],
    mu_z <- b_z * x + a[grp], ## Probably the rest is irrelevant, but for completeness …
    a[grp] ~ dnorm(mu_a, sd_a),
    c(b_y, b_z) ~ dnorm(0, 100),
    c(mu_y, mu_z, mu_a) ~ dnorm(0, 100),
    c(sd_y, sd_z, sd_a) ~ dcauchy(0, 1)
)

model0 <- map2stan(model, data = d, sample = FALSE)
stancode(model0)
## (output omitted)

It seems like in the data declaration of the group index, int grp[N_z];, a wrong symbol, N_z, is given for the array size, which probably should be N_grp, declared in the stancode. This is not really a problem unless z contains missing values, because in that case, N_z will no longer be declared in the stancode and that code will not run any more:

d2 <- d
d2$z[1] <- NA

model2 <- map2stan(model, data = d2, sample = FALSE)
stancode(model2)
## data{
##     int<lower=1> N;
##     int<lower=1> N_z_missing;
##     int<lower=1> N_grp;
##     int<lower=1> z_missingness;
##     real y[N];
##     real z[N];
##     real x[N];
##     int grp[N_z];
## }
## (further output omitted)

I tried to find my way around the source code of map2stan, but don't really feel up to fixing that. Thanks for looking into this issue. If I can be of any further help, please let me know.

rmcelreath commented 8 years ago

If I understand right, this is perhaps a symptom of the fact that missing values on outcomes are not supported. I hesitate, because I can imagine instances in which one imputes a predictor (instead of an outcome) and the same issue might arise.

I'll mark this nevertheless as a feature request, because it's not yet a supported structure.

Thanks for the detailed example.

drpetersen commented 8 years ago

Fair enough, that makes sense, thank you.

Following your explanation, I tried to find an example where the same issue arises with missings in predictors only, but actually couldn't come up with one (yet). If I do, I'll report back.