rmcelreath / rethinking

Statistical Rethinking course and book package

Handling NA's from section 14.2.1 #102

Open JohnsonBrent opened 7 years ago

JohnsonBrent commented 7 years ago

I'm trying to fit a missing-data model patterned after the milk example in section 14.2.1, and I get an odd error. Any suggestions? In my extension of the example, I've recoded the neocortex variable as categorical with three levels as per...

d$neocortex.cat <- as.integer(cut(d$neocortex.perc, breaks=3, labels=c(1:3)))

I then update the data list...

data_list <- list(
  kcal = d$kcal.per.g,
  neocortex = d$neocortex.cat,
  logmass = d$logmass )
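A quick way to see where the missing categories come from: `cut()` computes its breaks from the non-missing values and returns `NA` for missing inputs, so any `NA` in `neocortex.perc` carries straight through to `neocortex.cat`. The sketch below uses a small made-up vector (`perc` is hypothetical, not the milk data) and `labels = FALSE`, which returns the integer codes directly.

```r
# Hypothetical toy values standing in for d$neocortex.perc
perc <- c(55.2, NA, 64.5, 76.3, NA, 60.0)

# Same three-bin recoding; labels = FALSE returns integer codes, NA stays NA
cat3 <- cut(perc, breaks = 3, labels = FALSE)

print(cat3)
print(sum(is.na(cat3)))  # entries map2stan would try to impute
```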

In the code below, I modify the model to impute the missing neocortex categories rather than the real-valued neocortex values of the original example. The model is a little over-parameterized, but it works if the milk data has no missing values. Once missing values are included, however, I get the following error:

Imputing 12 missing values (NA) in variable 'neocortex'.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for: 

  real ~ categorical(vector)

Available argument signatures for categorical:

  int ~ categorical(vector)
  int[] ~ categorical(vector)

require real scalar return type for probability function.

Again, the following code works just fine if I remove the records with NA neocortex. The error appears only when the NAs are included (which is the point of the demo in 14.2.1). Any suggestions? Or does the rethinking package not handle missing categorical values?

m14.3 <- map2stan(
  alist(
    kcal ~ dnorm(mu,sigma),
    mu <- a + bN[neocortex] + bM*logmass,
    a ~ dnorm(0,100),
    bN[neocortex] ~ dnorm(0,10),
    bM ~ dnorm(0,10),

    neocortex ~ dcategorical(softmax(0,theta2,theta3)),
    theta2 <- beta2,
    theta3 <- beta3,
    beta2 ~ dnorm(0,5),
    beta3 ~ dnorm(0,5),

    sigma ~ dcauchy(0,1)
  ) ,
  data=data_list , iter=1e4 , chains=2 )
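The complete-case fit mentioned above (dropping the records where `neocortex` is `NA`) can be done by filtering every vector in the data list with the same logical index. A minimal sketch, using a hypothetical toy list with the same structure as `data_list` rather than the actual milk data:

```r
# Hypothetical toy data list with the same structure as data_list
data_list <- list(
  kcal      = c(0.49, 0.51, 0.46, 0.48, 0.60),
  neocortex = c(1L, NA, 2L, 3L, NA),
  logmass   = c(0.67, 0.74, 0.92, 0.47, 1.35)
)

# Keep only the rows where the neocortex category is observed
keep <- !is.na(data_list$neocortex)
data_cc <- lapply(data_list, function(v) v[keep])

str(data_cc)
```

Note that this throws away the observed `kcal` and `logmass` values in the dropped rows, which is exactly the information imputation is meant to retain.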

Thanks in advance!

rmcelreath commented 7 years ago

Right, it doesn't yet work with discrete variables. The Experimental branch has support for binary discrete variables with missingness (https://github.com/rmcelreath/rethinking/tree/Experimental#semi-automated-marginalization-for-binary-discrete-missing-values). I plan to extend it to categorical variables; it shouldn't be hard, given the code that makes the binary ones work. But I haven't found the time, so there's no schedule I can commit to right now.
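The general idea behind marginalizing a missing discrete value is to skip sampling the unknown category (Stan's sampler cannot handle discrete parameters, hence the `real ~ categorical(vector)` error) and instead let that row's likelihood sum over every category it could take, weighted by its probability: log of the sum over j of Pr(neocortex = j) times Normal(kcal | a + bN[j] + bM*logmass, sigma). The R sketch below computes that term for one row with a missing three-level `neocortex`, using the same `softmax(0, beta2, beta3)` parameterization as the model in the question. It illustrates the arithmetic only and is not the code the Experimental branch generates; the helper names and parameter values are hypothetical.

```r
# Softmax with the first category's score pinned at 0,
# matching softmax(0, theta2, theta3) in the model above
softmax0 <- function(beta2, beta3) {
  s <- c(0, beta2, beta3)
  e <- exp(s - max(s))  # subtract max for numerical stability
  e / sum(e)
}

# Numerically stable log(sum(exp(x)))
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Log-likelihood of one kcal observation whose neocortex category is missing:
# log sum_j Pr(N = j) * Normal(kcal | a + bN[j] + bM * logmass, sigma)
loglik_missing <- function(kcal, logmass, a, bN, bM, sigma, p) {
  lp <- vapply(seq_along(p), function(j) {
    log(p[j]) + dnorm(kcal, a + bN[j] + bM * logmass, sigma, log = TRUE)
  }, numeric(1))
  log_sum_exp(lp)
}

# Hypothetical parameter values, for illustration only
p  <- softmax0(beta2 = 0.5, beta3 = -0.3)
bN <- c(-0.10, 0.00, 0.15)
ll <- loglik_missing(kcal = 0.55, logmass = 0.7, a = 0.6, bN = bN,
                     bM = -0.05, sigma = 0.1, p = p)
print(ll)
```

For rows where the category is observed, no sum is needed: the contribution is just log(p[N]) plus the normal log-density for that category.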

JohnsonBrent commented 7 years ago

Belated thanks for referring me to the experimental branch. Missingness in binary discrete variables now works like a charm. I'll keep watch should you decide to extend it to the multinomial case. For those looking to install this, it's simply...

devtools::install_github("rmcelreath/rethinking", ref = "Experimental")