rmcelreath / rethinking

Statistical Rethinking course and book package
2.15k stars 604 forks source link

Subtle difference in array, matrix object attributes(?) breaks Stan model compilation #312

Open alexander-pastukhov opened 3 years ago

alexander-pastukhov commented 3 years ago

R 4.0.5, rethinking 2.13, cmdstanr 0.3.0.

How to replicate the problem (see also code below):

  1. Create a new column via scale() function,
  2. Slice the data.frame (e.g., d[1:9, ] or d[-10, ])
  3. Use the column from sliced data.frame for ulam() data list.

Compilation works if step #2 is skipped. The only difference I've found so far, is that slicing removes scaled:center and scaled:scale, and hidden cdr attributes (the latter is set to NULL).

UPDATE / WORKAROUND: converting output of scale() to an atomic vector, e.g., using c(scale(...)) on step #1 makes everything work.

If you do step #2, Stan throws an error:

Dimension mismatch in assignment; variable name = lambda, type = real; right-hand side type = row_vector. Illegal statement beginning with non-void expression parsed as lambda[i] Not a legal assignment, sampling, or function statement. Note that

  • Assignment statements only allow variables (with optional indexes) on the left;
  • Sampling statements allow arbitrary value-denoting expressions on the left.
  • Functions used as statements must be declared to have void returns

    error in 'model32b06dcd14dd_138676117686992e3c239cfd520ae112' at line 15, column 8

    13: a ~ normal( 3 , 0.5 ); 14: for ( i in 1:9 ) { 15: lambda[i] = a[cid[i]] + b[cid[i]] * P[i]; ^ 16: lambda[i] = exp(lambda[i]);

The code that works as in the book

These are code chunks 11.36, 11.37, and 11.45

library(rethink
data(Kline)
d1 <-Kline
d1$P <- scale(log(d1$population))
d1$contact_id <- ifelse(d1$contact=="high",2,1)
dat1 <- list(
  T = d1$total_tools,
  P = d1$P,
  cid = d1$contact_id) 

m11.10_1 <- ulam(
  alist(
    T ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * P,
    a[cid] ~ dnorm(3,0.5),
    b[cid] ~ dnorm(0,0.2)),
  data=dat1,
  chains=4, cores=4,
  log_lik=TRUE)

The code that breaks the compilation

d2 <- d1[d1$culture != "Hawaii", ]
dat2 <- list(
  T = d2$total_tools,
  P = d2$P,
  cid = d2$contact_id)

m11.10_2 <- ulam(
  alist(
    T ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * P,
    a[cid] ~ dnorm(3,0.5),
    b[cid] ~ dnorm(0,0.2)),
  data=dat2,
  chains=4, cores=4,
  log_lik=TRUE)
schwa021 commented 3 years ago

I just came over here to post a ?related? bug and a fix I found - and lo-and-behold, your issue was at the top of the list.

I was having trouble with a similar error message when I only sent in a single data point to a model that was written to accomodate multiple data points. It seems that there is something funny about how R handles vectors/arrays/etc..

I found the following fix on the Stan page, and it worked for me: https://github.com/stan-dev/rstan/issues/56

Perhaps enclosing your command in an "as.array(c(.))" will work? It forces the data type to something Stan seems to understand better.

alexander-pastukhov commented 3 years ago

Perhaps enclosing your command in an "as.array(c(.))" will work? It forces the data type to something Stan seems to understand better.

Yes it does. Sorry, I should have mentioned the workaround when posting, my bad! You do not even need the as.array() just c() to convert it to an atomic vector.