sambrilleman / rstanarm

rstanarm R package for Bayesian applied regression modeling
http://mc-stan.org/interfaces/rstanarm.html
GNU General Public License v3.0

Minor error when `dataLong` is a `tbl_df` instead of data.frame #76

Closed jburos closed 7 years ago

jburos commented 7 years ago

Summary:

Running into a somewhat cryptic error when fitting a model where dataLong is a tbl_df rather than a data.frame.

Error: Invalid grouping factor specification, id

Description:

Tracked the problem down to this line in handle_assocmod:

  rows <- rownames(model.frame(y_mod_stuff$mod))
  df   <- dataLong[rows,]  # <-- this line
  mf   <- data.table::data.table(df, key = c(id_var, time_var))

Indexing rows by name like this does not work correctly when dataLong has class tbl_df. Instead of the matching rows, it returns a data frame in which every value is NA, so lme4::glFormula complains that the grouping factor specification is invalid (it is effectively operating on empty data).
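For illustration, here is a minimal sketch of one defensive way to do the row lookup: coerce to a plain data.frame before indexing by row name. The helper name `subset_by_model_rows` and the toy data are hypothetical and are not taken from rstanarm, and I have not checked whether this matches the fix that was eventually applied. The example uses base R only, so it shows the intended behaviour on a data.frame rather than reproducing the tbl_df failure.

```r
# Hypothetical helper (not part of rstanarm): subset `data` to the rows
# that survived model.frame(), using row names as the lookup key.
subset_by_model_rows <- function(data, mod_frame) {
  # Coerce first so that character row indexing has data.frame semantics
  # even if `data` arrives with an extra class such as tbl_df.
  data <- as.data.frame(data)
  rows <- rownames(mod_frame)
  data[rows, , drop = FALSE]
}

# Toy longitudinal data with one missing response (row 2).
dat <- data.frame(
  id   = c(1, 1, 2, 2),
  year = c(0, 1, 0, 1),
  y    = c(0.5, NA, 1.2, 1.9)
)

# With the default na.action (na.omit), model.frame() drops row 2 and
# keeps the original row names of the remaining rows.
mf  <- model.frame(y ~ year, data = dat)
out <- subset_by_model_rows(dat, mf)
print(out)
```

From the user side, a similar workaround might be to pass `as.data.frame(pbcLong)` as dataLong until the package handles tbl_df input itself.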

Reproducible Steps:

Code:

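library(rstanarm)  # the sambrilleman/rstanarm fork, which provides stan_jm()
library(dplyr)     # provides tbl_df()

data(pbcSurv)
data(pbcLong)

# Converting the longitudinal data to a tbl_df triggers the error
pbcLong <- tbl_df(pbcLong)
f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
              dataLong = pbcLong,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt,
              dataEvent = pbcSurv,
              time_var = "year",
              basehaz = 'bs')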

Output:

> data(pbcSurv)
> data(pbcLong)
> # works
> f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
+                               dataLong = pbcLong,
+                               formulaEvent = Surv(futimeYears, death) ~ sex + trt,
+                               dataEvent = pbcSurv,
+                               time_var = "year",
+                               basehaz = 'bs')
> # does not work
> pbcLong <- tbl_df(pbcLong)
> f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
+                               dataLong = pbcLong,
+                               formulaEvent = Surv(futimeYears, death) ~ sex + trt,
+                               dataEvent = pbcSurv,
+                               time_var = "year",
+                               basehaz = 'bs')
Error: Invalid grouping factor specification, id

RStanARM Version:

Current master branch of sambrilleman/rstanarm (commit 8404e2ba86c186c212fe2a57ca8a36fd263a80ae)

R Version:

3.3.3

Operating System:

Linux / ubuntu

sambrilleman commented 7 years ago

Pull request #77 has hopefully resolved this issue, so I will close it. (But do let me know if a similar issue appears elsewhere!)

sambrilleman commented 7 years ago

Also referencing this commit