I'm not entirely sure whether this is a bug or a feature, given the scope of glimmer, but I noticed that glimmer fails cryptically when data contains any NA values.
library(rethinking)
data(UCBadmit)
#introduce missingness
UCBadmit$applications[3:5] <- NA
# varying intercepts
f3 <- cbind(admit,reject) ~ (1|dept) + applicant.gender + applications
m3 <- glimmer( f3 , UCBadmit , binomial )
# fails with
# Error in `[[<-.data.frame`(`*tmp*`, pf$yname, value = c(512L, 89L, 353L, :
# replacement has 12 rows, data has 11
model.frame (which is implicitly called by passing data to model.matrix) by default resorts to na.omit, thereby truncating the data.frame, which in turn triggers the error message quoted in the above example when glimmer is building pf$dat (https://github.com/rmcelreath/rethinking/blob/master/R/glimmer.R#L243)
The quick solution to this is to change the global options:
which will create the map2stan code and then only trigger an error downstream when executing map2stan
m3 <- map2stan(m3$f, data=m3$d )
#Error in map2stan(m3$f, data = m3$d) :
# Variable 'applications' has missing values (NA) in:
#Intercept + b_applicant_gendermale * applicant_gendermale[i] + b_applications * applications + #v_Intercept[dept]
#Either remove cases with NA or declare a distribution to use for imputation.
unless the model code returned from glimmer is amended to handle the imputation, e.g. (excuse the non-sensical imputation prior)
I appreciate that glimmer is not intended to offer thought-free model conversion, but it would be helpful to either (a) mention the default na.action behaviour in the documentation, (b) have glimmer fail with an informative message (like map2stan does with NAs), or (c) allow to pass through an na.action argument to model.frame.
I'm happy to draft a pull request for either option if you'd let me know which one you'd prefer.
Thanks for the great book & package!
Click to expand sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] rethinking_1.59 rstan_2.15.1 StanHeaders_2.15.0-1 ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 codetools_0.2-14 matrixStats_0.52.2 mvtnorm_1.0-6 lattice_0.20-33 assertthat_0.1
[7] MASS_7.3-44 grid_3.3.2 plyr_1.8.4 gtable_0.2.0 stats4_3.3.2 coda_0.19-1
[13] scales_0.4.1 lazyeval_0.2.0 tools_3.3.2 loo_1.1.0 munsell_0.4.3 inline_0.3.14
[19] colorspace_1.3-2 gridExtra_2.2.1 tibble_1.2
I'm not entirely sure whether this is a bug or a feature, given the scope of
glimmer
, but I noticed thatglimmer
fails cryptically whendata
contains any NA values.I've debugged this, and traced it to the default NA handling in the calls to
model.matrix
andmodel.frame
inxparse_glimmer_formula
(https://github.com/rmcelreath/rethinking/blob/master/R/glimmer.R#L60 and https://github.com/rmcelreath/rethinking/blob/master/R/glimmer.R#L64)model.frame
(which is implicitly called by passingdata
tomodel.matrix
) by default resorts tona.omit
, thereby truncating the data.frame, which in turn triggers the error message quoted in the above example whenglimmer
is buildingpf$dat
(https://github.com/rmcelreath/rethinking/blob/master/R/glimmer.R#L243)The quick solution to this is to change the global options:
which will create the
map2stan
code and then only trigger an error downstream when executingmap2stan
unless the model code returned from
glimmer
is amended to handle the imputation, e.g. (excuse the non-sensical imputation prior)I appreciate that
glimmer
is not intended to offer thought-free model conversion, but it would be helpful to either (a) mention the defaultna.action
behaviour in the documentation, (b) haveglimmer
fail with an informative message (like map2stan does with NAs), or (c) allow to pass through anna.action
argument tomodel.frame
.I'm happy to draft a pull request for either option if you'd let me know which one you'd prefer.
Thanks for the great book & package!
Click to expand sessionInfo()
R version 3.3.2 (2016-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.3 LTS locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] rethinking_1.59 rstan_2.15.1 StanHeaders_2.15.0-1 ggplot2_2.2.1 loaded via a namespace (and not attached): [1] Rcpp_0.12.10 codetools_0.2-14 matrixStats_0.52.2 mvtnorm_1.0-6 lattice_0.20-33 assertthat_0.1 [7] MASS_7.3-44 grid_3.3.2 plyr_1.8.4 gtable_0.2.0 stats4_3.3.2 coda_0.19-1 [13] scales_0.4.1 lazyeval_0.2.0 tools_3.3.2 loo_1.1.0 munsell_0.4.3 inline_0.3.14 [19] colorspace_1.3-2 gridExtra_2.2.1 tibble_1.2