stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
387 stars 132 forks source link

`rstanarm::loo` and `rstanarm::posterior_predict` return errors #309

Open alexWhitworth opened 6 years ago

alexWhitworth commented 6 years ago

Summary:

Problems with rstanarm:::loo.stanreg / rstanarm::loo from stan_betareg model fit

Description / reproducibility

> fit1 <- fit1 <- rstanarm::stan_betareg(
  formula= <formula>, 
  data= train, link = "logit", link.phi = "log", 
  prior= normal(), prior_intercept= normal(), prior_phi= exponential(),
  iter= iter, warmup= warmup, chains= chains, cores= chains)

> class(fit1)
[1] "stanreg" "betareg"
> loo(fit1, cores= 20)
Error in names(object[[n]])[1L:2L] <- c("mu", "phi") : 
  attempt to set an attribute on NULL
> rstanarm::loo(fit1, cores= 20)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL
> rstanarm:::loo.stanreg(fit1, cores= 20)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL
> loo(fit1)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL
> tt <- rstanarm::posterior_predict(fit1, draws= 1000, re.form= NULL, allow.new.levels= TRUE)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL

using example data:

see: https://cran.r-project.org/web/packages/rstanarm/vignettes/betareg.html

SEED <- 1234
set.seed(SEED)
eta <- c(1, -0.2)
gamma <- c(1.8, 0.4)
N <- 200
x <- rnorm(N, 2, 2)
z <- rnorm(N, 0, 2)
mu <- binomial(link = logit)$linkinv(eta[1] + eta[2]*x)
phi <- binomial(link = log)$linkinv(gamma[1] + gamma[2]*z)
y <- rbeta(N, mu * phi, (1 - mu) * phi)
dat <- data.frame(cbind(y, x, z))

fit1 <- stan_betareg(y ~ x | z, data = dat, link = "logit", link.phi = "log",
                     chains = 1L, cores = 1L, seed = 1L, iter = 100L)
loo(fit1)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL
tt <- rstanarm::posterior_predict(fit1, draws= 1000, re.form= NULL, allow.new.levels= TRUE)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL

session info

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/lib/libtatlas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] loo_1.1.0         rstanarm_2.17.2   Rcpp_0.12.13      betareg_3.2-0     fbmodels_0.1.3    ggplot2_2.2.1    
[7] data.table_1.11.0

loaded via a namespace (and not attached):
 [1] lattice_0.20-35    zoo_1.8-3          gtools_3.5.0       lmtest_0.9-36      assertthat_0.2.0   digest_0.6.12     
 [7] mime_0.5           R6_2.2.2           plyr_1.8.4         stats4_3.4.1       colourpicker_1.0   rlang_0.1.2       
[13] lazyeval_0.2.1     minqa_1.2.4        miniUI_0.1.1       nloptr_1.0.4       Matrix_1.2-10      DT_0.2            
[19] shinythemes_1.1.1  splines_3.4.1      shinyjs_1.0        lme4_1.1-14        stringr_1.2.0      htmlwidgets_0.9   
[25] igraph_1.1.2       munsell_0.4.3      shiny_1.0.5        compiler_3.4.1     httpuv_1.3.5       rstan_2.17.3      
[31] pkgconfig_2.0.1    base64enc_0.1-3    rstantools_1.4.0   htmltools_0.3.6    nnet_7.3-12        tibble_1.3.4      
[37] gridExtra_2.3      codetools_0.2-15   threejs_0.3.1      matrixStats_0.52.2 dplyr_0.7.4        MASS_7.3-47       
[43] grid_3.4.1         nlme_3.1-131       xtable_1.8-2       gtable_0.2.0       magrittr_1.5       StanHeaders_2.17.2
[49] scales_0.5.0       stringi_1.1.6      reshape2_1.4.2     flexmix_2.3-14     bindrcpp_0.2       dygraphs_1.1.1.4  
[55] xts_0.10-0         sandwich_2.4-0     Formula_1.2-3      tools_3.4.1        glue_1.2.0         shinystan_2.4.0   
[61] markdown_0.8       crosstalk_1.0.0    rsconnect_0.8.5    parallel_3.4.1     survival_2.41-3    yaml_2.1.19       
[67] inline_0.3.14      colorspace_1.3-2   bayesplot_1.4.0    bindr_0.1          modeltools_0.2-22 
avehtari commented 6 years ago

The example you posted works for me. I have

R version 3.5.1 (2018-07-02)
rstan_2.17.3
loo_2.0.0

and the biggest difference is that you are using loo_1.1.0. Try using loo_2.0.0 instead.

alexWhitworth commented 6 years ago

@avehtari Thanks--still not working. Different error. Note: I'm unable to update R given distribution via work account.

Same error (below) for both real data and simulated data.

> library(loo)
This is loo version 2.0.0.
> library(rstanarm)
Loading required package: Rcpp
rstanarm (Version 2.17.2, packaged: 2017-12-20 23:59:28 UTC)
> library(data.table)
data.table 1.11.0
> loo_f1 <- rstanarm::loo(fit1, cores= 20)
Error: is.data.frame(data) \|\| is.matrix(data) is not TRUE  
avehtari commented 6 years ago

I tink it's unlikely that R version would matter. Changing loo version did produce different error. The error hints that you have a "strange" data object (we know certain rae matrix/data.frame constructs can cause problems). Can you confirm that you get this error also with that betareg vignette code starting from restarted R and empty workspace?

alexWhitworth commented 6 years ago

@avehtari Correct. From fresh session:

> library(rstanarm)
> library(loo)
... vignette code ...
> loo(fit1)
Error: is.data.frame(data) || is.matrix(data) is not TRUE
> rstanarm::posterior_predict(fit1, draws= 1000, re.form= NULL, allow.new.levels= TRUE)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL

attached packages:
[1] loo_2.0.0       rstanarm_2.17.2 Rcpp_0.12.13 
avehtari commented 6 years ago

rstanarm_2.17.2

can you try rstanarm_2.17.4?

alexWhitworth commented 6 years ago

@avehtari Updated to 2.17.4. No change in outcome / error messages:

> loo(fit1)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL
> posterior_predict(fit1)
Error in names(object[[n]]) <- `*vtmp*` : 
  attempt to set an attribute on NULL
avehtari commented 6 years ago

No change in outcome / error messages:

But you reported First round: Error in names(object[[n]]) <-vtmp: Second round: Error: is.data.frame(data) || is.matrix(data) is not TRUE Third round: `Error in names(object[[n]]) <-vtmp:

I think there is change in error messages. However, this didn't give me any additional clue what is wrong. Can you try with clean R install in some other machine? We have seen before problems with corrupted R installations, so that no-one else could get the error and only re-installing all R related fixed the problem...

alexWhitworth commented 6 years ago

@avehtari

The new machine (non-clean install) returns functional results. The biggest differences here are (1): running on OS-X 10.13.4 vs Linux Centos 7 and (2) R version 3.5.0... See sessionInfo below. I'll ping our R and STAN teams at work to take a look at this. Any help from your side would also be appreciated, thanks!

library(rstanarm)
library(loo)
sessionInfo()
SEED <- 1234
set.seed(SEED)
eta <- c(1, -0.2)
gamma <- c(1.8, 0.4)
N <- 200
x <- rnorm(N, 2, 2)
z <- rnorm(N, 0, 2)
mu <- binomial(link = logit)$linkinv(eta[1] + eta[2]*x)
phi <- binomial(link = log)$linkinv(gamma[1] + gamma[2]*z)
y <- rbeta(N, mu * phi, (1 - mu) * phi)
dat <- data.frame(cbind(y, x, z))

fit1 <- stan_betareg(y ~ x | z, data = dat, link = "logit", link.phi = "log",
                     chains = 1L, cores = 1L, seed = 1L, iter = 100L)
test1 <- loo(fit1)
test2 <- rstanarm::posterior_predict(fit1, draws= 50, re.form= NULL, allow.new.levels= TRUE)
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] loo_2.0.0       rstanarm_2.17.4 Rcpp_0.12.18   

loaded via a namespace (and not attached):
 [1] lattice_0.20-35    zoo_1.8-1          gtools_3.5.0       assertthat_0.2.0   digest_0.6.15     
 [6] mime_0.5           R6_2.2.2           plyr_1.8.4         ggridges_0.5.0     stats4_3.5.0      
[11] ggplot2_3.0.0      colourpicker_1.0   pillar_1.2.2       rlang_0.2.1        lazyeval_0.2.1    
[16] minqa_1.2.4        miniUI_0.1.1.1     nloptr_1.0.4       Matrix_1.2-14      DT_0.4            
[21] shinythemes_1.1.1  splines_3.5.0      shinyjs_1.0        lme4_1.1-17        stringr_1.3.1     
[26] htmlwidgets_1.2    igraph_1.2.1       munsell_0.4.3      shiny_1.1.0        compiler_3.5.0    
[31] httpuv_1.4.3       rstan_2.17.3       pkgconfig_2.0.1    base64enc_0.1-3    rstantools_1.5.0  
[36] htmltools_0.3.6    tidyselect_0.2.4   tibble_1.4.2       gridExtra_2.3      codetools_0.2-15  
[41] matrixStats_0.54.0 threejs_0.3.1      dplyr_0.7.5        later_0.7.2        MASS_7.3-49       
[46] grid_3.5.0         nlme_3.1-137       xtable_1.8-2       gtable_0.2.0       magrittr_1.5      
[51] StanHeaders_2.17.2 scales_0.5.0       stringi_1.2.2      reshape2_1.4.3     promises_1.0.1    
[56] bindrcpp_0.2.2     dygraphs_1.1.1.6   xts_0.10-2         tools_3.5.0        glue_1.2.0        
[61] shinystan_2.5.0    markdown_0.8       purrr_0.2.4        crosstalk_1.0.0    survival_2.41-3   
[66] parallel_3.5.0     rsconnect_0.8.8    inline_0.3.15      colorspace_1.3-2   bayesplot_1.5.0   
[71] bindr_0.1.1     

 test1 <- loo(fit1)
Warning message:
Found 2 observation(s) with a pareto_k > 0.7. We recommend calling 'loo' again with argument 'k_threshold = 0.7' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 2 times to compute the ELPDs for the problematic observations directly.
test2 <- rstanarm::posterior_predict(fit1, draws= 50, re.form= NULL, allow.new.levels= TRUE)
alexWhitworth commented 6 years ago

@avehtari I have reinstalled R and rebooted by VM server. I get the same error. This leads me to believe it may have to do with R 3.4.1, although I am having a hard time finding the binary for 3.4.1 on CRAN

https://cran.r-project.org/bin/macosx/old/index-old.html https://cran.r-project.org/bin/macosx/

avehtari commented 6 years ago

this leads me to believe it may have to do with R 3.4.1,

This did work for me for a long time with R 3.4. too, and I just switched to 3.5. about a week ago. I'll try to find a computer with R 3.4.*

rld2 commented 6 years ago

@avehtari, I work with Alex and I'm not having any issues on my machine. He asked me to post my reprex here:

library(rstanarm)
#> Loading required package: Rcpp
#> rstanarm (Version 2.17.3, packaged: 2018-02-17 05:11:16 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - Plotting theme set to bayesplot::theme_default().
library(loo)
#> This is loo version 1.1.0
SEED <- 1234
set.seed(SEED)
eta <- c(1, -0.2)
gamma <- c(1.8, 0.4)
N <- 200
x <- rnorm(N, 2, 2)
z <- rnorm(N, 0, 2)
mu <- binomial(link = logit)$linkinv(eta[1] + eta[2] * x)
phi <- binomial(link = log)$linkinv(gamma[1] + gamma[2] * z)
y <- rbeta(N, mu * phi, (1 - mu) * phi)
dat <- data.frame(cbind(y, x, z))

fit1 <- stan_betareg(y ~ x | z, data = dat, link = "logit", link.phi = "log", 
  chains = 1L, cores = 1L, seed = 1L, iter = 100L)
#> 
#> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
#> 
#> Gradient evaluation took 0 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
#> Adjust your expectations accordingly!
#> 
#> 
#> WARNING: There aren't enough warmup iterations to fit the
#>          three stages of adaptation as currently configured.
#>          Reducing each adaptation stage to 15%/75%/10% of
#>          the given number of warmup iterations:
#>            init_buffer = 7
#>            adapt_window = 38
#>            term_buffer = 5
#> 
#> Iteration:  1 / 100 [  1%]  (Warmup)
#> Iteration: 10 / 100 [ 10%]  (Warmup)
#> Iteration: 20 / 100 [ 20%]  (Warmup)
#> Iteration: 30 / 100 [ 30%]  (Warmup)
#> Iteration: 40 / 100 [ 40%]  (Warmup)
#> Iteration: 50 / 100 [ 50%]  (Warmup)
#> Iteration: 51 / 100 [ 51%]  (Sampling)
#> Iteration: 60 / 100 [ 60%]  (Sampling)
#> Iteration: 70 / 100 [ 70%]  (Sampling)
#> Iteration: 80 / 100 [ 80%]  (Sampling)
#> Iteration: 90 / 100 [ 90%]  (Sampling)
#> Iteration: 100 / 100 [100%]  (Sampling)
#> 
#>  Elapsed Time: 0.46 seconds (Warm-up)
#>                0.6 seconds (Sampling)
#>                1.06 seconds (Total)
loo(fit1)
#> Warning: Found 5 observation(s) with a pareto_k > 0.7. We recommend calling
#> 'loo' again with argument 'k_threshold = 0.7' in order to calculate the
#> ELPD without the assumption that these observations are negligible. This
#> will refit the model 5 times to compute the ELPDs for the problematic
#> observations directly.
#> Computed from 50 by 200 log-likelihood matrix
#> 
#>          Estimate   SE
#> elpd_loo     87.3  9.7
#> p_loo         3.1  0.4
#> looic      -174.7 19.4
#> 
#> Pareto k diagnostic values:
#>                          Count  Pct 
#> (-Inf, 0.5]   (good)     185   92.5%
#>  (0.5, 0.7]   (ok)        10    5.0%
#>    (0.7, 1]   (bad)        3    1.5%
#>    (1, Inf)   (very bad)   2    1.0%
#> See help('pareto-k-diagnostic') for details.
Session info ``` r devtools::session_info() #> Session info ------------------------------------------------------------- #> setting value #> version R version 3.4.1 (2017-06-30) #> system x86_64, linux-gnu #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> tz America/Los_Angeles #> date 2018-07-31 #> Packages ----------------------------------------------------------------- #> package * version date source #> assertthat 0.2.0 2017-04-11 CRAN (R 3.4.1) #> backports 1.1.1 2017-09-25 CRAN (R 3.4.1) #> base * 3.4.1 2017-12-05 local #> base64enc 0.1-3 2015-07-28 CRAN (R 3.4.1) #> bayesplot 1.4.0 2017-09-12 CRAN (R 3.4.1) #> betareg 3.1-0 2016-08-06 CRAN (R 3.4.1) #> bindr 0.1 2016-11-13 CRAN (R 3.4.1) #> bindrcpp 0.2 2017-06-17 CRAN (R 3.4.1) #> codetools 0.2-15 2016-10-05 CRAN (R 3.4.1) #> colorspace 1.3-2 2016-12-14 CRAN (R 3.4.1) #> colourpicker 1.0 2017-09-27 CRAN (R 3.4.1) #> compiler 3.4.1 2017-12-05 local #> crosstalk 1.0.0 2016-12-21 CRAN (R 3.4.1) #> datasets * 3.4.1 2017-12-05 local #> devtools 1.13.3 2017-08-02 CRAN (R 3.4.1) #> digest 0.6.15 2018-01-28 cran (@0.6.15) #> dplyr 0.7.4 2017-09-28 CRAN (R 3.4.1) #> DT 0.4 2018-01-30 CRAN (R 3.4.1) #> dygraphs 1.1.1.4 2017-01-04 CRAN (R 3.4.1) #> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1) #> flexmix 2.3-14 2017-04-28 CRAN (R 3.4.1) #> formatR 1.5 2017-04-25 CRAN (R 3.4.1) #> Formula 1.2-2 2017-07-10 CRAN (R 3.4.1) #> ggplot2 3.0.0 2018-07-03 cran (@3.0.0) #> glue 1.3.0 2018-07-17 cran (@1.3.0) #> graphics * 3.4.1 2017-12-05 local #> grDevices * 3.4.1 2017-12-05 local #> grid 3.4.1 2017-12-05 local #> gridExtra 2.3 2017-09-09 CRAN (R 3.4.1) #> gtable 0.2.0 2016-02-26 CRAN (R 3.4.1) #> gtools 3.5.0 2015-05-29 CRAN (R 3.4.1) #> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.1) #> htmlwidgets 1.0 2018-01-20 CRAN (R 3.4.1) #> httpuv 1.3.5 2017-07-04 CRAN (R 3.4.1) #> igraph 1.1.2 2017-07-21 CRAN (R 3.4.1) #> inline 0.3.14 2015-04-13 CRAN (R 3.4.1) #> knitr 1.20 2018-02-20 CRAN (R 3.4.1) #> lattice 0.20-35 2017-03-25 CRAN (R 3.4.1) #> lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.1) #> lme4 1.1-14 2017-09-27 CRAN (R 3.4.1) #> lmtest 0.9-35 2017-02-11 CRAN (R 3.4.1) #> loo * 1.1.0 2017-03-27 CRAN (R 3.4.1) #> magrittr 1.5 2014-11-22 CRAN (R 3.4.1) #> markdown 0.8 2017-04-20 CRAN (R 3.4.1) #> MASS 7.3-47 2017-02-26 CRAN (R 3.4.1) #> Matrix 1.2-10 2017-05-03 CRAN (R 3.4.1) #> matrixStats 0.53.1 2018-02-11 CRAN (R 3.4.1) #> memoise 1.1.0 2017-04-21 CRAN (R 3.4.1) #> methods * 3.4.1 2017-12-05 local #> mime 0.5 2016-07-07 CRAN (R 3.4.1) #> miniUI 0.1.1 2016-01-15 CRAN (R 3.4.1) #> minqa 1.2.4 2014-10-09 CRAN (R 3.4.1) #> modeltools 0.2-21 2013-09-02 CRAN (R 3.4.1) #> munsell 0.4.3 2016-02-13 CRAN (R 3.4.1) #> nlme 3.1-131 2017-02-06 CRAN (R 3.4.1) #> nloptr 1.0.4 2014-08-04 CRAN (R 3.4.1) #> nnet 7.3-12 2016-02-02 CRAN (R 3.4.1) #> parallel 3.4.1 2017-12-05 local #> pillar 1.2.1 2018-02-27 CRAN (R 3.4.1) #> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.1) #> plyr 1.8.4 2016-06-08 CRAN (R 3.4.1) #> R6 2.2.2 2017-06-17 CRAN (R 3.4.1) #> Rcpp * 0.12.17 2018-05-18 cran (@0.12.17) #> reshape2 1.4.3 2017-12-11 cran (@1.4.3) #> rlang 0.2.1 2018-05-30 cran (@0.2.1) #> rmarkdown 1.9 2018-03-01 CRAN (R 3.4.1) #> rprojroot 1.2 2017-01-16 CRAN (R 3.4.1) #> rsconnect 0.8.8 2018-03-09 CRAN (R 3.4.1) #> rstan 2.17.3 2018-01-20 CRAN (R 3.4.1) #> rstanarm * 2.17.3 2018-02-18 CRAN (R 3.4.1) #> rstantools 1.4.0 2017-12-21 CRAN (R 3.4.1) #> sandwich 2.4-0 2017-07-26 CRAN (R 3.4.1) #> scales 0.5.0 2017-08-24 CRAN (R 3.4.1) #> shiny 1.0.5 2017-08-23 CRAN (R 3.4.1) #> shinyjs 1.0 2018-01-08 CRAN (R 3.4.1) #> shinystan 2.4.0 2017-08-02 CRAN (R 3.4.1) #> shinythemes 1.1.1 2016-10-12 CRAN (R 3.4.1) #> splines 3.4.1 2017-12-05 local #> StanHeaders 2.17.2 2018-01-20 CRAN (R 3.4.1) #> stats * 3.4.1 2017-12-05 local #> stats4 3.4.1 2017-12-05 local #> stringi 1.2.4 2018-07-20 cran (@1.2.4) #> stringr 1.3.1 2018-05-10 cran (@1.3.1) #> survival 2.41-3 2017-04-04 CRAN (R 3.4.1) #> threejs 0.3.1 2017-08-13 CRAN (R 3.4.1) #> tibble 1.4.2 2018-01-22 CRAN (R 3.4.1) #> tools 3.4.1 2017-12-05 local #> utils * 3.4.1 2017-12-05 local #> withr 2.1.2 2018-03-15 cran (@2.1.2) #> xtable 1.8-2 2016-02-05 CRAN (R 3.4.1) #> xts 0.10-0 2017-07-07 CRAN (R 3.4.1) #> yaml 2.1.14 2016-11-12 CRAN (R 3.4.1) #> zoo 1.8-0 2017-04-12 CRAN (R 3.4.1) ```
alexWhitworth commented 6 years ago

@avehtari given Ryan's results above, I have uninstalled / reinstalled R from my computer.

This has fixed the error on posterior_predict and rstanarm:::loo.stanreg for my real data. But, interestingly, I still get the same errors (below) with the demo data from the vignette. I also get this error with loo (without the call to the non-exported function).

loo(fit1) Error in names(object[[n]]) <- *vtmp* : attempt to set an attribute on NULL posterior_predict(fit1) Error in names(object[[n]]) <- *vtmp* : attempt to set an attribute on NULL

Note that we are also soon to migrate our VMs at work from R 3.4.1 to 3.5.1. I can provide an update when that occurs as well