stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
389 stars 133 forks source link

summary() fails for models with fullrank-algorithm #385

Closed strengejacke closed 5 years ago

strengejacke commented 5 years ago

When calling summary() on a model fitted with algorithm = "fullrank" gives an error when displaying the Diagnistic, probably because the column is named khat, not Rhat.

library(rstanarm)
m <- stan_glm(Sepal.Length ~ Petal.Width, data = iris, algorithm = "fullrank")
summary(m)
#> 
#> Model Info:
#> 
#>  function:     stan_glm
#>  family:       gaussian [identity]
#>  formula:      Sepal.Length ~ Petal.Width
#>  algorithm:    fullrank
#>  priors:       see help('prior_summary')
#>  observations: 150
#>  predictors:   2
#> 
#> Estimates:
#>               mean   sd   2.5%   25%   50%   75%   97.5%   khat
#> (Intercept) 4.8    0.1  4.6    4.7   4.8   4.8   4.9     0.1   
#> Petal.Width 0.9    0.1  0.8    0.9   0.9   0.9   1.0     0.1   
#> sigma       0.5    0.0  0.4    0.5   0.5   0.5   0.5     0.1   
#> mean_PPD    5.8    0.1  5.7    5.8   5.8   5.9   5.9     0.1   
#> 
#> Diagnostics:
#> Error in x[, c("mcse", "Rhat"), drop = FALSE]: subscript out of bounds

Created on 2019-09-17 by the reprex package (v0.3.0)

strengejacke commented 5 years ago

same for algorithm = "meanfield".

jgabry commented 5 years ago

This is actually already fixed on the master branch so I'm going to close this, but thanks for reporting it. We'll be releasing a new version this week. Here's what summary for fullrank/meanfield will look like (there are some new diagnostics printed for VB models in the summary)

library(rstanarm)
m <- stan_glm(Sepal.Length ~ Petal.Width, data = iris, algorithm = "fullrank", refresh = 0)

Warning: Pareto k diagnostic value is 0.72. Resampling is unreliable. Increasing the number of draws or decreasing tol_rel_obj may help.

Setting 'QR' to TRUE can often be helpful when using one of the variational inference algorithms. See the documentation for the 'QR' argument.
summary(m)

Model Info:
 function:     stan_glm
 family:       gaussian [identity]
 formula:      Sepal.Length ~ Petal.Width
 algorithm:    fullrank
 priors:       see help('prior_summary')
 observations: 150
 predictors:   2

Estimates:
              mean   sd   10%   50%   90%
(Intercept) 4.8    0.1  4.7   4.8   4.8  
Petal.Width 0.9    0.0  0.9   0.9   1.0  
sigma       0.5    0.0  0.4   0.5   0.5  

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 5.8    0.1  5.8   5.8   5.9  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

Monte Carlo diagnostics
            mcse khat n_eff
(Intercept) 0.0  0.7  314  
Petal.Width 0.0  0.7  112  
sigma       0.0  0.7   68  
mean_PPD    0.0  0.7  143  

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and khat is the Pareto k diagnostic for importance sampling (perfomance is usually good when khat < 0.7).
strengejacke commented 5 years ago

Great, thanks!