stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0

regex_pars: allow lookaround in regular expressions #547

Open JohannesNE opened 3 years ago

Summary:

When extracting parameter draws (e.g. with as.matrix()), it would be useful to select all parameters except those that match a regular expression. Negative lookahead is useful for this.

Description:

To allow lookarounds in regular expressions when selecting parameters, I believe it would be sufficient to let the user set perl = TRUE in the grep call in grep_for_pars(): https://github.com/stan-dev/rstanarm/blob/c09678d4bfaf8268531c6737bacfcf10e79c5698/R/misc.R#L433

I am not sure what an appropriate implementation would look like. Maybe as.matrix.stanreg could take a list argument that is passed on to grep in grep_for_pars().

E.g.: as.matrix(example_model, regex_pars = "^((?!herd).)*$", grep_args = list(perl = TRUE))
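
A rough sketch of how such pass-through could work, using base R only. This is not rstanarm's actual grep_for_pars(); the function name grep_for_pars_sketch, the grep_args argument, and the herd-related parameter names in the vector below are assumptions made for illustration.

```r
# Hypothetical sketch; NOT rstanarm's actual grep_for_pars().
# par_names stands in for rownames(x$stan_summary).
grep_for_pars_sketch <- function(par_names, regex_pars, grep_args = list()) {
  stopifnot(is.character(par_names), is.character(regex_pars), is.list(grep_args))
  out <- character(0)
  for (j in seq_along(regex_pars)) {
    # Fixed arguments first; user-supplied ones (e.g. perl = TRUE) are appended.
    args <- c(list(pattern = regex_pars[j], x = par_names, value = TRUE), grep_args)
    out <- c(out, do.call(grep, args))
  }
  if (length(out) == 0) stop("No matches for 'regex_pars'.", call. = FALSE)
  unique(out)
}

# Illustrative parameter names (the herd-related ones are assumed, not taken from the reprex).
par_names <- c("(Intercept)", "size", "period2",
               "b[(Intercept) herd:1]", "Sigma[herd:(Intercept),(Intercept)]")

selected <- grep_for_pars_sketch(par_names, "^((?!herd).)*$",
                                 grep_args = list(perl = TRUE))
print(selected)
```

With this design, supplying pattern, x, or value inside grep_args would make do.call() fail with a duplicate-argument error, so a real implementation would probably want to check the names in grep_args first.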

Reproducible Steps:

library(rstanarm)
#> Loading required package: Rcpp
#> This is rstanarm version 2.21.1
#> - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
#> - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#>   options(mc.cores = parallel::detectCores())
example("example_model")
#> 
#> exmpl_> example_model <- 
#> exmpl_+   stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1|herd),
#> exmpl_+              data = lme4::cbpp, family = binomial, QR = TRUE,
#> exmpl_+              # this next line is only to keep the example small in size!
#> exmpl_+              chains = 2, cores = 1, seed = 12345, iter = 1000, refresh = 0)
#> 
#> exmpl_> example_model
#> stan_glmer
#>  family:       binomial [logit]
#>  formula:      cbind(incidence, size - incidence) ~ size + period + (1 | herd)
#>  observations: 56
#> ------
#>             Median MAD_SD
#> (Intercept) -1.5    0.6  
#> size         0.0    0.0  
#> period2     -1.0    0.3  
#> period3     -1.1    0.3  
#> period4     -1.6    0.4  
#> 
#> Error terms:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.8     
#> Num. levels: herd 15 
#> 
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg

grep("^((?!herd).)*$", rownames(example_model$stan_summary), perl = TRUE, value = TRUE)
#> [1] "(Intercept)"   "size"          "period2"       "period3"      
#> [5] "period4"       "mean_PPD"      "log-posterior"

as.matrix(example_model, regex_pars = "^((?!herd).)*$")
#> Warning in grep(regex_pars[j], rownames(x$stan_summary), value = TRUE): TRE
#> pattern compilation error 'Invalid regexp'
#> Error in grep(regex_pars[j], rownames(x$stan_summary), value = TRUE): invalid regular expression '^((?!herd).)*$', reason 'Invalid regexp'

Created on 2021-09-02 by the reprex package (v2.0.0)
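
For context: the error arises because grep() uses the TRE engine by default, which does not support lookaround, whereas perl = TRUE switches to PCRE. Until something like this is supported, a possible workaround is to select the names outside of as.matrix() with invert = TRUE. The sketch below uses base R only; the herd-related names in the vector are assumed for illustration.

```r
# Illustrative parameter names; the herd-related ones are assumptions.
par_names <- c("(Intercept)", "size", "period2", "period3", "period4",
               "b[(Intercept) herd:1]", "Sigma[herd:(Intercept),(Intercept)]")

# Negative lookahead requires the PCRE engine (perl = TRUE).
with_lookahead <- grep("^((?!herd).)*$", par_names, perl = TRUE, value = TRUE)

# The same selection without lookaround: keep names that do NOT match "herd".
with_invert <- grep("herd", par_names, invert = TRUE, value = TRUE)

stopifnot(identical(with_lookahead, with_invert))
print(with_invert)
```

The resulting character vector could then presumably be passed to the pars argument of as.matrix() instead of regex_pars, after dropping any names that as.matrix() does not accept.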

RStanARM Version:

2.21.1

R Version:

4.1.0

Operating System:

Ubuntu 21.04