stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/
Other
110 stars 26 forks source link

variable selection for uncorrelated random variables #435

Open wlangera opened 1 year ago

wlangera commented 1 year ago

Hi

I want to fit a Poisson model with uncorrelated random variables and perform projection prediction. Is it true that this is not yet possible with projpred or can I specify my model in a different way such that I can use uncorrelated random variables? I also wonder if it matters for projection prediction whether they are specified as correlated or not.

Unexpected number of | characters in group terms. Please contact the package maintainer.

library(brms)
#> Loading required package: Rcpp
#> Warning: package 'Rcpp' was built under R version 4.3.1
#> Loading 'brms' package (version 2.19.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(projpred)
#> Warning: package 'projpred' was built under R version 4.3.1
#> This is projpred version 2.6.0.
#> 
#> Attaching package: 'projpred'
#> The following object is masked from 'package:brms':
#> 
#>     do_call

# simulate data
species <- paste("species", 1:5, sep = "_")
periods <- paste("period", 1:2, sep = "_")
df <- expand.grid(species, periods)

set.seed(123)
lambdas <- c(rgamma(length(species), 0.7, 0.2),
             rgamma(length(species), 0.5, 0.25))

data_list <- lapply(seq_along(df[, 1]), function(i) {
  out <- tidyr::expand_grid(df[i, ], rpois(30, lambdas[i]))
  names(out) <- c("species", "period", "count")
  return(out)
  })
dat_df <- do.call(rbind.data.frame, data_list)

# MCMC parameters
nchains <- 3 # number of chains
niter <- 4000 # number of iterations (incl. burn-in)
burnin <- niter / 4 # number of initial samples to discard (burn-in)
nparallel <- nchains # number of cores used for parallel computing

## Uncorrelated random random variables
# Fit model
fit1 <- brm(bf(count ~ period + (1 + period || species)),
            data = dat_df,
            family = poisson(),
            chains = nchains, warmup = burnin, iter = niter, cores = nparallel,
            control = list(adapt_delta = 0.99,
                           max_treedepth = 12))
#> Compiling Stan program...
#> Start sampling

# Projection prediction
cvvs_fast1 <- cv_varsel(fit1,
                        method = "forward",
                        nclusters_pred = 20, # speed
                        cv_method = "LOO",
                        # only for the sake of speed
                        validate_search = FALSE)
#> Error in FUN(X[[i]], ...): Unexpected number of `|` characters in group terms. Please contact the package maintainer.

Created on 2023-07-24 with reprex v2.0.2

Kind regards, Ward

fweber144 commented 1 year ago

Hi Ward,

Thank you for this issue.

According to your brms code, you have uncorrelated group-level intercepts and period slopes for species (because of the || syntax; see the brms documentation and Table 2 in the lme4 vignette Fitting Linear Mixed-Effects Models Using lme4; beware that "uncorrelated" refers to zero correlation of the group-level intercept (regarded as a random variable) and the group-level period slope (regarded as a random variable), so to zero correlation in the corresponding bivariate normal distribution).

You are right that this is not supported by projpred yet. Sorry :/

I'll leave this issue open because (i) we should throw a more informative error message in that case and (ii) as a reminder that this could be implemented in the future.

fweber144 commented 1 year ago

I forgot to answer two more specific questions:

can I specify my model in a different way such that I can use uncorrelated random variables?

Currently, I don't think this is possible.

I also wonder if it matters for projection prediction whether they are specified as correlated or not.

Yes, it does. But the reason why I added some more explanations concerning the || syntax above is that given these explanations, you might perhaps decide that you don't need the || syntax and instead rely on the ordinary | syntax.

wlangera commented 1 year ago

Thank you for your reply and extra information!