merliseclyde / BAS

BAS R package for Bayesian Model Averaging and Variable Selection
https://merliseclyde.github.io/BAS/
GNU General Public License v3.0

Reducing model space based on the number of models that satisfy hereditary constraints #88

Open merliseclyde opened 1 week ago

merliseclyde commented 1 week ago

Currently, the algorithms in BAS allocate storage for n.models models, where n.models is either set by the user or, when enumeration is feasible, equal to 2^p (BAS, deterministic, MCMC+BAS). This is reduced if variables are forced to always be included, e.g. include.always = ~ X1 + X2 + X1:X2.
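As a rough illustration of the arithmetic above: with p candidate terms, of which k are forced into every model, only 2^(p - k) models remain to enumerate. The helper below is hypothetical (model_space_size is not a BAS function) and only sketches how the allocation could be bounded by the smaller of that count and a user-supplied cap.

```r
# Hypothetical helper, NOT part of BAS: size of the model space for p
# candidate terms when n_always of them are forced into every model,
# truncated at a user-supplied cap on the number of models.
model_space_size <- function(p, n_always = 0, cap = Inf) {
  stopifnot(p >= 0, n_always >= 0, n_always <= p)
  min(2^(p - n_always), cap)
}

model_space_size(3)                # 8: X1, X2 and X1:X2 are all free
model_space_size(3, n_always = 3)  # 1: all three terms are forced in
model_space_size(20, cap = 2^16)   # 65536: the cap is smaller than 2^20
```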

For models with factors or orthogonal polynomials, the usual practice is to include higher order terms only if their lower order "parents" are included in the model. These constraints are imposed in the sampling algorithms BAS and MCMC (but not in the deterministic, MCMC+BAS or AMCMC search mechanisms).
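A minimal sketch of what the parent constraint means for interaction terms, assuming strong heredity (every lower order term contained in an included term must itself be included). It uses only the "factors" attribute of base R's terms(); satisfies_heredity is a hypothetical name, not the check used inside BAS, and orthogonal polynomial terms are not covered here.

```r
# Term-by-variable incidence for ~ X1 * X2, taken from base R's terms().
# Rows are variables (X1, X2); columns are terms (X1, X2, X1:X2).
fac <- attr(terms(~ X1 * X2), "factors")

# Hypothetical check, NOT the BAS implementation: `included` is a logical
# vector with one entry per column of `fac`.
satisfies_heredity <- function(included, fac) {
  vars <- fac > 0                        # which variables appear in each term
  for (j in which(included)) {
    for (k in seq_len(ncol(vars))) {
      # term k is a parent of term j if its variables are a subset of j's
      is_parent <- k != j && all(vars[, j] | !vars[, k])
      if (is_parent && !included[k]) return(FALSE)
    }
  }
  TRUE
}

satisfies_heredity(c(TRUE, TRUE, TRUE), fac)    # TRUE: X1 + X2 + X1:X2
satisfies_heredity(c(TRUE, FALSE, TRUE), fac)   # FALSE: X1:X2 without X2
satisfies_heredity(c(TRUE, FALSE, FALSE), fac)  # TRUE: X1 alone
```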

Counting models under hierarchical constraints has been added to bas.lm via the call n.models = count.heredity.models(mf, n.models), together with a unit test in test-interactions.R. This does not count all models under heredity constraints, since that becomes expensive for a large number of factors and higher orders of interaction; instead, it stops at the higher orders once the count exceeds the pre-specified cap on the number of models to sample.
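To make the size of the reduction concrete, here is a brute-force sketch that counts the models satisfying strong heredity for a small formula and stops as soon as the cap is reached. It is not the algorithm in count.heredity.models (which, as described above, works by interaction order); count_heredity_sketch and its arguments are hypothetical names, and the formula is assumed to contain at least one term.

```r
# Hypothetical brute-force counter, NOT the BAS implementation.
# Enumerates all 2^q inclusion patterns over the q terms of `formula`
# and counts those in which every included term has all of its parents.
count_heredity_sketch <- function(formula, cap = Inf) {
  vars <- attr(terms(formula), "factors") > 0   # variables x terms
  q <- ncol(vars)
  # parent[k, j] is TRUE when term k's variables are a proper subset of term j's
  parent <- matrix(FALSE, q, q)
  for (j in seq_len(q)) for (k in seq_len(q)) {
    parent[k, j] <- k != j && all(vars[, j] | !vars[, k])
  }
  count <- 0
  for (m in 0:(2^q - 1)) {
    included <- (m %/% 2^(0:(q - 1))) %% 2 == 1  # binary digits of m
    # valid if no excluded term is a parent of an included term
    if (!any(parent[!included, included, drop = FALSE])) {
      count <- count + 1
      if (count >= cap) return(cap)              # early exit at the cap
    }
  }
  count
}

count_heredity_sketch(~ X1 * X2)                  # 5 of 2^3 = 8 models
count_heredity_sketch(~ X1 * X2 * X3)             # 19 of 2^7 = 128 models
count_heredity_sketch(~ X1 * X2 * X3, cap = 10)   # 10: stopped at the cap
```

Even with three factors, the constrained space is far smaller than 2^p, which is the motivation for allocating storage from the constrained count rather than from 2^p.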

However this does not catch the following cases:

Implementing this should permit eliminating the use of SETLENGTH in lm_sampleworep.c and glm_sampleworep.c.