merliseclyde / BAS

BAS R package https://merliseclyde.github.io/BAS/
GNU General Public License v3.0

`bas.predict` error when n<p when using `estimator= BMA` and `se.fit =T` #70

Open petersen-f opened 1 year ago

petersen-f commented 1 year ago

Describe the bug Great package! I noticed a bug, however, when trying to perform predictions. When training a model with more predictors than observations (n<p) via method = BAS, predicting on new data (with se.fit = T and estimator = BMA) fails with the following error: Error in solve.default(qr.R(qr(oldX))) : 'a' (14 x 15) must be square

To Reproduce Steps to reproduce the behavior:

library(BAS)
data("bodyfat")
bas_mod <- bas.lm(Bodyfat ~ ., data = bodyfat[1:14, ], method = 'BAS')
pred <- predict(bas_mod, newdata = bodyfat[15:20, ], se.fit = TRUE, estimator = 'BMA')

Expected behavior The function should return predictions with the 95% credible interval. If this behavior is not a bug and this type of prediction is impossible, I would expect a more informative error stating that se.fit = T is not supported for n<p scenarios with BMA and the BAS method. It seems to work fine if the method is set to MCMC, however.


merliseclyde commented 1 year ago

Thanks! That is a bug in the n<p case. I am guessing that it does not happen with MCMC because the sampler is not visiting the non-full-rank models, while BAS in this case is enumerating them, and non-full-rank models are part of the model space (but there should have been a warning about that, which is an additional issue).
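
The dimension mismatch in the reported error can be reproduced in base R without BAS. The sketch below uses a simulated 14 x 15 matrix as a stand-in for the design matrix (an assumption; it is not the bodyfat data) and applies the same `solve(qr.R(qr(...)))` call that appears in the error message. It only illustrates why that call cannot succeed when n<p, not how predict is implemented internally.

```r
# Simulated stand-in for the design matrix: 14 rows, intercept + 14 predictors
set.seed(1)
n <- 14
p <- 15
X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))

# The R factor has min(n, p) rows and p columns, so it is not square when n < p
R <- qr.R(qr(X))
dim(R)

# solve() needs a square matrix, so the inversion fails; capture the message
res <- tryCatch(solve(R), error = function(e) conditionMessage(e))
res
```

Any model with more columns than observations hits the same problem, which is consistent with the failure appearing only when such models are actually visited.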

merliseclyde commented 7 months ago

The error is also triggered with method = 'deterministic', as this also samples all models.

data("bodyfat")
bas_mod <- bas.lm(Bodyfat ~ ., data = bodyfat[1:14, ], method = 'deterministic')
pred <- predict(bas_mod, newdata = bodyfat[15:20, ], se.fit = TRUE, estimator = 'BMA')

This is also a problem in bas.glm.

1) Short-term fix is to remove rank-deficient models in predict as an option (post-process).
2) Assign rank-deficient models prior probability 0 in C, which would fix this via the solution to issue #74.
3) Fix code so that models are not saved for sampling methods BAS, deterministic and MCMC+BAS.
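
A minimal sketch of what the post-processing idea in option 1 could look like, written in base R. The helper `drop_rank_deficient` and its arguments are hypothetical and not part of BAS, and the representation of models as a list of column indices is an assumption made for illustration; BAS stores its models differently. The sketch drops every model whose design matrix is rank deficient and renormalizes the remaining probabilities.

```r
# Hypothetical helper, not part of BAS: drop models whose design matrix is
# rank deficient and renormalize the remaining model probabilities.
drop_rank_deficient <- function(X, models, probs) {
  # A model is full rank when the QR rank equals its number of columns
  full_rank <- vapply(
    models,
    function(cols) qr(X[, cols, drop = FALSE])$rank == length(cols),
    logical(1)
  )
  kept <- probs[full_rank]
  list(models = models[full_rank], probs = kept / sum(kept))
}

# Toy example: n = 4 observations, intercept + 5 predictors (p = 6 columns)
set.seed(1)
X <- cbind(1, matrix(rnorm(4 * 5), nrow = 4))
models <- list(1L, c(1L, 2L), c(1L, 2L, 3L), 1:6)  # last model has more columns than rows
probs <- c(0.1, 0.2, 0.3, 0.4)

out <- drop_rank_deficient(X, models, probs)
length(out$models)  # number of models kept
out$probs           # renormalized probabilities
```

The sketch does not handle the case where every model is rank deficient, in which the renormalization would divide by zero.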