merliseclyde / BAS

BAS R package https://merliseclyde.github.io/BAS/
https://merliseclyde.github.io/BAS/
GNU General Public License v3.0
41 stars 16 forks source link

[Bug] NaN posterior inclusion probability #21

Closed AlexanderLyNL closed 6 years ago

AlexanderLyNL commented 6 years ago

Hi Merlise,

Here's the link to the R file that makes BAS produce NaNs.

https://www.dropbox.com/s/girhufa20d3vsi1/basPosteriorInclusionProbZero.R?dl=0

Note that this analysis uses the JASP debug data set that was developed to make things crash:

https://www.dropbox.com/s/6cv1mi9zi3xylgi/debug.csv?dl=0

Upon further inspection I think that the NaNs are caused by (at least) one log marginal being undefined, which is due to the MLE not being well defined (because of an overflow problem??). I've also included the analysis using the BayesFactor package. Please let me know if I need to clarify something. Thanks for looking into this!

Cheers, Alexander

merliseclyde commented 6 years ago

Good news/bad news.

Good: Using dev BAS 1.5.2 with the hierarchical constraints, the model that leads to the NA in the log marginal likelihood is omitted so the model probabilities are all fine!

Bad: lm also has NA's for two of the coefficients in the problem model, so as you say the MLEs are not defined. However the R2 returned by BAS is NaN (probably due to the NA's in coefficients, while for lm it is 0.02281. As the R2 is finite, the log marginal should be well defined even though individual coefficients are not estimable and thus the BayesFactor should be finite.

The matrix X^T W X is poorly conditioned (i.e. negative eigen values!). The Cholesky algorithm for obtaining the MLE's used in BAS in this case fails, but I have been working on adopting a Cholesky with pivoting for non-full rank models. (Yet another branch :-) That was working, but I still need to write a C/FORTRAN function to obtain the SE's from X^X^{-} as back solving with pivoting for the inverse is not part of BLAS/LAPACK. (tested in R, so need to convert that last bit to the lower level languages)

In any case, this will be a great test case for it!

Will also look at that with GLM's - the iteratively re-weighted least squares used to obtain the MLE;s there used QR with pivoting so should be more stable... May want to add the normal family to bas.glm to use the alternative.

This does add an issue - while predictions under BMA are well defined with non-full rank models, BMA of coefficients is going to be problematic as coefficients are not unique.

cheers, Merlise

merliseclyde commented 6 years ago

test model from debug.csv now runs with pivot=TRUE and provides posterior probabilities without any NA's. Models are still part of the output so may need more thought about how to treat rank deficient models with BMA