jaredhuling / oem

Penalized least squares estimation using the Orthogonalizing EM (OEM) algorithm
http://jaredhuling.org/oem

Support AIC and BIC calculation #14

Closed tomwenseleers closed 6 years ago

tomwenseleers commented 6 years ago

Would it be a lot of trouble to define methods to calculate AIC and BIC values for a fitted oem model, similar to the way the ncvreg package does this (i.e. returning a vector of AIC or BIC values, one for each lambda value that was used)? This would help with selecting the best lambda based on the AIC or BIC criterion, which is faster than cross-validation.
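For reference, the ncvreg behavior I mean is roughly this (a minimal sketch using the Prostate data shipped with ncvreg; it works because ncvreg provides a logLik() method with a df attribute, so the standard AIC()/BIC() generics apply):

```r
library(ncvreg)

data(Prostate)                        # example data included in ncvreg
fit <- ncvreg(Prostate$X, Prostate$y)

AIC(fit)                              # one AIC value per lambda
BIC(fit)
which.min(BIC(fit))                   # index of the lambda minimizing BIC
```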

jaredhuling commented 6 years ago

It is already available. When you run the oem function, you need to specify the argument compute.loss = TRUE. Then you can run the logLik() function on the fitted model object and it will return the log-likelihood, from which you can compute the AIC/BIC.

Note, however, that computing the loss function increases computation time, particularly when using many lambda values.
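For example, something roughly like this (a minimal sketch; the exact layout of the returned object, e.g. the fit$beta$lasso coefficient matrix, is an assumption and may need adjusting):

```r
library(oem)

# toy data
set.seed(1)
n <- 500; p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(rep(1, 5), rep(0, p - 5)) + rnorm(n))

# compute.loss = TRUE is needed for logLik() to work on the fit
fit <- oem(x = x, y = y, penalty = "lasso", compute.loss = TRUE)

ll <- as.numeric(logLik(fit))   # one log-likelihood per lambda

# lasso effective df = number of nonzero coefficients; fit$beta$lasso
# is assumed to hold the (p+1) x nlambda coefficient matrix
df <- colSums(fit$beta$lasso != 0)

aic <- -2 * ll + 2 * df
bic <- -2 * ll + log(n) * df
```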

tomwenseleers commented 6 years ago

Well, I'm not sure it's that trivial, because one also needs the effective degrees of freedom to calculate AIC or BIC. For the LASSO, the effective degrees of freedom are given by the number of nonzero coefficients, but for the elastic net the formula is more complicated, see https://stats.stackexchange.com/questions/327088/how-can-i-calculate-the-number-of-degrees-of-freedom-in-the-elastic-net-regulari (I'm not sure whether this has already been implemented anywhere). And for ridge regression the effective degrees of freedom can be calculated using the rms package. Ideally all this would require a slot $df with the effective degrees of freedom to be returned. Plus it would still be handy to have pre-defined AIC() and BIC() methods...
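For what it's worth, the calculations from that thread would look roughly like this (an untested sketch; lambda2 here is the raw ℓ2 penalty parameter, so matching it to oem's or glmnet's parametrization, which rescales by n and alpha, would still need checking):

```r
# Ridge effective df via the SVD of x:
#   df(lambda) = sum_i d_i^2 / (d_i^2 + lambda)
ridge.df <- function(x, lambda) {
  d2 <- svd(x, nu = 0, nv = 0)$d^2
  sapply(lambda, function(l) sum(d2 / (d2 + l)))
}

# Elastic net effective df: trace of X_A (X_A'X_A + lambda2 I)^{-1} X_A',
# restricted to the active set A (nonzero coefficients); with
# lambda2 = 0 this reduces to the lasso df (the size of A)
enet.df <- function(x, beta, lambda2) {
  A <- which(beta != 0)
  if (length(A) == 0L) return(0)
  xa <- x[, A, drop = FALSE]
  sum(diag(xa %*% solve(crossprod(xa) + lambda2 * diag(length(A)), t(xa))))
}
```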

jaredhuling commented 6 years ago

That's a fair point; I hadn't considered the l2 penalty. It's not really hard to calculate, per se, but it will add to computation time.

jaredhuling commented 6 years ago

By the way, what model are you using? Linear or logistic? What are the dimensions of your problem?

tomwenseleers commented 6 years ago

For the moment linear/gaussian, and I have n ≈ 500 (number of cases) and p typically between 10 and 200, though it varies a bit. There could also be cases where p > n (e.g. with p = 2000, in which case I would have to use glmnet or ncvreg), or where n >> p, e.g. if I were to combine data from multiple samples in a column-augmented matrix: with N = 10 samples my n would become 5000 and p would stay the same, typically around 200.

I have to fit this model 20 000 times though, as I am dealing with a longitudinal multichannel signal with 20 000 time points (a gas chromatography / mass spectrometry chromatogram with 20 000 time slices). The dependent variable is a multichannel time slice from the chromatogram, and my covariate matrix is a matrix of target mass spectra. For performance reasons I fit all of these separately, but I do selection of the best lambda based on the coefficients for the whole chromatogram. I also use the penalty.factor argument to apply differential penalization based on how far the expected elution time of each target mass spectrum in the covariate matrix is from the time slice under consideration; a rough sketch of what I mean is below.

(Ideally I would like to be able to use a Poisson noise model, but with an identity link and nonnegativity constraints; I haven't seen this implemented anywhere.)
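All the names here are hypothetical, and the weighting scheme is just one possibility:

```r
library(oem)

# Hypothetical inputs: 'spectra' (n x p matrix of target mass spectra),
# 'slice' (response: one multichannel time slice of the chromatogram),
# and 'elution.dist' (distance, in time slices, between each target's
# expected elution time and the current slice)
fit.slice <- function(slice, spectra, elution.dist, scale = 50) {
  # penalize targets more heavily the further their expected elution
  # time is from the current slice (one possible weighting)
  pf <- 1 + elution.dist / scale
  oem(x = spectra, y = slice, penalty = "lasso",
      penalty.factor = pf, compute.loss = TRUE)
}
```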