Find a way to compare models efficiently

unfoldtoolbox / unfold

A MATLAB EEG toolbox to perform overlap correction and non-linear & linear regression.

GNU General Public License v3.0


Find a way to compare models efficiently #11

Open behinger opened 6 years ago

behinger commented 6 years ago

Model comparisons are often necessary. How can we do them in a computationally efficient way with the fewest assumptions?

Proposals:

R^2
Squared error
likelihood ratios
some kind of predictive quantity => cross-validation
bootstrapping of any of the measures above
AIC / BIC

I think we should/could offer AIC & BIC values. R^2 is too misleading (and will be extremely low, since we are trying to predict continuous EEG!). But tbh I first need to look at some datasets to see how the residuals are shaped (they need to be normally distributed so that I can assume a normal likelihood for AIC/BIC).
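One quick way to look at the shape of the residuals is to compare their excess kurtosis and tail mass against what a Gaussian would give. The following is only a sketch on synthetic numbers: `residGauss` and `residHeavy` are made-up stand-ins for real residuals, and the helper names `exKurt` and `tailFrac` are hypothetical. It uses base MATLAB only.

```matlab
% Synthetic stand-ins for residuals (NOT real EEG data)
rng(0);
n = 1e5;
residGauss = randn(1, n);                    % well-behaved reference
burst      = rand(1, n) < 0.1;               % 10% of samples get a 4x larger SD
residHeavy = randn(1, n) .* (1 + 3*burst);   % heavy-tailed mixture

% Excess kurtosis: approximately 0 for a Gaussian, > 0 for heavy tails
exKurt = @(r) mean((r - mean(r)).^4) / mean((r - mean(r)).^2)^2 - 3;

% Fraction of samples further than 3 SD from the mean
% (a Gaussian is expected to give roughly 0.0027)
tailFrac = @(r) mean(abs(r - mean(r)) > 3*std(r));

kGauss = exKurt(residGauss);
kHeavy = exKurt(residHeavy);
fGauss = tailFrac(residGauss);
fHeavy = tailFrac(residHeavy);
fprintf('excess kurtosis: gauss %.2f, heavy %.2f\n', kGauss, kHeavy);
fprintf('fraction > 3 SD: gauss %.4f, heavy %.4f\n', fGauss, fHeavy);
```

With real data, `residHeavy` would be replaced by the residual time series of one channel.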

behinger commented 6 years ago

Based on a simple model on one subject, the residuals are highly non-normal...

behinger commented 6 years ago

I tried some t-distributions to fit the heavy tails, but they do not appear to give a better estimate. Not sure how to proceed.

Is alpha triggered by blinks? Maybe adding blink events could help?! I guess no one knows how alpha is triggered.

jpossandon commented 6 years ago

Are you sure the tails do not correspond to artefacts and low-frequency activity that should not be in the data anyway? I say this because of the magnitude of the residuals ...

behinger commented 6 years ago

I have not checked the time series of the residuals, but I should. I guess it will be mostly alpha. The heavy tails are visible in 20% of the data (above the 0.9 and below the 0.1 quantiles).

behinger commented 6 years ago

OK, I checked: it is mostly alpha bursts that give rise to the high values (in this dataset). Not sure how to cope with them; I guess just filtering before calculating the likelihood is a bit harsh ;)

jpossandon commented 6 years ago

Well, at the moment I am trying to deconvolve alpha activity (the alpha envelope from the Hilbert transform of bandpassed data), so I'll let you know how it goes. Is there an easy way to get the residuals from dc_glmfit?
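For readers unfamiliar with the idea, an alpha envelope of this kind could be sketched roughly as below. This is not the pipeline used in this thread: it is a base-MATLAB illustration on a synthetic signal, with an assumed 8-12 Hz band and a brick-wall FFT mask standing in for a proper bandpass filter (which would ring less on real data). The analytic signal is built by keeping only the positive frequencies and doubling them, which is what a Hilbert-transform-based envelope amounts to.

```matlab
% Synthetic test signal (NOT real EEG): 10 Hz "alpha" plus a 30 Hz component
fs = 250;                      % assumed sampling rate in Hz
N  = 1000;                     % 4 s of data
t  = (0:N-1) / fs;
x  = 2*sin(2*pi*10*t) + 0.5*sin(2*pi*30*t);

f    = (0:N-1) * fs / N;       % frequency of each FFT bin
Xf   = fft(x);
pass = f >= 8 & f <= 12;       % positive-frequency alpha band only (assumed limits)

% One-sided, doubled spectrum -> analytic signal of the bandpassed data
Xa       = zeros(size(Xf));
Xa(pass) = 2 * Xf(pass);
envelope = abs(ifft(Xa));      % instantaneous alpha amplitude
```

Here the envelope recovers the amplitude of the 10 Hz component; on real data it would be a slowly varying time series that could enter a model as a continuous signal.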

behinger commented 6 years ago

I just noticed GitHub did not post my last post (sorry!). This works for 1-channel data:

yhat = EEG.deconv.dcBeta(:)' * EEG.deconv.dcX';   % predicted signal, 1 x samples

resid = EEG.data - yhat;                          % residuals, 1 x samples

For multichannel data you need a for loop over channels, or you can use mtimesx from the MATLAB File Exchange, or gpuArrays.
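A minimal sketch of the multichannel loop follows, using synthetic arrays in place of the toolbox fields. It assumes the betas are stored as channels x times x parameters and that the columns of the design matrix follow the same ordering as `beta(:)` for a single channel (as the 1-channel snippet implies); both are assumptions that should be checked against the actual `EEG.deconv` structure. Under those assumptions a single reshape gives the same result without a loop.

```matlab
% Synthetic stand-ins with hypothetical sizes (NOT real unfold output)
rng(1);
nChan = 3; nTimes = 4; nPar = 2; nSamp = 50;
X    = randn(nSamp, nTimes*nPar);     % plays the role of EEG.deconv.dcX
beta = randn(nChan, nTimes, nPar);    % plays the role of EEG.deconv.dcBeta
data = randn(nChan, nSamp);           % plays the role of EEG.data

% Loop over channels, reusing the 1-channel formula
resid = zeros(nChan, nSamp);
for ch = 1:nChan
    b    = beta(ch, :, :);            % betas of this channel
    yhat = b(:)' * X';                % 1 x samples prediction
    resid(ch, :) = data(ch, :) - yhat;
end

% Equivalent vectorized form: one row of flattened betas per channel
residVec = data - reshape(beta, nChan, []) * X';
```

The vectorized form is usually the simplest option as long as the full prediction fits into memory.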

btw: df calculations: http://www.stat.cmu.edu/~ryantibs/advmethods/notes/df.pdf

AFAIK AIC can be calculated easily, with k = num_param and n = num_data:

AIC = 2*k + n*log(sum(resid.^2)/n)

But again, this assumes normality of the residuals (not given) and constant variance.
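As an illustration, the formula (and the analogous BIC, k*log(n) + n*log(RSS/n)) can be applied to two nested least-squares models. The data below are synthetic and the handle names `aicFun` and `bicFun` are hypothetical; both criteria are the Gaussian-likelihood versions with additive constants dropped, so only differences between models fitted to the same data are meaningful.

```matlab
% Synthetic regression problem (NOT EEG data)
rng(2);
n = 500;
x = linspace(0, 1, n)';
y = 1 + 2*x + 0.3*randn(n, 1);

X1 = ones(n, 1);             % model 1: intercept only  (k = 1)
X2 = [ones(n, 1) x];         % model 2: intercept+slope (k = 2)

resid1 = y - X1 * (X1 \ y);  % least-squares residuals
resid2 = y - X2 * (X2 \ y);

% Gaussian AIC/BIC up to an additive constant
aicFun = @(resid, k) 2*k + numel(resid) * log(sum(resid(:).^2) / numel(resid));
bicFun = @(resid, k) k*log(numel(resid)) + numel(resid) * log(sum(resid(:).^2) / numel(resid));

aic1 = aicFun(resid1, 1);  aic2 = aicFun(resid2, 2);
bic1 = bicFun(resid1, 1);  bic2 = bicFun(resid2, 2);
fprintf('AIC: %.1f vs %.1f, BIC: %.1f vs %.1f\n', aic1, aic2, bic1, bic2);
```

The lower value indicates the preferred model. For the deconvolution models, k would need an effective degrees-of-freedom estimate if regularization is used.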

One could calculate AICs for all subjects, use the central limit theorem and assume normality over all AICs over subjects and then test the AIC distributions of two or more models against each other (paired). But I highly doubt this is an effective/powerful statistic if the assumptions fail as much as they do here.
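The paired comparison could be sketched as follows. The per-subject AIC values are randomly generated placeholders, not results, and the variable names are hypothetical. The t-test is written out by hand so that no toolbox is needed; the two-sided p-value uses the regularized incomplete beta function `betainc`.

```matlab
% Hypothetical per-subject AICs for two models (random placeholders, NOT results)
rng(3);
nSub = 20;
aicA = 1000 + 50*randn(nSub, 1);
aicB = aicA - 5 + 3*randn(nSub, 1);   % model B simulated as slightly better

d     = aicA - aicB;                  % paired differences per subject
df    = nSub - 1;
tStat = mean(d) / (std(d) / sqrt(nSub));

% Two-sided p-value of Student's t with df degrees of freedom
pVal = betainc(df / (df + tStat^2), df/2, 0.5);
fprintf('t(%d) = %.2f, p = %.3g\n', df, tStat, pVal);
```

If normality of the differences is doubtful, a sign-flip permutation test on `d` would be an alternative that avoids the t-distribution.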