mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

ENH: partial_fit for linear models #3483

Closed: kingjr closed this issue 7 years ago

kingjr commented 8 years ago

I think it would be great if we could make use of sklearn's partial_fit method for linear models, especially when we try to fit very large data arrays that don't fit in memory (e.g. time-frequency transforms over multiple runs, thousands of epochs).

Let me know what you think.

agramfort commented 8 years ago

partial_fit for linear models -> SGD http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html

Online covariance estimation is easy, though. Now, for partial_fit in Xdawn, the sklearn semantics imply doing the heavy linear algebra (eigendecomposition, etc.) after each partial_fit call. Not sure that's what you want.
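
For reference, a minimal sketch of what out-of-core fitting looks like with SGDClassifier (the blocks iterator and the class labels here are illustrative assumptions, not something from this thread):

import numpy as np
from sklearn.linear_model import SGDClassifier

# SGDClassifier supports out-of-core learning via partial_fit;
# `blocks` is a hypothetical iterator over (X, y) chunks
clf = SGDClassifier()
classes = np.array([0, 1])  # every class must be declared on the first call
for X_block, y_block in blocks:
    clf.partial_fit(X_block, y_block, classes=classes)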

kingjr commented 8 years ago

> Online covariance estimation is easy, though. Now, for partial_fit in Xdawn, the sklearn semantics imply doing the heavy linear algebra (eigendecomposition, etc.) after each partial_fit call. Not sure that's what you want.

For Xdawn I'm unclear, but for CSP you typically don't want to decimate, and it would be common to do something like:

from mne import Epochs, find_events
from mne.io import Raw

for fname in fnames:  # one run at a time, so memory use stays bounded
    raw = Raw(fname)
    epochs = Epochs(raw, find_events(raw), preload=True)
    X, y = epochs.get_data(), epochs.events[:, 2]
    csp_pipeline.partial_fit(X, y)

to intelligently reduce the dimensionality of the data, no? But I agree that, in principle, we just need to aggregate the covariance matrices, without necessarily computing the eigendecomposition on each partial fit.
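
A minimal sketch of what such aggregation could look like (this RunningCov helper is purely illustrative, not an existing MNE or sklearn class; it assumes zero-mean, e.g. band-pass filtered, data):

import numpy as np

class RunningCov(object):
    """Illustrative running (uncentered) covariance over data blocks."""

    def __init__(self, n_channels):
        self._sum = np.zeros((n_channels, n_channels))
        self._n_samples = 0

    def partial_fit(self, X):
        # X has shape (n_samples, n_channels); only sums are stored, so
        # the eigendecomposition can be deferred to the very end
        self._sum += X.T.dot(X)
        self._n_samples += len(X)

    @property
    def cov(self):
        return self._sum / self._n_samples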

agramfort commented 8 years ago

I am open to partial_fit for CSP and XDAWN. I think the eigendecomposition is negligible compared to IO and memory management.

kingjr commented 8 years ago

@alexandrebarachant Do you think it would be OK to do this? It means we would have to store each covariance and its sample_weight, but I think that should remain relatively light in most use cases.
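
For concreteness, combining the stored covariances would then be a one-liner (covs and weights are hypothetical lists, one entry per partial_fit call):

import numpy as np

# covs: list of (n_channels, n_channels) covariance matrices
# weights: number of samples that produced each covariance
cov = np.average(np.array(covs), axis=0, weights=np.array(weights))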

alexandrebarachant commented 8 years ago

@kingjr Are we talking about 'adaptive' / 'iterative' estimation of the spatial filters? As Alex said, the eigendecomposition is generally negligible, so the easiest way to implement this is to update the averages.

Since everything is based on arithmetic means, that will be easy.
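
A one-function sketch of the weighted running-mean update this relies on (the function and its names are illustrative, not from the thread):

def update_mean(mean, total_weight, value, weight):
    """Fold one new weighted observation (e.g. a block covariance)
    into a running arithmetic mean."""
    total_weight += weight
    mean = mean + (weight / total_weight) * (value - mean)
    return mean, total_weight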

However, the main problem is that the eigendecomposition can lead to strongly different results and orderings of the filters. If you pipeline this with a classification method and run a partial fit, the feature space is very likely to change, and the classifier trained on the previous partial fit will be obsolete.

There is a paper treating the case of adaptive CSP by directly updating the filters, for example this one: Incremental CSP. The equation to update the filters is quite simple; you might want to use that.

kingjr commented 8 years ago

> However, the main problem is that the eigendecomposition can lead to strongly different results and orderings of the filters. If you pipeline this with a classification method and run a partial fit, the feature space is very likely to change, and the classifier trained on the previous partial fit will be obsolete.

Wouldn't ordering the eigenvalues by explained variance be sufficient? It's not really adaptive in the sense of going from, say, 100 time samples to 200 or 400 (as with a rolling time window for real-time processing), but rather taking very large chunks of data (1e5-1e7 time samples) so as not to be limited by our RAM.
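
One way such an ordering could look in code (a sketch; cov_a and cov_b are assumed, pre-aggregated per-class covariances, and the variable names are illustrative):

import numpy as np
from scipy import linalg

# generalized eigendecomposition: each eigenvalue lies in [0, 1] and is
# the fraction of variance explained by class "a"
evals, evecs = linalg.eigh(cov_a, cov_a + cov_b)
# the most discriminant filters have eigenvalues farthest from 0.5
order = np.argsort(np.abs(evals - 0.5))[::-1]
filters = evecs[:, order].T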

alexandrebarachant commented 8 years ago

The problem is that when 2 (or more) eigenvalues are close to each other, or when you have an artifact, some very different patterns can flip order. I spent a long time on this problem during the first year of my PhD. Either you use an iterative estimation of the filters, or you build a re-ordering procedure (based on filter similarity between iterations).
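
One way such a re-ordering procedure could be sketched, matching each new filter to its most similar predecessor (illustrative only, not the actual procedure from that PhD work):

import numpy as np
from scipy.optimize import linear_sum_assignment

def reorder_filters(prev_filters, new_filters):
    """Permute new_filters (n_filters, n_channels) so each row lines up
    with its most similar row in prev_filters, using absolute
    correlation as the similarity."""
    n = len(prev_filters)
    sim = np.abs(np.corrcoef(prev_filters, new_filters)[:n, n:])
    _, col = linear_sum_assignment(-sim)  # maximize total similarity
    return new_filters[col]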

That's one of the reasons I switched to Riemannian geometry for my early experiments: since there is no spatial filtering, it is very convenient to do iterative / adaptive learning. I actually have to think about adding partial_fit to my toolbox.

But to come back to your point, I'm not sure it is a good idea to ignore this problem on the grounds that it is unlikely to happen when the user knows what they are doing.

However, it won't be a problem if you use partial_fit outside a pipeline (or if you don't call any transform between two partial_fit calls).

kingjr commented 8 years ago

Great, thanks for the feedback; I'll keep the issue open until I implement it for the search light.