Closed kingjr closed 7 years ago
partial_fit for linear models -> SGD http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html
Online covariance estimation is easy, though. But if partial_fit is added to Xdawn, the sklearn semantics imply doing the heavy linear algebra (eigendecomposition, etc.) after each partial_fit. Not sure that's what you want.
> online cov estimation is easy though. Now if partial_fit in xdawn the sklearn semantic implies to do the heavy linear algebra (eig decomp etc) after each partial_fit. Not sure it's what you want
For Xdawn I'm unclear, but for CSP you typically don't want to decimate, and it would be common to do:

```python
for fname in fnames:  # one recording block per file
    raw = Raw(fname)
    epochs = Epochs(raw, find_events(raw), preload=True)
    X, y = epochs.get_data(), epochs.events[:, 2]
    csp_pipeline.partial_fit(X, y)
```
to intelligently reduce the dimensionality of the data, no? But I agree that, in principle, we just need to aggregate the covariance matrices, without necessarily computing the eigendecomposition on each partial fit.
I am open to partial_fit for CSP and XDAWN. I think the eigendecomposition is negligible compared to IO and memory management.
@alexandrebarachant Do you think it would be OK to do this? It means we would have to store each covariance and its sample_weight, but I think that should remain relatively light in most use cases.
@kingjr are we talking about 'adaptive' / 'iterative' estimation of the spatial filters? As Alex said, the eigendecomposition is generally negligible, so the easiest way to implement this is to update the averages. Since everything is based on arithmetic means, that will be easy.
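A minimal sketch of what that running update could look like (the class name and API here are hypothetical, not MNE's actual implementation): accumulate a weighted sum of per-epoch covariances so that each partial_fit only does cheap arithmetic, deferring the eigendecomposition until the filters are actually needed.

```python
import numpy as np

class RunningCovariance:
    """Running average of covariance matrices, weighted by sample count.

    Hypothetical sketch: partial_fit only accumulates sums, so it stays
    cheap; the expensive eigendecomposition can happen later, once.
    """

    def __init__(self, n_channels):
        self.cov_ = np.zeros((n_channels, n_channels))
        self.n_samples_ = 0

    def partial_fit(self, X):
        # X: array of shape (n_epochs, n_channels, n_times)
        for epoch in X:
            self.cov_ += epoch @ epoch.T
            self.n_samples_ += epoch.shape[1]
        return self

    @property
    def covariance_(self):
        # identical to the covariance of all data seen so far
        return self.cov_ / self.n_samples_
```

Because the accumulator is a plain sum, feeding the data in two chunks gives exactly the same covariance as feeding it in one batch.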
However, the main problem is that the eigendecomposition can lead to strongly different results and orderings of the filters. If you pipeline this with a classification method and run a partial fit, the feature space is very likely to change, and a classifier trained on the previous partial fit will become obsolete.
There is a paper treating the case of adaptive CSP by directly updating the filters, for example this one: Incremental CSP. The equation to update the filters is quite simple; you might want to use that.
> However, the main problem will be that the eig decomp can lead to strongly different result and orders of the filters. If you pipeline this with a classification method and run a partial fit, the feature space is very likely to change and the classifier trained on the previous partial fit will be obsolete.
Wouldn't ordering the eigenvalues by explained variance be sufficient? It's not really adaptive in the sense of going from, say, 100 time samples to 200 or 400 (as with a rolling time window for real-time processing), but rather of taking very large chunks of data (1e5-1e7 time samples) so as not to be limited by RAM.
The problem is that when 2 (or more) eigenvalues are close to each other, or if you have an artifact, very different patterns can flip order. I spent a long time on this problem during the first year of my PhD: either you use an iterative estimation of the filters, or you build a re-ordering procedure (based on filter similarity between iterations).
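The re-ordering idea could be sketched like this (a hypothetical helper, not taken from MNE or any existing toolbox): match each newly estimated filter to the most similar filter from the previous iteration, up to sign, so that the feature space stays stable across partial fits.

```python
import numpy as np

def reorder_filters(w_prev, w_new):
    """Reorder and sign-align new spatial filters to match previous ones.

    Hypothetical sketch: greedily assign each previous filter the most
    similar remaining new filter (absolute cosine similarity).
    w_prev, w_new: arrays of shape (n_filters, n_channels).
    """
    # normalize rows so a dot product is a cosine similarity
    a = w_prev / np.linalg.norm(w_prev, axis=1, keepdims=True)
    b = w_new / np.linalg.norm(w_new, axis=1, keepdims=True)
    sim = np.abs(a @ b.T)
    order = np.full(len(w_prev), -1)
    available = set(range(len(w_new)))
    # assign the most confident rows first
    for i in np.argsort(-sim.max(axis=1)):
        j = max(available, key=lambda c: sim[i, c])
        order[i] = j
        available.remove(j)
    w_matched = w_new[order]
    # flip signs so matched filters point the same way as before
    signs = np.sign(np.sum(a * b[order], axis=1))
    signs[signs == 0] = 1
    return w_matched * signs[:, np.newaxis]
```

A proper implementation might solve the assignment optimally (e.g. the Hungarian algorithm) instead of greedily, but the principle is the same.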
That's one of the reasons I switched to Riemannian geometry for my early experiments: since there is no spatial filtering, it is very convenient to do iterative / adaptive learning. I actually have to think about adding partial_fit to my toolbox.
But to come back to your point, I'm not sure it is a good idea to ignore this problem, even if it is not likely to happen when the user knows what they are doing.
However, it won't be a problem if you use partial_fit outside a pipeline (or if you don't call any transform between two partial_fit calls).
Great, thanks for the feedback; I'll keep the issue open until I implement it for the SearchLight.
I think it would be great if we could make use of the `partial_fit` sklearn methods for linear models, especially when we try to fit very large data arrays (e.g. time-frequency transforms, multiple runs, thousands of epochs).

For `SearchLight`, we can just add the `partial_fit` method and check that the estimator has one.

For `CSP` and `Xdawn` I'm less clear. Sklearn doesn't seem to have `partial_fit` for covariance estimation, but IIUC, it comes down to storing the means and stds underlying the covariance matrices and the number of samples (nave), integrating them over multiple partial fits before re-estimating the eigenvalues, right?

For `UnsupervisedSpatialFilter`, I think we could also add a `partial_fit` so that we can make use of `IncrementalPCA`.

LMKWYT
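For illustration, a sketch of how `IncrementalPCA`'s existing `partial_fit` could be fed epoched data chunk by chunk; the reshape convention (time points as samples, channels as features) is an assumption here, not `UnsupervisedSpatialFilter`'s actual code.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(42)
ipca = IncrementalPCA(n_components=4)

# e.g. one chunk of epochs per run, so the full dataset never sits in RAM
for _ in range(3):
    X = rng.randn(20, 8, 50)  # (n_epochs, n_channels, n_times)
    # concatenate epochs over time: rows are time points, columns channels
    X2d = np.hstack(X).T      # shape (n_epochs * n_times, n_channels)
    ipca.partial_fit(X2d)

# project new data onto the incrementally learned components
filtered = ipca.transform(np.hstack(rng.randn(5, 8, 50)).T)
print(filtered.shape)  # (250, 4)
```

The transform output would then be reshaped back to epochs of shape `(n_epochs, n_components, n_times)`.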