mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

Simplify CSP algorithm implementation #2437

Closed cbrnr closed 8 years ago

cbrnr commented 9 years ago

As discussed at https://github.com/scot-dev/scot/issues/64, MNE's implementation of CSP consists of several steps (basically two times PCA using linalg.eigh). In contrast, SCoT directly computes CSP by solving a generalized eigenvalue problem. Mathematically, these procedures should be identical, but I'm not sure about numerical stability.

Now I tested MNE's CSP with my EEG data, and I found the following issues:

  1. It doesn't work out of the box: linalg.eigh raises an exception because "the leading minor of order 64 of 'b' is not positive definite" (line 175 in _fit).
  2. If I replace linalg.eigh with linalg.eig, the transformed epochs contain -inf.

If I replace the multi-step implementation of MNE with the single-step implementation of SCoT, I do not get infinities (however, the first problem still occurs, so I can't use linalg.eigh there either).

So I was wondering where this issue comes from. It is strange that one of the matrices is not PD in the first place since it is a covariance matrix (I reported this at https://github.com/scipy/scipy/issues/5182#issuecomment-134598284 before I found out that the matrix is actually not PD).

Any thoughts? I could put the data along with a minimal example on GH if you want.

agramfort commented 9 years ago

did you regularize the cov?

cbrnr commented 9 years ago

No. You're right, it works if I set cov="lws". I wonder what this means with respect to the two implementations though (I forgot the other difference; MNE uses all data for estimating the cov, whereas SCoT averages trial-based covs).
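
For context, "lws" in SCoT refers to Ledoit-Wolf shrinkage. A minimal sketch of why it fixes the positive-definiteness error, using scikit-learn's estimator as a stand-in (not SCoT's internal API); the data setup here is mine, for illustration only:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
n_samples, n_channels = 200, 8

# Rank-deficient data: the average reference removes one dimension.
x = rng.standard_normal((n_samples, n_channels))
x -= x.mean(axis=1, keepdims=True)

# The empirical covariance has a (numerically) zero eigenvalue ...
emp = np.cov(x.T)
print(np.linalg.eigvalsh(emp).min())  # ~0

# ... while Ledoit-Wolf shrinkage pulls it back to strictly positive definite.
lw = LedoitWolf().fit(x).covariance_
print(np.linalg.eigvalsh(lw).min())  # clearly positive
```

With a strictly positive-definite 'b' matrix, linalg.eigh no longer raises.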

agramfort commented 9 years ago

> I forgot the other difference; MNE uses all data for estimating the cov, whereas SCoT averages trial-based covs

that should be the same up to maybe a scaling factor which does not matter for prediction.
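
That equivalence is easy to check numerically: if each trial's covariance is computed as X @ X.T / n with no per-trial demeaning, and all trials have the same length, averaging trial-wise covariances gives exactly the covariance of the concatenated data. A toy check (variable names are mine, not MNE's or SCoT's):

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_channels, n_samples = 10, 4, 100

trials = rng.standard_normal((n_trials, n_channels, n_samples))

# Per-trial covariances (no demeaning), then averaged.
cov_avg = np.mean([x @ x.T / n_samples for x in trials], axis=0)

# Covariance of all trials concatenated along time.
concat = np.concatenate(list(trials), axis=1)
cov_concat = concat @ concat.T / (n_trials * n_samples)

print(np.allclose(cov_avg, cov_concat))  # True
```

With per-trial demeaning or unequal trial lengths the two estimates differ slightly, which would explain the minor weight differences discussed below.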

cbrnr commented 9 years ago

True. Except that you could do other stuff with the trial-based approach as @alexandrebarachant mentioned. So we could make this an option if people find it useful.

Lastly, what about the single-step vs. multi-step approach?

agramfort commented 9 years ago

whatever runs faster with less memory allocation

alexandrebarachant commented 9 years ago

> So I was wondering where this issue comes from. It is strange that one of the matrices is not PD in the first place since it is a covariance matrix.

Covariance matrices are not always PD. In EEG they usually are because of the noise, but if you don't have enough time samples, or if you applied a method that removes a rank (e.g. average reference), you end up with this problem. Some eigenvalue decompositions are more robust than others. The MATLAB function eig is confusing because it internally switches to the complex eig in case your matrices are not PD or not symmetric. I prefer the Python behavior; the error is a warning that tells you to start using regularization.
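
To illustrate the rank argument with a toy sketch (my own example, not MNE code): applying an average reference makes every sample sum to zero across channels, so the channel covariance loses exactly one rank, which is when the generalized eigh can complain that 'b' is not positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples = 16, 1000

x = rng.standard_normal((n_channels, n_samples))
x -= x.mean(axis=0, keepdims=True)  # average reference: columns now sum to zero

cov = x @ x.T / n_samples
print(np.linalg.matrix_rank(cov, tol=1e-10))  # n_channels - 1: one dimension is gone
```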

cbrnr commented 9 years ago

Thanks @alexandrebarachant, I agree that the NumPy way is more explicit.

Now regarding the CSP implementation, I compared the MNE vs. SCoT variants. Although mathematically equivalent (up to a scaling factor), the trial-based vs. concatenated covariance estimation approach does lead to minor differences in some weights. I guess this is because in the trial-based approach, we're estimating a covariance matrix with little data as opposed to using all concatenated trials at once. It doesn't make a huge difference though, and when plotting the scalp projections of the patterns, there are no visible differences.

Regarding the multi-step vs. single-step approach, I noticed that we're scaling differently (this of course does not matter). Computationally, I guess the single-step approach should be more efficient because we're doing:

```python
from scipy.linalg import eigh
import numpy as np

# Single generalized eigenvalue problem; filters sorted by eigenvalue.
e, w = eigh(sigma1, sigma1 + sigma2)
order = np.argsort(e)[::-1]
w = w[:, order]
```

Whereas MNE is doing:

```python
import numpy as np
from scipy import linalg

# Step 1: eigendecompose the composite covariance and build a whitener p.
lambda_, u = linalg.eigh(cov_a + cov_b)
ind = np.argsort(lambda_)[::-1]
lambda2_ = lambda_[ind]
u = u[:, ind]
p = np.dot(np.sqrt(linalg.pinv(np.diag(lambda2_))), u.T)

# Step 2: project the class covariances into the whitened space.
w_a = np.dot(np.dot(p, cov_a), p.T)
w_b = np.dot(np.dot(p, cov_b), p.T)

# Step 3: second eigendecomposition, sorted by eigenvalue extremity.
vals, vecs = linalg.eigh(w_a, w_b)
ind = np.argsort(np.maximum(vals, 1. / vals))[::-1]
vecs = vecs[:, ind]
w = np.dot(vecs.T, p)
```

The speedup of the SCoT implementation is certainly negligible (25ms instead of 26ms). However, IMO the code is easier to read, and this is also how most CSP functions are implemented (at least all that I know of). The Biosig implementation has both variants if I read the code correctly.
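
For what it's worth, the defining CSP property of the single-step solution is easy to verify: scipy.linalg.eigh(a, b) normalizes the eigenvectors so that w.T @ b @ w = I, meaning the filters jointly diagonalize both class covariances. A quick sanity check on random SPD matrices (a sketch, not MNE's code):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 6

def random_spd(n):
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)  # well-conditioned SPD matrix

sigma1, sigma2 = random_spd(n), random_spd(n)

# Single-step CSP: one generalized eigendecomposition.
e, w = eigh(sigma1, sigma1 + sigma2)
order = np.argsort(e)[::-1]
e, w = e[order], w[:, order]

# The filters whiten the composite covariance ...
print(np.allclose(w.T @ (sigma1 + sigma2) @ w, np.eye(n)))  # True
# ... and diagonalize sigma1, with the generalized eigenvalues on the diagonal.
print(np.allclose(w.T @ sigma1 @ w, np.diag(e)))  # True
```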

cbrnr commented 8 years ago

Any thoughts on replacing the existing code with the simpler variant? The results are identical up to a scaling factor (each pattern of the new version is scaled differently with respect to the existing code). I think the shorter version is (1) much easier to understand and (2) the default in almost all implementations and publications.

jasmainak commented 8 years ago

Did you try running the example? How do the patterns look in the two algorithms? Exactly the same, or are there some minor differences?

agramfort commented 8 years ago

> (2) the default in almost all implementations and publications.

which other implementations?

cbrnr commented 8 years ago

https://github.com/sccn/BCILAB/blob/master/code/paradigms/ParadigmCSP.m
http://sourceforge.net/p/biosig/code/ci/master/tree/biosig4matlab/t300_FeatureExtraction/csp.m
https://github.com/bbci/bbci_public/blob/master/processing/proc_cspAuto.m

These are at least implementations popular in the BCI community.

agramfort commented 8 years ago

ok thanks

I have no strong opinion here...

@trachelr an opinion?

trachelr commented 8 years ago

The reference implementations cited by @cle1109 are different methods using CSP (e.g. filter bank CSP, spectrally weighted CSP, common spatio-spectral patterns, etc.) and could use either a single- or multi-step approach for solving the decomposition. Therefore, in my opinion, it would be more important to add CSP-based methods to MNE, because they are popular in the BCI community, rather than debating the single- vs. multi-step implementation of CSP itself...

cbrnr commented 8 years ago

@trachelr, I agree that including more CSP variants (CSSP, CSSSP, Spec-CSP) in MNE would be great! However, I think we should agree on this implementation detail first, because all variants are based on this algorithm. Furthermore, the three code examples above are not different methods using CSP - they implement the default vanilla CSP and use the generalized eigenvalue decomposition instead of the multi-step approach currently implemented in MNE.

agramfort commented 8 years ago

> The reference implementations cited by @cle1109 are different methods using CSP (e.g. filter bank CSP, spectrally weighted CSP, common spatio-spectral patterns, etc.) and could use either a single- or multi-step approach for solving the decomposition. Therefore, in my opinion, it would be more important to add CSP-based methods to MNE, because they are popular in the BCI community, rather than debating the single- vs. multi-step implementation of CSP itself...

same feeling here...

agramfort commented 8 years ago

> @trachelr, I agree that including more CSP variants (CSSP, CSSSP, Spec-CSP) in MNE would be great! However, I think we should agree on this implementation detail first, because all variants are based on this algorithm. Furthermore, the three code examples above are not different methods using CSP - they implement the default vanilla CSP and use the generalized eigenvalue decomposition instead of the multi-step approach currently implemented in MNE.

again, I am no expert here... you know this better than I do.

if everyone agrees that we should merge and move on to the next CSP variants, let's do it and please take the lead.

trachelr commented 8 years ago

+1 for merging and moving on to next CSP variants.

cbrnr commented 8 years ago

Alright - the next PR will be another minor thing though (covariance estimation trial-wise vs. concatenated data).

To be absolutely safe, let's wait for what @BenjaminBlankertz has to say over at #2630. Also, we should make sure that nothing depends on the particular scale of the patterns - e.g. what about the topomaps? I wouldn't expect to see differences between the current and the new maps if they are autoscaled to the data...

Can we close this issue here and continue the discussion at #2630?

agramfort commented 8 years ago

I see some sign differences in the patterns. If you unify the signs, are they more similar?

cbrnr commented 8 years ago

Alright, I'm closing this since #2630 has been merged. Thanks everyone for the great discussion!