mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

Working with rank-deficient EEGLAB data #7727

Closed hoechenberger closed 4 years ago

hoechenberger commented 4 years ago

So a colleague of mine was handed a set of files that had been pre-processed in EEGLAB. The data was read via mne.read_epochs_eeglab(). When computing the noise covariance via mne.compute_covariance() and visualizing the results via Covariance.plot(), we noticed that the data looked a bit odd. Turns out that during preprocessing, artifact rejection via ICA had been carried out, and some ICs had been removed. So the data was now rank-deficient. However, MNE didn't "pick this up" – in fact, mne.compute_rank() assumed the rank was full even with rank=None to force an actual estimation. And according to raw.info, obviously, the rank was full as well.

So I'm wondering how to best deal with this now. We could certainly pass rank=dict(eeg=true_rank) to all functions that require a rank input. But this seems cumbersome and error-prone, and I'm not sure if there are other places where this situation could lead to errors? I would prefer just being able to do something like, raw.rank = true_rank; or, even better, the EEGLAB import should do something of that sort. A way to detect rank deficiency after ICA is simply by checking if the number of sensors is equal to the number of ICs in the EEG structure:

import scipy.io

data = scipy.io.loadmat('myfile.set')
n_comps, n_sensors = data['EEG']['icaweights'][0, 0].shape
if n_comps == n_sensors:
    print('Data has full rank')
else:
    print(f'Data has rank {n_comps}')

(Of course this only works if no (avg) reference has been set)

Any opinions on that?

cc @cbrnr

hoechenberger commented 4 years ago

Also just looking at what MNE's ICA.apply() does, it also appears to me that a potential change of data rank is not "documented" anywhere. Is this not bound to lead to issues?

agramfort commented 4 years ago

one intuition is that the EEGLAB data were stored to disk in single precision and our rank detection code does not use thresholds for this

hoechenberger commented 4 years ago

one intuition is that the EEGLAB data were stored to disk in single precision and our rank detection code does not use thresholds for this

So you think that, under normal circumstances, rank calculation should work? Because I also remember that in the mne-study-template, we had to switch to using rank='info', because the default didn't work with some MEG datasets…

cbrnr commented 4 years ago

Can you really determine the rank of the channel data based on the number of ICs? Especially in the case where some ICs are zeroed out and the remaining ICs are back-projected to channel space, I'm not sure if determining the rank is that easy. How does MNE compute the rank of the data matrix? SVD?

agramfort commented 4 years ago

yes, MNE uses SVD with a clipping threshold, see https://github.com/mne-tools/mne-python/blob/master/mne/rank.py#L21
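For intuition, the SVD-plus-clipping approach boils down to something like the following sketch (a simplification for illustration, not MNE's actual code in rank.py; the `tol_kind` distinction matters later in this thread):

```python
import numpy as np

def estimate_rank(data, tol=1e-6, tol_kind='relative'):
    """Count singular values above a threshold (sketch of SVD + clipping)."""
    s = np.linalg.svd(data, compute_uv=False)
    if tol_kind == 'relative':
        thresh = tol * s.max()  # cutoff relative to the largest singular value
    else:
        thresh = tol            # absolute cutoff
    return int((s > thresh).sum())

# A 63-channel data matrix whose rows live in a 48-dimensional subspace:
rng = np.random.default_rng(42)
data = rng.standard_normal((63, 48)) @ rng.standard_normal((48, 1000))
print(estimate_rank(data))  # 48
```

With an absolute tolerance, the result depends on the overall scale of the data (e.g. volts vs. microvolts, or single vs. double precision), which is why a relative tolerance can behave differently on imported files.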

hoechenberger commented 4 years ago

@cbrnr

Can you really determine the rank of the channel data based on the number of ICs? Especially in the case where some ICs are zeroed out and the remaining ICs are back-projected to channel space, I'm not sure if determining the rank is that easy.

I'll be totally honest with you and say that I have no idea. All I can say is that when we used this approach, we got much more sensible-looking results with our data:

EEGLAB data noise cov estimated with rank=None

noise_cov

EEGLAB data noise cov estimated with rank=n_comps_from_EEGLAB_icaweights

image (1)

cbrnr commented 4 years ago

I agree that in this special case it might make sense to infer the rank from the retained components (although strictly speaking, this is probably not the rank from a mathematical perspective). Can you quickly explain the plots to me? What is the noise covariance and what do these plots show?

dengemann commented 4 years ago

In my experience the number of ICA components does not give a strong clue about the rank; it can coincide with it, but often it does not.

agramfort commented 4 years ago

yes it's more related to the number of PCA dimensions that were used

hoechenberger commented 4 years ago

@cbrnr

I agree that in this special case it might make sense to infer the rank from the retained components (although strictly speaking, this is probably not the rank from a mathematical perspective). Can you quickly explain the plots to me? What is the noise covariance and what do these plots show?

Based on epoched data, the covariance of the pre-stimulus periods was estimated via mne.compute_covariance, with the rank parameter set to either None or the number of retained ICs. The figures I posted are the SVD spectra plots based on these covariance estimates (generated via Covariance.plot()).

My layman's way of reading this plot is that the ordinate shows how much variance is explained by the data, and the abscissa corresponds to the sensors, ordered descending by explained variance. So as you move along the abscissa from left to right, once you reach the number of channels that is equal to the data rank, you will see a steep decline in explained variance.

In the first figure, there is an unusual step in this plot at around sensor ~50; investigation of the data showed that only 48 ICs had been retained, so we then set rank=48 in our call to compute_covariance, which produced the second figure. And this one looks much more like what I would expect.

agramfort commented 4 years ago

what was the PCA dimension used in eeglab?

hoechenberger commented 4 years ago

@agramfort @dengemann Bummer. So. How to proceed from here, then? @agramfort suggested this might have to do with floating point precision in the EEGLAB data set… how could we handle that?

hoechenberger commented 4 years ago

what was the PCA dimension used in eeglab?

I don't think PCA dimensionality reduction was used, but I would need to check

agramfort commented 4 years ago

please check and perhaps share the EEGLAB file

hoechenberger commented 4 years ago

I had a quick look at the preprocessing pipeline, which was based on PREP, and I couldn't find any hints that PCA was used.

I've shared the data with @agramfort, but if anyone else would like to take a look too, let me know.

This is a MWE used to derive my observations. (Plots look a little different because this is a different participant now, but the pattern is the same as what can be seen in the figures I posted above)

import mne
import matplotlib
from scipy.io import loadmat

matplotlib.use('Qt5agg')

fname = 'eeglab_epochs.set'
epochs = mne.read_epochs_eeglab(fname)

# Estimating covariance with rank=None
cov = mne.compute_covariance(epochs=epochs, tmax=0, method='auto', rank=None)
cov.plot(epochs.info)

# Estimating covariance with rank derived from number of retained ICs
eeglab_data = loadmat(fname)['EEG']
n_components, n_sensors = eeglab_data['icaweights'][0, 0].shape
rank = dict(eeg=n_components)
cov = mne.compute_covariance(epochs=epochs, tmax=0, method='auto', rank=rank)
cov.plot(epochs.info)
cbrnr commented 4 years ago

Maybe because ICA is usually kickstarted with PCA, this often (but not always) has the same rank-reducing effect as plain PCA?
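That intuition is easy to verify in a toy numpy check (a sketch, not EEGLAB's implementation): whatever the ICA does afterwards, back-projecting through a k-dimensional PCA can only produce rank-k channel data.

```python
import numpy as np

# Full-rank 63-channel "data", reduced to 48 dims via PCA, then
# back-projected to channel space. Illustrative numbers only.
rng = np.random.default_rng(1)
n_channels, n_comps, n_times = 63, 48, 5000
x = rng.standard_normal((n_channels, n_times))

u, s, vt = np.linalg.svd(x, full_matrices=False)
x_reduced = u[:, :n_comps] @ np.diag(s[:n_comps]) @ vt[:n_comps]

print(np.linalg.matrix_rank(x))          # 63
print(np.linalg.matrix_rank(x_reduced))  # 48
```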

hoechenberger commented 4 years ago

Maybe because ICA is usually kickstarted with PCA this often (but not always) has the same effect on rank reduction like plain PCA?

I don't know… but, remember that 'pca' parameter from our attempts to find the MNE parameter set that's equivalent to EEGLAB's ICA? None of that was done during preprocessing of the data – no explicit dimensionality reduction. What the ICA algorithm does internally I don't know, however.

larsoner commented 4 years ago

This is a strange case. Those components are definitely lower amplitude than the rest, but being down by just two orders of magnitude does not indicate that they have been (properly) zeroed out numerically. So in some sense, whitening is doing something reasonable by boosting them. But it's understandable that the whitened data then are dominated by these components when you look at it.

On the other hand, if you really want those components gone and you do it by setting the rank (manually or if you have the code detect it automatically somehow), then implicitly you will project all of those lower-dimensional components into the null space of the data when you eventually whiten it or the forward operator with the rank-deficient noise-cov-based whitener. I think this ends up being mathematically equivalent to using those null-space components as SSP / projection vectors in the first place (which could be okay), except that you will have ignored any additional rotation of the space done by ICA. So if the desired outcome is truly zeroing out those N components, it might actually be better to take the components, orthogonalize them, and use them properly as SSP vectors in this case.
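The SSP alternative could look roughly like this numpy sketch (the component topographies here are made up, and the shapes are hypothetical; this is the general mechanism, not a recipe for this dataset): orthonormalize the discarded components and project onto their orthogonal complement.

```python
import numpy as np

# Hypothetical: 15 discarded component topographies (columns) in a
# 63-channel space.
rng = np.random.default_rng(2)
n_channels, n_discarded = 63, 15
topos = rng.standard_normal((n_channels, n_discarded))

q, _ = np.linalg.qr(topos)            # orthonormal basis of the discarded space
proj = np.eye(n_channels) - q @ q.T   # SSP-style projector: zeroes that subspace

data = rng.standard_normal((n_channels, 1000))
clean = proj @ data
print(np.linalg.matrix_rank(clean))   # 48 = 63 - 15
```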

I also might be wrong about the interpretation here, hopefully going through the data with @agramfort at some point will be informative. My experience with ICA is very limited...

agramfort commented 4 years ago

looking at the file I see:

icaweights: [48x63 double]

so a PCA of dimension 48 was done before ICA. This explains the cliff in eigenvalues you see

hoechenberger commented 4 years ago

looking at the file I see: icaweights: [48x63 double] so a PCA of dimension 48 was done before ICA. This explains the cliff in eigenvalues you see

Is that what it means? I thought it implies that 63-48 ICs have been discarded…

But if you're right, this would make things easy, no? Because then I could be certain that the rank actually is 48 in this case?

hoechenberger commented 4 years ago

I just browsed a little through the PREP pipeline code and apparently it does things like:

        interpolatedChannels = ...
            EEG.etc.noiseDetection.reference.interpolatedChannels.all;
        channelsLeft = setdiff(referenceChannels, interpolatedChannels);
        pcaDim = length(channelsLeft) - 1;
        fprintf('%s: pcaDim = %d\n', EEG.setname, pcaDim);
        EEG = pop_runica(EEG, 'icatype', 'cudaica', 'extended', 1, ...
            'chanind', referenceChannels, 'pca', pcaDim);

So there we have it… maybe some channels got interpolated and then PCA was used to reduce data dimensionality before ICA was run.

Which is great in the sense that we may have an explanation of what happened here and how the data got to be.

But it also gets back to my original question somehow: Is there a way to tell MNE, ideally upon data import, about the rank of that data? Otherwise I would have to manually keep track of all rank-altering operations to pass the correct rank parameter for covariance estimation and inverse operator assembly.

dengemann commented 4 years ago

But it also gets back to my original question somehow: Is there a way to tell MNE, ideally upon data import, about the rank of that data? Otherwise I would have to manually keep track of all rank-altering operations to pass the correct rank parameter for covariance estimation and inverse operator assembly.

I think rank-checking is not an easy one. Relying on automation is never 100% safe. I've seen similar issues also in large-scale datasets such as LEMON from MPI-CBS. In the end manual rank-checking is doable: you compute covs, plot all eigenvalue spectra superimposed, and search for a common rank that makes sense as a compromise. It's not too bad. I'd of course be happy if we can do better in terms of automated detection, but the manual route is not too annoying in the meantime.

hoechenberger commented 4 years ago

I think rank-checking is not an easy one. Relying on automation is never 100% safe.

I do agree, but in my specific EEGLAB example, it could have worked :) Don't you think we could try to cover such cases, and maybe emit a warning to let the user know that we're doing our best here, but might be wrong? And how would we store that information, anyway?

hoechenberger commented 4 years ago

OR at least print a warning when loading such a dataset?

agramfort commented 4 years ago

what do you mean such data? EEGLAB data which contain an ICA solution?

I would suggest to people to look at their covariance

dengemann commented 4 years ago

Don't you think we could try to cover such cases, and maybe emit a warning to let the user know that we're doing our best here, but might be wrong? And how would we store that information, anyway?

I'd document how to do this by hand inspecting cov plots. I think a cool addition could be a convenience function making superimposed plots of SVD spectra, suggesting a common rank, or at least facilitating finding one visually. Imagine you plot 100 cov SVD spectra with some transparency to then see where the kink is gonna be in most of the cases, then you go for a lower bound. It's really fast and easy in the end.
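The superimposed-spectra idea could be sketched like this (synthetic per-subject covariances with hypothetical varying ranks, standing in for real cov estimates):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_subjects, n_channels = 10, 63

fig, ax = plt.subplots()
for _ in range(n_subjects):
    # Hypothetical per-subject rank between 45 and 54 (e.g. due to varying
    # numbers of interpolated channels)
    rank = rng.integers(45, 55)
    mixing = rng.standard_normal((n_channels, rank))
    cov = mixing @ mixing.T
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]
    eig = np.maximum(eig, 1e-12 * eig[0])  # clip numerical zeros for log scale
    ax.semilogy(eig, color='k', alpha=0.2)

ax.set(xlabel='Eigenvalue index', ylabel='Eigenvalue',
       title='Superimposed covariance spectra')
fig.savefig('cov_spectra.png')
```

With some transparency, the region where most spectra kink stands out, and a lower bound for a common rank can be picked off the plot.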

hoechenberger commented 4 years ago

I think a cool addition could be a convenience function making superimposed plots of SVD spectra, suggesting a common rank, or at least facilitating finding one visually. Imagine you plot 100 cov SVD spectra with some transparency to then see where the kink is gonna be in most of the cases, then you go for a lower bound. It's really fast and easy in the end.

Please ignore my ignorance here, but: which SVD spectra do you suggest to overlay? Those from multiple recordings / participants? Or those from a single recording, with covariance estimates assuming rank = [1, n_sensors] or something?

I'd document how to do this by hand inspecting cov plots.

Yes that would be helpful. I had no idea what these plots were supposed to look like until @agramfort told me some basics.

I think this is a difficult thing for quite a few users coming from EEG. In my previous lab, we'd use Cartool for source estimation, and it (or the LAURA algorithm it uses) doesn't require you to pass any kind of (noise) covariance matrix. In fact, you just plug in your evoked signals and you're set. So that's quite different (and more complicated) in MNE then… And those users then, like me, would have no idea about how to read these SVD plots.

In my experience, many researchers attempt doing their first steps with MNE by plugging in preprocessed data, because a motivation is to take advantage of MNE's source reconstruction capabilities. That said, and given the popularity of EEGLAB, I think we should help users more by at least providing a useful hint when they import potentially rank-deficient data.


eeglab_data = loadmat(fname)['EEG']
if 'icaweights' in eeglab_data.dtype.names:
    n_components, n_sensors = eeglab_data['icaweights'][0, 0].shape
    if n_components != n_sensors:
        logger.info(f'It is likely that the rank of your data was reduced to '
                    f'{n_components} via PCA before running ICA in EEGLAB. '
                    f'Please read https://mne.tools/wtf-does-that-mean to find '
                    f'more information on what that means, and how to handle it.')

Also I'm still concerned about the fact that a user would have to manually keep track of rank-changing operations from that point onward. Can we not offer something like,

raw = raw.reduce_rank_via_pca(rank=123)

keep track of the new rank somewhere, and be good? We already take into account projectors and reference channels when rank='info', so… can we not build upon that?

dengemann commented 4 years ago

Please ignore my ignorance here, but: which SVD spectra do you suggest to overlay?

Multiple subjects, such that you get a feeling for what is normal and where to suspect the real common rank.

I think this is a difficult thing for quite a few users coming from EEG. In m

It's a concern that goes far beyond source modeling.

The rank may also be important for certain predictive modeling pipelines which need explicit treatment of the rank, e.g., when using Riemannian embeddings.

providing a useful hint when they import potentially rank-deficient data.

Sure, why not as long as we have a clear criterion for detecting that.

Also I'm still concerned about the fact that a user would have to manually keep track of rank-changing operations from that point onward.

Things can get more complicated and cannot be easily predicted once head movements and other non-stationarities come into play, especially in MEG. If you want to really be sure what's going on you must do it by hand and document things using plots.

Can we not offer something like (...) keep track of the new rank somewhere, and be good?

I am not sure I understand where you want to go. Why would you offer a function to reduce the rank (whoever needs to do this anyway knows how to do it)?

cbrnr commented 4 years ago

I think including a nice message specifically for EEGLAB data would be a great addition. I particularly like the URL - we should put our glossary or FAQ there.

Regarding keeping track of the rank, I think it is always best to just compute the rank yourself. You can use mne.compute_rank and maybe adapt tol to get a good estimate. I haven't used these rank calculations before, but would it be possible to estimate the threshold by detecting a steep drop in the eigenvalues (as opposed to an absolute threshold)? I.e. we look at the derivative and threshold that to get the rank?
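The derivative idea could be sketched like this (a toy heuristic, not an MNE function; it can misfire on smooth or noisy spectra): take the largest drop in the log-eigenvalue spectrum as the rank boundary.

```python
import numpy as np

def rank_from_kink(eig, floor=1e-15):
    """Estimate rank from the steepest drop in the log-eigenvalue spectrum."""
    eig = np.sort(eig)[::-1]
    log_eig = np.log10(np.maximum(eig, floor * eig[0]))  # clip zeros for log
    drops = np.diff(log_eig)            # negative steps between neighbours
    return int(np.argmin(drops)) + 1    # largest drop marks the last real one

rng = np.random.default_rng(3)
a = rng.standard_normal((63, 48))
eig = np.linalg.eigvalsh(a @ a.T)       # rank-48 toy covariance
print(rank_from_kink(eig))  # 48
```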

dengemann commented 4 years ago

I haven't used these rank calculations before, but would it be possible to estimate the threshold by detecting a steep drop in the eigenvalues (as opposed to an absolute threshold)? I.e. we look at the derivative and threshold that to get the rank?

Yes. But sometimes it's not so easy, and corner cases are more frequent than it seems. There can be noise, sometimes double kinks appear, or kinks are too smooth, often caused by non-stationarity. Therefore, I'd say a semi-automated approach is a good compromise: use compute_rank / the cov plots (which do this internally) and then see if the estimates make sense. Correct if needed.

dengemann commented 4 years ago

https://mne.tools/wtf-does-that-mean

@hoechenberger btw. I could not find that URL ... nice idea though.

cbrnr commented 4 years ago

@dengemann I agree! So this leaves the info message for imported EEGLAB data for this issue.

Is there an MNE tutorial about rank estimation and when rank is important? It sounds like this might be an interesting topic relevant to MNE users.

hoechenberger commented 4 years ago

@dengemann

Please ignore my ignorance here, but: which SVD spectra do you suggest to overlay?

Multiple subjects, such that you get a feeling for what is normal and where to suspect the real common rank.

Ok, could be nice I guess :) But in my specific use case, this wouldn't have helped, as apparently the rank deficiency I observed was introduced by interpolation of bad channels, followed by a PCA step before doing ICA. Since the number of interpolated channels can vary between participants, I don't think a "common" rank would make sense here.

providing a useful hint when they import potentially rank-deficient data.

Sure, why not as long as we have a clear criterion for detecting that.

I think the code snippet I provided above should be a reliable indicator for this specific use case. But I'd be more comfortable if some EEGLAB experts were to confirm this :)

Also I'm still concerned about the fact that a user would have to manually keep track of rank-changing operations from that point onward.

Things can get more complicated and cannot be easily predicted once head movements and other non-stationarities come into play, especially in MEG. If you want to really be sure what's going on you must do it by hand and document things using plots.

Can we not offer something like (...) keep track of the new rank somewhere, and be good?

I am not sure I understand where you want to go. Why would you offer a function to reduce the rank (whoever needs to do this anyway knows how to do it)?

It's not so much about reducing rank per se, but about informing MNE that the rank of the data has changed. You see, in the mne-study-template, for example, I'm explicitly passing rank='info' when estimating covariance and when creating the inverse operator. We did this after discovering that rank=None, i.e. rank estimation based on the data alone, didn't yield very usable results with the data we had tested the pipeline on – switching to rank='info' fixed these issues. Now, I haven't looked into the code to figure out what exactly rank='info' does, but my understanding is that it takes into account projectors, reference channels, etc. It would be good if a rank-reducing operation like PCA could be considered and treated accordingly within this existing framework.

@cbrnr

I particularly like the URL - we should put our glossary or FAQ there.

@dengemann

@hoechenberger btw. I could not find that URL ... nice idea though.

Haha, not sure the others would be too happy about it… 😂 But if there are no objections, I'd be the first to sign up!

dengemann commented 4 years ago

don't think a "common" rank would make sense here.

In some contexts a common rank would be the minimum rank, deliberately inducing more rank reduction for some of the datasets. Anyways, whether comparing subjects or not, cov plots are really amazingly useful.

Is there an MNE tutorial about rank estimation and when rank is important? It sounds like this might be an interesting topic relevant to MNE users.

@cbrnr I think some info can be found here and there. A tutorial on debugging rank issues could be really nice and save some user support time in the future ...

hoechenberger commented 4 years ago

Just for the record, I just pulled @larsoner's #7736, and this is what I get with the problematic EEGLAB data:

>>> print(mne.compute_rank(epochs, tol=1e-6, tol_kind='relative'))

Computing rank from data with rank=None
    Estimated rank (eeg): 48
    EEG: rank 48 computed from 63 data channels with 0 projectors
{'eeg': 48}

… which is good!

So… merely a precision issue, like @agramfort suggested??

larsoner commented 4 years ago

I didn't understand how that PR could fix it until I looked back at...

EEGLAB data noise cov estimated with rank=None

... I realized that I probably misunderstood this plot. Did you have some form of regularization (e.g., shrunk) turned on during estimation? If so, it would explain why the first shelf is only 1e-2 or so below the real data (regularization inflated the eigenvalues). Can you make the same plot of the noise covariance estimated with empirical? You will probably see a much larger shelf at 48 (at least 1e-6 below the largest eigenvalue), and hence using an rtol vs. atol as in #7736 could indeed help.

hoechenberger commented 4 years ago

@larsoner Will get back to you tomorrow!

hoechenberger commented 4 years ago

Okay, so I asked my colleagues to provide me with more data. And as is sometimes the case, things were a little messy – I couldn't exactly recreate the figure I posted initially, and it was unclear which dataset and version of MNE were used to generate it. It also turned out I got mixed up between the data rank and the noise covariance matrix rank at some point.

So I just started from scratch.

MWE

This is the MWE I came up with. Please find the example dataset here: https://drive.google.com/open?id=13aiakuwp1HmJt-BYsRISCk6XcdO2INuf

import mne
from scipy.io import loadmat

mne.set_log_level(verbose='error')

# Load data
infile = '05_ICAr_newevents.set'

eeglab_data = loadmat(infile)['EEG']
eeglab_n_comps, eeglab_n_sensors = eeglab_data['icaweights'][0, 0].shape

epochs = mne.read_epochs_eeglab(infile)
epochs.apply_baseline()

tmax = 0

# Rank computations start here.
rank_epo = mne.compute_rank(epochs.copy().crop(tmax=tmax))['eeg']

try:   # Only works when running mne-python from larsoner:rank
    rank_epo_rtol = mne.compute_rank(epochs.copy().crop(tmax=tmax),
                                     tol=1e-6, tol_kind='relative')['eeg']
except TypeError:
    rank_epo_rtol = '—'

cov_empirical = mne.compute_covariance(epochs, tmax=tmax, method='empirical')
cov_shrunk = mne.compute_covariance(epochs, tmax=tmax, method='shrunk')

rank_cov_empirical = mne.compute_rank(cov_empirical, info=epochs.info)['eeg']
rank_cov_shrunk = mne.compute_rank(cov_shrunk, info=epochs.info)['eeg']

print(f'No. of channels:                     {epochs.info["nchan"]}')
print(f'1st dim of EEGLAB icaweights matrix: {eeglab_n_comps}')
print(f'Pre-stimulus Epochs rank (default):  {rank_epo}')
print(f'Pre-stimulus Epochs rank (rtol):     {rank_epo_rtol}')
print(f'Noise cov. (empirical) matrix rank:  {rank_cov_empirical}')
print(f'Noise cov. (SHRUNK) matrix rank:     {rank_cov_shrunk}')

Output on master

No. of channels:                     63
1st dim of EEGLAB icaweights matrix: 48
Pre-stimulus Epochs rank (default):  63
Pre-stimulus Epochs rank (rtol):     —
Noise cov. (empirical) matrix rank:  48
Noise cov. (SHRUNK) matrix rank:     63

Output on larsoner:rank (#7736)

No. of channels:                     63
1st dim of EEGLAB icaweights matrix: 48
Pre-stimulus Epochs rank (default):  63
Pre-stimulus Epochs rank (rtol):     48
Noise cov. (empirical) matrix rank:  48
Noise cov. (SHRUNK) matrix rank:     63

SVD Plots

The SVD plots are identical between master and larsoner:rank, as far as I can tell.

Empirical

image

SHRUNK

image

Consistency of results

I've tested this on 24 datasets, and in all but one the pattern of results is the same as shown above: the SHRUNK-regularized cov. rank is reported as full, while the empirical cov. rank is equal to the 1st dimension of the icaweights matrix and to the rtol-based rank. In the remaining dataset, the computed rank deviates by 1 from the first icaweights matrix dimension (50 vs 51).

Additional information

I was also provided with a list of interpolated channels, and the number of interpolated channels is typically much smaller than the difference between the number of total channels and the first dimension of the icaweights matrix. Since, like I said, the PREP pipeline performs interpolation and rank reduction via PCA before ICA (based on the number of interpolated channels), I am inclined to believe that the icaweights matrix contains the ICs that are to be kept after PCA and (manual) IC rejection.

larsoner commented 4 years ago

Output on larsoner:rank (#7736) ... Noise cov. (SHRUNK) matrix rank: 63

This is expected because the rank internally is still calculated using atol, not rtol, so this result should not change. Basically, if you want rtol behavior or a custom tol, you should use rank = compute_rank(...) manually, then pass the rank to compute_covariance.

Automatic rank estimation will always have this sort of problem where you might miss the shelf. If we try to make it smarter we will almost certainly break some other things that currently work. So I think we just need to document that people need to examine the SVD spectrum of their empirical noise cov first, then ensure that the rank is preserved if they are using an automatic regularizer, because if the rank is not computed properly that automatic regularizer will inflate the "zero" eigenvalues.
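The inflation mechanism can be illustrated with a toy shrinkage in numpy (this is not MNE's actual 'shrunk' estimator, just the effect described above): shrinking a rank-48 covariance toward a scaled identity in the full 63-dim space lifts the 15 "should-be-zero" eigenvalues up to the regularization level, whereas regularizing only within the true subspace keeps the null space at zero.

```python
import numpy as np

# Rank-48 toy covariance in a 63-channel space (illustrative numbers only)
rng = np.random.default_rng(4)
mixing = rng.standard_normal((63, 48))
cov = np.cov(mixing @ rng.standard_normal((48, 10_000)))

# Shrinkage applied in the full 63-dim space: C_reg = (1-a)*C + a*mu*I
alpha = 0.1
mu = np.trace(cov) / 63
cov_reg = (1 - alpha) * cov + alpha * mu * np.eye(63)

eig = np.sort(np.linalg.eigvalsh(cov_reg))[::-1]
# The smallest eigenvalue is now alpha * mu, only ~2 orders of magnitude
# below the largest one -- the shelf seen in the plots above:
print(eig[0] / eig[-1])

# Regularizing only within the 48-dim subspace keeps the null space at zero:
u, s, _ = np.linalg.svd(cov)
s_reg = (1 - alpha) * s[:48] + alpha * s[:48].mean()
cov_sub = (u[:, :48] * s_reg) @ u[:, :48].T
print(np.linalg.matrix_rank(cov_sub))  # 48
```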

hoechenberger commented 4 years ago

This is expected because the rank internally is calculated still using atol not rtol, so this result should not change.

But this still doesn't explain why I (correctly) get

Noise cov. (empirical) matrix rank: 48

no? Or is the regularization in compute_covariance performed before calling compute_rank?

Edit Never mind, I guess this question didn't make too much sense ;) I'm still slightly confused, though!

larsoner commented 4 years ago

In the empirical case, it doesn't matter what you pass as rank in compute_covariance(..., rank=...) because it's not used, and the resulting matrix is unregularized. And then when you compute the rank of that matrix using rtol, you get 48.

In the shrunk case, when you pass rank=None when computing the covariance, internally we estimate the rank of the covariance matrix using atol, which will mis-estimate the rank to be 63. The covariance is regularized in this dim-63 space, so the eigenvalues that should probably be zero get inflated by the regularization. At that point it doesn't matter if you compute the rank of the resulting regularized matrix with atol or rtol, because the regularization process has (errantly) moved those "should-be-zero" eigenvalues up to just ~1e-2 below the "should-be-nonzero" eigenvalue levels -- you'll get 63 either way.

On the other hand, if you use the rtol method before regularizing the matrix by calling rank = compute_rank(..., tol_kind='relative'), you will correctly get back rank = dict(eeg=48), and then when you pass this to compute_covariance(..., rank=rank, method='shrunk') it will correctly use the subspace of dimension 48 when regularizing; and if you then compute the rank of that matrix using rtol or atol (it shouldn't matter at that point), both should be 48.

hoechenberger commented 4 years ago

Thanks for the explanation, @larsoner!

So I guess what this boils down to, in the end, is improving the docs?

larsoner commented 4 years ago

Yeah I think so