mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License

BUG: Fix bug with multichannel classification #853

Closed larsoner closed 5 months ago

larsoner commented 5 months ago

Before merging …

When MEG+EEG was used, only EEG was (effectively) being used for classification. This also unifies how rank reduction is done on the data across classifiers as much as possible.
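
The thread does not spell out why only EEG ended up mattering. One plausible mechanism, which is an assumption here and not confirmed above, is that channel types recorded in different units differ by many orders of magnitude, so any variance-based step is dominated by the larger-valued type unless each type is scaled first. A minimal NumPy sketch with synthetic data and hypothetical amplitudes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_times = 200, 50
n_eeg, n_meg = 10, 10

# Hypothetical amplitudes (assumption): EEG in volts (~1e-6),
# MEG magnetometers in tesla (~1e-13).
eeg = 1e-6 * rng.standard_normal((n_epochs, n_eeg, n_times))
meg = 1e-13 * rng.standard_normal((n_epochs, n_meg, n_times))


def eeg_variance_fraction(eeg, meg):
    """Return the fraction of total variance contributed by the EEG channels."""
    var_eeg = eeg.var(axis=(0, 2)).sum()
    var_meg = meg.var(axis=(0, 2)).sum()
    return var_eeg / (var_eeg + var_meg)


# Without scaling, EEG carries essentially all of the variance.
frac_raw = eeg_variance_fraction(eeg, meg)

# After dividing each channel type by its own standard deviation,
# both types contribute comparably.
frac_scaled = eeg_variance_fraction(eeg / eeg.std(), meg / meg.std())

print(f"EEG share of variance, unscaled: {frac_raw:.6f}")
print(f"EEG share of variance, scaled per type: {frac_scaled:.3f}")
```

With equal channel counts, the scaled share should land near one half, whereas the unscaled share is indistinguishable from one.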

Closes #849

larsoner commented 5 months ago

This is failing on N2pc, but the real cause is that the epochs after ICA contain nothing but DC values. See the current output on main:

https://mne.tools/mne-bids-pipeline/stable/examples/ERP_CORE/sub-017_ses-N2pc_task-N2pc_proc-ica+components_report.html#ICA_cleaning

image

Here are the components:

image

@hoechenberger have you ever seen this?

hoechenberger commented 5 months ago

Wow. No, never seen this before. Crazy

agramfort commented 5 months ago

how many ICA components are you keeping here?

larsoner commented 5 months ago

All of them:

│21:40:17│ ⏳️ sub-017 ses-N2pc Output: sub-017_ses-N2pc_task-N2pc_proc-ica_epo.fif
│21:40:17│ ⏳️ sub-017 ses-N2pc Rejecting ICs: 
│21:40:17│ ⏳️ sub-017 ses-N2pc Saving reconstructed epochs after ICA.

After "cleaning" -- i.e., applying ICA with nothing excluded -- the signals are just at DC. The component plots above also look very weird to me: they are all dominated by a few posterior lateral electrodes on the left and right.

hoechenberger commented 5 months ago

> how many ICA components are you keeping here?

28

See the report here, it's weird:

https://mne.tools/mne-bids-pipeline/stable/examples/ERP_CORE/sub-017_ses-N2pc_task-N2pc_proc-ica+components_report.html#Epochs_used_for_ICA_fitting

There is one source floating all over the place

larsoner commented 5 months ago

... and I don't understand why ica.apply with no components excluded and ica.pca_components_.shape == (28, 28) could do this:

>>> import numpy as np
>>> import mne
>>> ica = mne.preprocessing.read_ica("sub-017_ses-N2pc_task-N2pc_ica.fif")
>>> epochs = mne.read_epochs("sub-017_ses-N2pc_task-N2pc_epo.fif")
>>> np.linalg.norm(epochs.get_data("eeg"))
0.02460438878364625
>>> ica.exclude
[]
>>> ica.apply(epochs)
>>> np.linalg.norm(epochs.get_data("eeg"))
0.0011756138412942015
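
For context on why this is surprising: with a square, well-conditioned decomposition and nothing excluded, unmixing followed by mixing should return the input unchanged. The NumPy sketch below illustrates that expectation with random stand-in matrices. It is not MNE's implementation, and the names `pca_components`, `unmixing`, and `mixing` are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples = 28, 1000
data = rng.standard_normal((n_channels, n_samples))

# Stand-in for a full set of PCA components: a random orthogonal matrix.
pca_components, _ = np.linalg.qr(rng.standard_normal((n_channels, n_channels)))

# Stand-in for an ICA unmixing matrix and its inverse (the mixing matrix).
unmixing = rng.standard_normal((n_channels, n_channels))
mixing = np.linalg.inv(unmixing)

# Project to sources, then back to channels, with no component zeroed out.
sources = unmixing @ pca_components @ data
reconstructed = pca_components.T @ mixing @ sources

print(np.linalg.norm(data), np.linalg.norm(reconstructed))
print(np.allclose(data, reconstructed))
```

A large drop in norm after such a round trip therefore points at a numerically degenerate matrix somewhere in the chain.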

There is this oddity:

>>> np.linalg.norm(ica.mixing_matrix_, axis=1)
array([8.66259607e-47, 1.89649834e-46, 2.14286278e-42, 3.77719401e-43,
       1.51342878e-42, 1.56778100e-42, 9.51390862e-45, 8.49030155e-43,
       3.67336109e-44, 8.76574657e-43, 1.29429011e-42, 1.00530436e-42,
       6.62837726e-43, 1.54159591e-42, 2.31025037e-42, 4.13197155e-43,
       3.33925565e-42, 2.43329874e-43, 3.89629373e-42, 5.88178562e-43,
       1.66169765e-42, 1.77040984e-43, 6.28740326e-42, 8.25874636e-42,
       7.77686406e-43, 2.86216171e-43, 2.40104592e-45, 2.40409726e-16])

So there is one row of the mixing matrix that is almost 30 orders of magnitude greater than the others.
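
To put a number on that gap, the row norms printed above can be compared directly. This is a small sketch using only the values from the output; the variable names are illustrative.

```python
import numpy as np

# Row norms of ica.mixing_matrix_, copied from the output above.
row_norms = np.array([
    8.66259607e-47, 1.89649834e-46, 2.14286278e-42, 3.77719401e-43,
    1.51342878e-42, 1.56778100e-42, 9.51390862e-45, 8.49030155e-43,
    3.67336109e-44, 8.76574657e-43, 1.29429011e-42, 1.00530436e-42,
    6.62837726e-43, 1.54159591e-42, 2.31025037e-42, 4.13197155e-43,
    3.33925565e-42, 2.43329874e-43, 3.89629373e-42, 5.88178562e-43,
    1.66169765e-42, 1.77040984e-43, 6.28740326e-42, 8.25874636e-42,
    7.77686406e-43, 2.86216171e-43, 2.40104592e-45, 2.40409726e-16,
])

# Index of the outlier row and its distance, in orders of magnitude,
# from every other row.
outlier = int(row_norms.argmax())
others = np.delete(row_norms, outlier)
gaps = np.log10(row_norms[outlier] / others)

print(f"outlier row index: {outlier}")
print(f"gap to other rows: {gaps.min():.1f} to {gaps.max():.1f} orders of magnitude")
```

The outlier is the last row, and the gap ranges from roughly 25 to roughly 30 orders of magnitude depending on which row it is compared with.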

larsoner commented 5 months ago

... and to reproduce, run pytest mne_bids_pipeline -k N2pc and then look in ~/mne_data/derivatives/mne-bids-pipeline/ERP_CORE/sub-017/ses-N2pc/eeg, where you can examine the outputs. You don't need to be on this branch because the problem also exists in main.

hoechenberger commented 5 months ago

> So there is one row of the mixing matrix that is almost 30 orders of magnitude greater than the others.

That's probably the one causing this here:

image

I suppose there's a broken sensor

larsoner commented 5 months ago

Okay I think it might be a rank problem:

(Pdb) p ica.pca_explained_variance_
array([1.40405084e+01, 4.95925493e+00, 1.85425023e+00, 1.10919786e+00,
       1.05766829e+00, 8.38319830e-01, 6.92850939e-01, 5.33350818e-01,
       4.21512314e-01, 3.78933522e-01, 3.59757413e-01, 2.57562603e-01,
       2.22234482e-01, 1.79211170e-01, 1.23914900e-01, 1.14414326e-01,
       9.59149296e-02, 8.30701505e-02, 6.88505805e-02, 5.84232652e-02,
       5.45151329e-02, 4.82089102e-02, 3.84316888e-02, 3.71497040e-02,
       3.06821001e-02, 2.93183875e-02, 1.74473508e-02, 5.77968362e-32])

That last component, with an explained variance of about 6e-32, is probably just numerical noise. So the data are rank deficient, but somehow we're not capturing that. I'll look into ica_n_components to see if there's an easy fix.
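
As a rough check of the rank argument, the numerical rank can be estimated by counting the explained variances that exceed a small fraction of the largest one. This is a sketch on the values printed above; the relative tolerance `rel_tol` is a hypothetical choice, not something the pipeline is known to use.

```python
import numpy as np

# ica.pca_explained_variance_, copied from the pdb output above.
explained_variance = np.array([
    1.40405084e+01, 4.95925493e+00, 1.85425023e+00, 1.10919786e+00,
    1.05766829e+00, 8.38319830e-01, 6.92850939e-01, 5.33350818e-01,
    4.21512314e-01, 3.78933522e-01, 3.59757413e-01, 2.57562603e-01,
    2.22234482e-01, 1.79211170e-01, 1.23914900e-01, 1.14414326e-01,
    9.59149296e-02, 8.30701505e-02, 6.88505805e-02, 5.84232652e-02,
    5.45151329e-02, 4.82089102e-02, 3.84316888e-02, 3.71497040e-02,
    3.06821001e-02, 2.93183875e-02, 1.74473508e-02, 5.77968362e-32,
])

# Hypothetical tolerance (assumption): treat anything below 1e-12 of the
# largest variance as numerically zero.
rel_tol = 1e-12
numerical_rank = int((explained_variance > rel_tol * explained_variance.max()).sum())
print(f"numerical rank: {numerical_rank} of {explained_variance.size}")

# The square root of the degenerate variance, for comparison with the
# outlier mixing-matrix row norm reported earlier (2.40409726e-16).
degenerate_scale = np.sqrt(explained_variance[-1])
print(f"sqrt of last explained variance: {degenerate_scale:.8e}")
```

This gives a rank of 27 for 28 components. The square root of the last variance also matches the outlier row norm of the mixing matrix, which would be consistent with that component being scaled by its near-zero PCA standard deviation.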

larsoner commented 5 months ago

Okay I think this fixes things so I'll mark for merge-when-green