mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

maxmove, reject raw, and ICA #2353

Closed: matt-erhart closed this issue 9 years ago

matt-erhart commented 9 years ago

I'd like to run ICA for automatic ECG removal and EOG component rejection (with correlation) on multiple maxmoved fifs per subject. I currently load all the fifs into a single raw instance. I will often have artifacts at the start/end of fif files, since the subject might be talking to us etc. while we are recording but not running an experiment. There may be discontinuities between fif files. Rarely, maxmove will zero out data if something crazy happens. Maxmove will also significantly reduce the rank of the data, to 64 iirc. This is a VectorView with 306 channels, a vertical EOG, and cHPI.

Does ECG detection need to happen on MAGs or will grads do?

Assuming I must use raw for ECG rejection, what's the best way to prepare/clean this data so that ICA will work as well as possible?

Is there a way/function to reject portions of raw data without breaking epoching (e.g. cutting out half a trial)?

What are the ideal ICA settings for maxmoved data, e.g. for n_PCA_comps?

What exactly is happening with the proj removal below, and why is this step in the ICA doc but not in any of the ICA examples?

    projs, raw.info['projs'] = raw.info['projs'], []
    ica.fit(raw)
    raw.info['projs'] = projs

I've found ICA can work better on smaller chunks of data. Is there a good way to run ICA on portions of the raw object iteratively?

What's a good way to reject epochs after ICA, perhaps visually or with a custom goodness-metric?

Anything else to be aware of while processing this kind of data?

jona-sassenhagen commented 9 years ago

I've found ICA can work better on smaller chunks of data

What do you mean? ICA requires a lot of data, and with 306 channels you will need a lot of samples (unless you only fit a few components).

ica.fit's rejection procedure rejects data internally, without affecting the source instance.
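A minimal numpy sketch of that kind of internal rejection, assuming the data is a plain (n_channels, n_times) array rather than an MNE Raw object: cut the recording into fixed windows (recent MNE versions expose a similar window length as the tstep parameter of ICA.fit), drop every window whose peak-to-peak amplitude exceeds a threshold on any channel, and return a cleaned copy. The function name reject_segments and the single scalar threshold are illustrative; MNE's reject takes a dict of per-channel-type thresholds.

```python
import numpy as np


def reject_segments(data, sfreq, threshold, tstep=2.0):
    """Return a copy of `data` with artifact-laden windows removed.

    data      : array of shape (n_channels, n_times)
    sfreq     : sampling frequency in Hz
    threshold : maximum allowed peak-to-peak amplitude (same units as data)
    tstep     : window length in seconds
    """
    step = int(round(tstep * sfreq))
    keep = []
    for start in range(0, data.shape[1], step):
        window = data[:, start:start + step]
        # Peak-to-peak per channel; the window is bad if any channel exceeds it
        ptp = window.max(axis=1) - window.min(axis=1)
        if np.all(ptp <= threshold):
            keep.append(window)
    if not keep:
        return np.empty((data.shape[0], 0), dtype=data.dtype)
    # np.concatenate builds a new array, so the caller's data is never modified
    return np.concatenate(keep, axis=1)


rng = np.random.default_rng(0)
sfreq = 100.0
raw_data = rng.normal(scale=1.0, size=(5, 1000))  # 10 s of 5-channel toy data
raw_data[2, 450] += 50.0                          # one large artifact at 4.5 s
clean = reject_segments(raw_data, sfreq, threshold=20.0)
print(raw_data.shape, clean.shape)  # (5, 1000) (5, 800)
```

Only the 2 s window containing the artifact is dropped; the input array keeps its full length, which is the property the comment above describes for ica.fit.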

How do you filter pre-ICA?

matt-erhart commented 9 years ago

Smaller chunks refers to fewer time samples. 'Can work better' refers to a greater reduction in EOG contamination in the grad most affected by blinks. 30-45 minutes of MEG can sometimes have non-stationary sources, due to movement mostly. Even non-movement related strangeness can throw off ICA. Separate ICA runs with smaller amounts of data can have an easier time pulling out components, as long as each run has enough data.

Filtering varies by analysis. For this current run I'm looking pretty high up, so it's something like 1 to 200 Hz. Usually, it's 1 to 50 Hz.

matt-erhart commented 9 years ago

What's a good way to reject epochs after ICA, perhaps visually or with a custom goodness-metric?

epochs.drop_bad_epochs() does post hoc epoch rejection in the GitHub version now. The epochs.drop* methods include some other useful rejection tools.
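For the "custom goodness-metric" part of the question, one option is to compute a per-epoch score yourself and pass the bad indices back to MNE (recent versions have Epochs.drop(indices) for this). A minimal numpy sketch, assuming epoch data shaped (n_epochs, n_channels, n_times) as returned by epochs.get_data(); the metric here, a robust (median/MAD) z-score of each epoch's log variance, and the helper name flag_bad_epochs are illustrative choices, not an MNE function.

```python
import numpy as np


def flag_bad_epochs(epoch_data, z_thresh=3.5):
    """Return indices of epochs whose log-variance is a robust outlier.

    epoch_data : array of shape (n_epochs, n_channels, n_times)
    z_thresh   : cut-off on the robust (median/MAD-based) z-score
    """
    # One score per epoch: log of the variance averaged over channels
    score = np.log(epoch_data.var(axis=2).mean(axis=1))
    median = np.median(score)
    mad = np.median(np.abs(score - median))
    # 1.4826 * MAD estimates the standard deviation for Gaussian scores
    robust_z = (score - median) / (1.4826 * mad)
    return np.flatnonzero(np.abs(robust_z) > z_thresh)


rng = np.random.default_rng(1)
epoch_data = rng.normal(size=(40, 10, 200))
epoch_data[7] *= 10.0  # simulate one epoch with a large artifact
bad = flag_bad_epochs(epoch_data)
print(bad)
```

With an Epochs object this might be used as epochs.drop(flag_bad_epochs(epochs.get_data())); a visual alternative is epochs.plot(), where clicking an epoch marks it for rejection.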

jona-sassenhagen commented 9 years ago

30-45 minutes of MEG can sometimes have non-stationary sources, due to movement mostly.

Nonstationarity should be attenuated sufficiently by filtering, and if you want to reduce the impact of non-stereotypical artifacts, the proper way would be to reject just those segments while preserving as much of the data you want to model as possible.

This is also what I find in the reference literature, e.g.:

Two important considerations dictate the quality of ICA decomposition for a given dataset. First, the number of time points of n-channel data used in the decomposition must be sufficient to learn the n² weights of the unmixing matrix. If the numbers of electrodes and independent cortical sources are large, as in typical EEG data, the number of data points used in ICA decomposition should be at least some multiple, k, of n². To decompose large numbers of channels (e.g., 256), k may need to be 20 or larger.

Onton et al. 2005
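To put the k·n² rule of thumb into numbers for this thread, here is a quick back-of-the-envelope calculation. It assumes k = 20 and a sampling rate of 1000 Hz (the actual rate of these recordings is not stated), and compares n = 306 channels with n = 64, the approximate rank left after maxmove mentioned in the question.

```python
def min_samples(n, k=20):
    """Onton et al. (2005) rule of thumb: at least k * n**2 samples."""
    return k * n ** 2


def minutes_needed(n, sfreq, k=20):
    """Recording length in minutes that provides min_samples(n, k) samples."""
    return min_samples(n, k) / sfreq / 60.0


sfreq = 1000.0  # assumed sampling rate in Hz
for n in (306, 64):
    print(n, min_samples(n), round(minutes_needed(n, sfreq), 1))
# 306 1872720 31.2
# 64 81920 1.4
```

Under these assumptions, a single 10-minute run comfortably exceeds the requirement for a rank-64 decomposition, while fitting all 306 dimensions would need roughly half an hour of clean data.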

dengemann commented 9 years ago

Sorry for being late here.

  • Does ECG detection need to happen on MAGs or will grads do?

If you fit ICA to both sensor types, you can just use the magnetometers for detection. ICA internally computes a cardiac signal from the MAGs if no ECG channel is found. This signal is then compared with the components, e.g. using the default phase-lock analysis.
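A rough numpy sketch of that workflow, assuming magnetometer data and ICA source time courses are already available as arrays: build a surrogate cardiac signal by averaging the magnetometers, then score each component against it. Note that this swaps in plain absolute Pearson correlation for MNE's default phase-lock (CTPS) method, so it illustrates the idea rather than reproducing MNE's scores; the toy data, helper names, and 0.5 cut-off are all made up for illustration.

```python
import numpy as np


def synthetic_ecg(mag_data):
    """Surrogate cardiac signal: mean across magnetometer channels."""
    return mag_data.mean(axis=0)


def score_components(sources, reference):
    """Absolute Pearson correlation of each source with a reference signal."""
    src = sources - sources.mean(axis=1, keepdims=True)
    ref = reference - reference.mean()
    num = src @ ref
    den = np.linalg.norm(src, axis=1) * np.linalg.norm(ref)
    return np.abs(num / den)


rng = np.random.default_rng(2)
sfreq, n_times = 250.0, 5000
t = np.arange(n_times) / sfreq
# Toy heartbeat: narrow periodic pulses at about 1.2 Hz
heart = (np.sin(2 * np.pi * 1.2 * t) > 0.95).astype(float)
# 20 magnetometers: the heartbeat leaks into all of them, plus noise
mags = np.outer(rng.uniform(0.5, 1.5, 20), heart) + rng.normal(scale=0.3, size=(20, n_times))
# Pretend ICA returned 5 sources, the third of which is cardiac
sources = rng.normal(size=(5, n_times))
sources[2] = heart + rng.normal(scale=0.1, size=n_times)

scores = score_components(sources, synthetic_ecg(mags))
ecg_idx = np.flatnonzero(scores > 0.5)
print(np.round(scores, 2), ecg_idx)
```

The same scoring against a real EOG channel instead of the MAG average is essentially the "EOG component rejection (with correlation)" from the original question.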

  • Assuming I must use raw for ECG rejection, what's the best way to prepare/clean this data so that ICA will work as well as possible?

Band-pass filtering, e.g. between 1 and 45 Hz, can improve the results. However, how much you want to filter depends on your research question. Other than that, it's important not to fit ICA to data segments that are contaminated by artefacts, e.g. from external disturbances. For this we have the reject parameter in ICA.fit.
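As a minimal sketch of what a 1-45 Hz band-pass does before ICA, assuming scipy is available: a zero-phase Butterworth filter in second-order sections. This is not MNE's implementation (raw.filter uses an FIR design by default); it only shows that offsets and slow drifts are removed while activity inside the band passes essentially unchanged. The helper name bandpass and the toy signal are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(data, sfreq, l_freq=1.0, h_freq=45.0, order=4):
    """Zero-phase Butterworth band-pass along the last axis."""
    sos = butter(order, [l_freq, h_freq], btype="bandpass", fs=sfreq, output="sos")
    return sosfiltfilt(sos, data, axis=-1)


sfreq = 1000.0
t = np.arange(10000) / sfreq
alpha = np.sin(2 * np.pi * 10 * t)  # 10 Hz activity, inside the pass-band
drift = 5.0 + 2.0 * t / t[-1]       # offset plus slow linear drift
filtered = bandpass(alpha + drift, sfreq)

mid = slice(2000, 8000)             # ignore filter edge effects
print(round(float(np.abs(filtered[mid]).max()), 2), round(float(filtered[mid].mean()), 3))
```

Running the filter forward and backward (sosfiltfilt) avoids phase shifts, which matters if component time courses are later compared with ECG/EOG channels.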

Is there a way/function to reject portions of raw data without breaking epoching (e.g. cutting out half a trial)?

I think the reject parameter is what you want.

What are the ideal ICA settings for maxmoved data, e.g. for n_PCA_comps?

Do you mean Maxfiltered? Use the number of SSS components, e.g. if the rank of your data is 64, take 64. We have wanted to automate this for some time. MNE has both rank estimators and lookup functions that would allow us to get this information. By hand, you can use raw.estimate_rank.
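For intuition about what such a rank estimate does, here is a minimal numpy sketch that counts singular values above a relative tolerance. The tolerance of 1e-4 and the simulated rank-64 data are assumptions; real Maxfiltered data also has scaling differences between magnetometers and gradiometers, which MNE's own estimators (raw.estimate_rank at the time, mne.compute_rank in recent versions) handle more carefully.

```python
import numpy as np


def estimate_rank(data, tol=1e-4):
    """Number of singular values larger than tol * the largest one.

    data : array of shape (n_channels, n_times)
    """
    s = np.linalg.svd(data, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


rng = np.random.default_rng(3)
n_channels, n_times, true_rank = 306, 3000, 64
# Simulate SSS-like data: 306 channels spanned by only 64 underlying components
mixing = rng.normal(size=(n_channels, true_rank))
latent = rng.normal(size=(true_rank, n_times))
data = mixing @ latent
rank = estimate_rank(data)
print(rank)  # 64
```

The resulting number is what you would then use as the upper bound on the PCA/ICA components, since dimensions beyond the rank carry only numerical noise.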

What exactly is happening with the proj removal below, and why is this step in the ICA doc but not in any of the ICA examples?

Good question. I think it might be outdated. As far as I remember, the current version of ICA does not need any special handling of projs.
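Some background on why projections and ICA interact at all: an SSP projector removes k spatial patterns from the data, P = I - U Uᵀ with orthonormal columns U, and therefore lowers the data rank by k. Whether the data ICA sees has projections applied thus affects its rank-sensitive PCA step. A small numpy sketch of that rank drop; the channel count and the number of projection vectors are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n_channels, n_times, n_proj = 20, 2000, 3

data = rng.normal(size=(n_channels, n_times))  # full-rank toy data
# Orthonormal spatial patterns to project out (e.g. environmental noise)
u, _ = np.linalg.qr(rng.normal(size=(n_channels, n_proj)))
proj = np.eye(n_channels) - u @ u.T            # SSP projector

projected = proj @ data
print(np.linalg.matrix_rank(data), np.linalg.matrix_rank(projected))  # 20 17
```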

I've found ICA can work better on smaller chunks of data. Is there a good way to run ICA on portions of the raw object iteratively?

Why would you need this? It's easy, of course: you fit once, apply, fit once more, apply.

30-45 minutes of MEG can sometimes have non-stationary sources, due to movement mostly. Even non-movement related strangeness can throw off ICA. Separate ICA runs with smaller amounts of data can have an easier time pulling out components, as long as each run has enough data.

With regard to non-stationarity, it is certainly not a bad idea to fit one ICA per run (10 minutes). You can also use all runs and heavily decimate; look at the decim parameter. For biological artefacts the effective sampling frequency is much lower. In other words, to decompose MEG recordings into e.g. cardiac signals and brain sources, you really don't need every sample.
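A toy sketch of "one ICA per run, fitted on decimated data", assuming each run is a (n_channels, n_times) numpy array. The unmixing is a small symmetric FastICA (tanh contrast) written in plain numpy for illustration; it is not MNE's code, which as far as I know delegates method='fastica' to scikit-learn. In the spirit of the decim parameter, only every decim-th sample is used for fitting, while each run's unmixing matrix is applied to that run at full rate. The helper names and toy sources are hypothetical.

```python
import numpy as np


def _sym_decorrelate(w):
    """W <- (W W^T)^(-1/2) W, keeps the rows orthonormal."""
    s, u = np.linalg.eigh(w @ w.T)
    return (u / np.sqrt(s)) @ u.T @ w


def fastica(x, n_components, max_iter=200, tol=1e-6, seed=0):
    """Return an unmixing matrix of shape (n_components, n_channels)."""
    x = x - x.mean(axis=1, keepdims=True)
    # PCA whitening, keeping the n_components largest eigenvalues
    evals, evecs = np.linalg.eigh(np.cov(x))
    order = np.argsort(evals)[::-1][:n_components]
    whitener = (evecs[:, order] / np.sqrt(evals[order])).T
    z = whitener @ x
    w = _sym_decorrelate(np.random.default_rng(seed).normal(size=(n_components, n_components)))
    for _ in range(max_iter):
        g = np.tanh(w @ z)
        # Fixed-point update: E[g(wz) z^T] - E[g'(wz)] w
        w_new = g @ z.T / z.shape[1] - (1 - g ** 2).mean(axis=1)[:, None] * w
        w_new = _sym_decorrelate(w_new)
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1)) < tol
        w = w_new
        if converged:
            break
    return w @ whitener


def fit_per_run(runs, n_components, decim=5):
    """Fit one unmixing matrix per run, using every `decim`-th sample."""
    return [fastica(run[:, ::decim], n_components) for run in runs]


rng = np.random.default_rng(5)
t = np.arange(10000) / 1000.0
mixing = rng.normal(size=(4, 2))  # 4 toy channels, 2 sources
runs, true_sources = [], []
for _ in range(2):  # two runs
    src = np.vstack([np.sin(2 * np.pi * 1.3 * t), rng.laplace(size=t.size)])
    runs.append(mixing @ src + 0.01 * rng.normal(size=(4, t.size)))
    true_sources.append(src)

unmixers = fit_per_run(runs, n_components=2, decim=5)
for run, src, w in zip(runs, true_sources, unmixers):
    est = w @ run  # apply to the full-rate run
    corr = np.abs(np.corrcoef(np.vstack([src, est]))[:2, 2:])
    print(np.round(corr.max(axis=1), 3))
```

ICA recovers components only up to order and sign, hence the best absolute correlation per true source. Removing an artifact component would then mean zeroing its row of est and projecting back with np.linalg.pinv(w).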

I hope this helps. FYI, take a look here for convenience functions to run ICA on MEG/EEG data. So far they are tested for 4D and Neuromag data and are under active development. I'm happy to share more advanced examples on request.

matt-erhart commented 9 years ago

Very helpful. Thanks!