ENH: Split ICA into three steps

SophieHerbst commented 5 months ago

I wanted to re-run the apply_ICA step after changing the thresholds for automatic component rejection in my config.

I had three surprises/ observations:

mismatch between documentation website and actual nme of the step: 07 versus 08
the output prints each run (raw), while I expected the ICA being applied to epochs given the order in the pipeline
nothing was computed (Computation unnecessary (cached) …)

@hoechenberger how do I rerun this step? If the other points are just a matter of changing the doc, I can do it with a few how-to pointers from you

(versions: latest main of mne, pipeline, and mne-bids)

(mne) sh254795@is152869:/neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized$ mne_bids_pipeline --config=/neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/code/config_sophie.py --steps=preprocessing/apply_ica
┌────────┬ Welcome aboard MNE-BIDS-Pipeline! 👋 ────────────────────────────────────────────────────────────────
│13:51:23│ 📝 Using configuration: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/code/config_sophie.py
└────────┴ 
┌────────┬ init/_01_init_derivatives_dir ───────────────────────────────────────────────────────────────────────
│13:51:23│ ✅ Output directories already exist …
└────────┴ done (6s)
┌────────┬ init/_02_find_empty_room ────────────────────────────────────────────────────────────────────────────
│13:51:29│ ✅ sub-155 run-01 Computation unnecessary (cached) …
└────────┴ done (2s)
┌────────┬ preprocessing/_08a_apply_ica ────────────────────────────────────────────────────────────────────────
│13:51:36│ ✅ sub-155 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-04 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-05 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-06 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-07 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-08 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-rest Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-noise Computation unnecessary (cached) …
│13:51:41│ ✅ sub-155 run-01 Computation unnecessary (cached) …
│13:51:41│ ✅ sub-155 run-03 Computation unnecessary (cached) …
│13:51:41│ ✅ sub-155 run-02 Computation unnecessary (cached) …
└────────┴ done (12s)

larsoner commented 5 months ago

This first cached line is for the epochs

┌────────┬ preprocessing/_08a_apply_ica ────────────────────────────────────────────────────────────────────────
│13:51:36│ ✅ sub-155 Computation unnecessary (cached) …

The thresholds affect fitting ICA, so you need to rerun that step and the applying step

SophieHerbst commented 5 months ago

Thanks for the reply @larsoner. I don't understand why the thresholds affect the fitting. I meant those parameters: ica_ctps_ecg_threshold = 0.05 ica_eog_threshold = 2.0

hoechenberger commented 5 months ago

The thresholds affect fitting ICA, so you need to rerun that step and the applying step

Isn't that a bug then? Shouldn't we auto-rerun ICA fitting if config values related to the ICA step change?

I don't understand why the thresholds affect the fitting.

They don't, but fitting and ExG artifact detection are performed in the same step (run-ica), hence that step needs to be re-run. Maybe it should be split into two steps, a) fitting and b) artifact detection

SophieHerbst commented 5 months ago

I thought that the whole point of having two separate files for run and apply ICA was to run the ICA once and be able to adjust the thresholds for eog/ecg rejection without recomputing ICA

larsoner commented 5 months ago

SophieHerbst commented 5 months ago

that is how it was in 'the old pipeline': https://github.com/brainthemind/CogBrainDyn_MEG_Pipeline/blob/master/06a-apply_ica.py Did it get lost?

larsoner commented 5 months ago

Isn't that a bug then? Shouldn't we auto-rerun ICA fitting if config values related to the ICA step change?

To detect this in the general case this would require traversing the whole processing tree. In other words, if a user selects a single step, don't treat it like "just run this" but instead like a stopping step, "run everything up to this". Like if a user changed their filter settings but only asks to run source estimation, quite a few preceding steps actually need to be rerun.

hoechenberger commented 5 months ago

I thought that the whole point of having two separate files for run and apply ICA was to run the ICA once and be able to adjust the thresholds for eog/ecg rejection without recomputing ICA

The report generated through run-ica is supposed to help you determine which components to remove in the next step (apply-ica)

To assist you, we run the automated artifact detection

The results end up together with all components in the ica+components report, e.g. http://mne.tools/mne-bids-pipeline/stable/examples/ERP_CORE/sub-015_ses-ERN_task-ERN_proc-ica+components_report.html

The apply-ica step then looks into the ICA component CSV file to figure which components you would like removed

Like I said above, I think your point of separating the artifact detection step from the fitting step is fair. I think we should do, in three steps:

fit ICA
detect artifacts
apply ICA

so you could fiddle with artifact detection settings (step 2) without having to re-fit ICA (step 1)

hoechenberger commented 5 months ago

To detect this in the general case this would require traversing the whole processing tree. In other words, if a user selects a single step, don't treat it like "just run this" but instead like a stopping step, "run everything up to this".

So would would have been a viable solution for @SophieHerbst now? Simply re-run the entire preprocessing pipeline and rely on caching to skip e.g. filtering etc?

hoechenberger commented 5 months ago

So would would have been a viable solution for @SophieHerbst now? Simply re-run the entire preprocessing pipeline and rely on caching to skip e.g. filtering etc?

We don't have a "please run everything up to this step" functionality right now, do we? Do you think we could implement this somehow?

SophieHerbst commented 5 months ago

And why is the run-ica step before make_epochs? As I see from the code it creates epochs and runs ICA on them.

hoechenberger commented 5 months ago

And why is the run-ica step before make_epochs? As I see from the code it creates epochs and runs ICA on them.

ICA creates an entirely separate set of epochs to allow for applying different filters before fitting (e.g., 1 Hz high-pass for ICA while you'll maybe want 0.1 Hz high-pass for your "actual" data)

SophieHerbst commented 5 months ago

I was just confused to see that it appears before epochs in the steps and that the output talks about the raw runs. But I get it now.

hoechenberger commented 5 months ago

I was just confused to see that it appears before epochs in the steps and that the output talks about the raw runs. But I get it now.

Yes so it iterates over the raw files in order to create epochs

it's actually a good sign this caught your attention – Eric just recently changed the order of things to what it is now to highlight the fact that ICA is not fitted to your "regular" epochs

but it appears this needs better documentation and / or logging output

SophieHerbst commented 5 months ago

but it appears this needs better documentation and / or logging output

And the steps in the documentation need to be renamed accordingly

larsoner commented 5 months ago

Like I said above, I think your point of separating the artifact detection step from the fitting step is fair. I think we should do, in three steps:

Yes this is doable. We could probably consider renaming the steps a bit so we don't have to keep renaming the steps that follow if possible (like with 05a1, 05a2 or something instead of 05a, 06a but only a 05a for SSP and no 06a at all)

I was just confused to see that it appears before epochs in the steps and that the output talks about the raw runs. But I get it now.

Yes so it iterates over the raw files in order to create epochs... it's actually a good sign this caught your attention...

Yes this is exactly why I reordered it, I initially also thought ICA was fit on the task epochs but it's not :slightly_smiling_face:

So would would have been a viable solution for @SophieHerbst now? Simply re-run the entire preprocessing pipeline and rely on caching to skip e.g. filtering etc?

Yes personally nowadays I always just mne_bids_pipeline config.py anytime I change anything and rely on caching to skip stuff. Data on SSDs this only has ~10s overhead for ~20 subjects data. Maybe it's not as viable on NFS-stored data or something -- @SophieHerbst can you give some information on how long it takes mne_bids_pipeline config.py to run on your data without changing anything, i.e., to be a no-op that just says "cached" for every step? Maybe the mtime checks are expensive or something.

We don't have a "please run everything up to this step" functionality right now, do we? Do you think we could implement this somehow?

Yeah it would be possible in principle to implement this, I'll put this in a new issue

And the steps in the documentation need to be renamed accordingly

@SophieHerbst what would you like to work on, if anything? Looks like we have

Update docs
Reorder/reconfigure ICA steps as per https://github.com/mne-tools/mne-bids-pipeline/issues/857#issuecomment-1961323394

I'm happy to do any of these

SophieHerbst commented 5 months ago

@larsoner I am happy to give both a try but likely will need some supervision to get it right. Also, I might not get to it before next week, so if you have time to start go ahead. For the documentation, can you point me to how the website is updated?

larsoner commented 5 months ago

Docs are all built from https://github.com/mne-tools/mne-bids-pipeline/tree/main/docs/source and also the contents of https://github.com/mne-tools/mne-bids-pipeline/blob/main/mne_bids_pipeline/_config.py , so depends on what you need to update. I would use git grep with a few words from the offending text to find the right file

larsoner commented 5 months ago

Upon reflecting a bit... I think we should probably change both docs/source/features/steps.md and docs/source/features/overview.md to be generated using Python code. We can use the module __doc__ (ideally the title, but could be some other private var). @SophieHerbst okay for me to do this? That way you don't have to manually update anything and we'll automatically stay up-to-date...

larsoner commented 5 months ago

Hah, looks like we already do that with steps.md. So we need to do it with features.md as well...

larsoner commented 5 months ago

Since the doc part is in #860 now (I hope), I re-titled this issue to be about splitting ICA into three steps:

ICA.fit (which is done on task epochs, but they're separate from the ones eventually used because filtering differs)
Artifact identification
ICA.apply

hoechenberger commented 5 months ago

ICA.fit (which is done on task epochs, but they're separate from the ones eventually used because filtering differs)

yes this is what I meant to say, sorry if there was any confusion!

SophieHerbst commented 5 months ago

@hoechenberger @larsoner do you think a new release could be done with the changes in ICA? Then I could run with the stable version once more, hand over to the student, and hopefully not touch preprocessing for this dataset again.

larsoner commented 5 months ago

I'd like to get #863 in, then I don't see why not :+1:

hoechenberger commented 5 months ago

somebody would need to implement the "ICA splitting" first 😅

larsoner commented 5 months ago

I can tackle that this week I think, I don't think it'll be too bad

SophieHerbst commented 5 months ago

sorry, I didn't have the time to contribute to this. let me know I can test anything

mne-tools / mne-bids-pipeline

ENH: Split ICA into three steps #857