mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License

Manual steps (e.g., selecting epochs for rejection) #924

Open hoechenberger opened 6 months ago

hoechenberger commented 6 months ago

Hello, I would like to specify individual epochs to be excluded from analysis (perhaps even before ICA). I don't think we currently have a mechanism for that. The data I'm currently working on has only a few trials, and neither autoreject nor global rejection thresholds yields satisfactory results for me, so I've reverted to manual inspection. Now I want to let the pipeline know which epochs to drop. Any thoughts on this?

cc @SophieHerbst

SophieHerbst commented 6 months ago

@hoechenberger we are currently using mne_bids' interactive inspect_dataset() to remove noisy segments, but this is before epoching. I think it could be useful to add a config parameter for manually removing epochs, provided that the epoch indices never change. The question would be when to do this: before ICA (to remove very noisy epochs and obtain a cleaner ICA), or after ICA to correct single epochs that might have been missed by the other methods, for example in one participant. If we implement it, maybe we should do both, in analogy to the global rejection thresholds pre- and post-ICA?
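
A hedged sketch of what such a config parameter might look like (the name manual_bad_epochs and its shape are purely hypothetical, not an existing pipeline option):

```python
# Hypothetical config: map subject ID to the epoch indices to drop.
# Indices refer to the epochs *before* any dropping, so they stay
# stable as long as the epoching itself is deterministic.
manual_bad_epochs: dict[str, list[int]] = {
    "01": [3, 17],  # sub-01: drop epochs 3 and 17
    "02": [],       # sub-02: nothing to drop
}


def bads_for(subject: str) -> list[int]:
    """Look up manual bads with a safe default for unlisted subjects."""
    return manual_bad_epochs.get(subject, [])
```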

hoechenberger commented 6 months ago

@hoechenberger we are currently using mne_bids' interactive inspect_dataset() to remove noisy segments, but this is before epoching.

I was thinking of doing the same, but in our experiments, we have quite long ITIs (10, 15 seconds), making this quite an exercise in horizontal scrolling … it's much easier for us to filter & epoch the data, and then visually spot epochs that "pop out"

Our workflow is such that if we discover epochs we deem so corrupted that we don't want to feed them into ICA, we drop them from the dataset for good. So what I'd need is to move the epochs creation step back before ICA (we'd recently moved it past ICA); then I'd need a way to manually inspect the epochs and specify which ones to exclude; and then these should be gone for the rest of the pipeline.

One could then repeat this step after ICA to manually remove remaining bad epochs.

I guess I'm just having really bad luck with automated approaches with our data here :(

SophieHerbst commented 6 months ago

😖

I guess I'm just having really bad luck with automated approaches with our data here :(

do we really need to move the epochs creation before ICA, or could you inspect the epochs created for ICA? I guess this would mean running the ICA step 2x. I have never experienced the need to manually remove epochs before the ICA; I rely on the fixed thresholds there

larsoner commented 6 months ago

So what I'd need is to move the epochs creation step back before ICA (we'd recently moved it past ICA);

Sure, but keep in mind that the epochs created for ICA are by design filtered differently from the task epochs. So you'll have to drop from both the transient ICA-fitting epochs and the final/useful task-related epochs.

then I'd need a way to manually inspect the epochs and specify which ones to exclude; and then these should be gone for the rest of the pipeline.

Indeed, in general a lot of datasets have spots where some manual or subject-specific "stuff" would be useful. There are also times our stuff gets 90% of the job done but could use some subsequent tweaking. For example, I have datasets where I'd like to add some additional projectors -- one to remove motor artifacts and another to remove a vibration artifact -- and it would be great to be able to "inject" them in the SSP steps somewhere. Currently I load the preprocessed Epochs, add them then, and redo source localization etc.

I wonder if there is a generalizable solution. For example, we could allow users to supply custom post-step hooks for this sort of thing that get executed just before writing the out_files. A general framework would be something like:

post_step_hooks: dict[str, Callable] = {}

where you map a given step name to a callable. Assuming we split the create-temporary-epochs-for-ICA into an additional step, for you maybe this would be:

def reject_more(*, subject: str, session: str, out_files: dict) -> None:
    ...
    out_files["epochs"].drop(...)

post_step_hooks = {
    'preprocessing/_06a0_ica_epochs.py': reject_more,  # this creates *differently filtered* epochs from the ones that are used later!
    'preprocessing/_07_make_epochs.py': reject_more,  # this creates the final task-related epochs
}
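
For context, the dispatch side of such a framework might look roughly like this (all names hypothetical; the real pipeline internals would of course differ):

```python
from typing import Any, Callable, Optional

# Hypothetical registry: step name -> hook run just before out_files
# is written to disk
post_step_hooks: dict[str, Callable[..., None]] = {}


def run_step(step_name: str, *, subject: str, session: Optional[str],
             out_files: dict[str, Any]) -> dict[str, Any]:
    # ... the step itself would populate out_files here ...
    hook = post_step_hooks.get(step_name)
    if hook is not None:
        # Hooks mutate out_files in place, e.g. dropping epochs
        hook(subject=subject, session=session, out_files=out_files)
    return out_files


# Example hook: records which subject it touched
def my_hook(*, subject, session, out_files):
    out_files["touched_by"] = subject


post_step_hooks["preprocessing/_07_make_epochs.py"] = my_hook
result = run_step("preprocessing/_07_make_epochs.py",
                  subject="01", session=None, out_files={})
```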

Similarly you could imagine using this sort of thing for subject-specific manual epoch thresholds for example:

reject = None

def reject_specific(*, subject, session, out_files):
    my_rejects = dict(S01=dict(mag=2000e-15))
    out_files["epochs"].drop_bad(reject=my_rejects.get(subject, dict(mag=3000e-15)))

post_step_hooks = {
    'preprocessing/_09_ptp_reject.py': reject_specific,
}

To me this is very general. As long as the step is deterministic -- and we should tell people it should be -- I think it would work with caching, since we could pass in the hashed code of the function. (Incidentally, we could try this function-hashing business with the custom cov stuff, too; I think joblib supports this sort of thing.) We'd have to figure out the exact semantics, but as a general framework it seems very usable to me. Users could even, in principle, modify out_files to add additional entries eventually (we'd have to think about it, but it would probably work).
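
The function-hashing idea can be sketched with the standard library alone (joblib.hash would likely be the more robust choice in practice; hashing the bytecode and constants, as below, is just one simple approximation):

```python
import hashlib


def hash_hook(func) -> str:
    """Hash a hook's compiled code so the cache invalidates when it changes."""
    payload = func.__code__.co_code + repr(func.__code__.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()


# Two hooks that differ only in a constant
def hook_v1(*, subject, out_files):
    out_files["dropped"] = 3


def hook_v2(*, subject, out_files):
    out_files["dropped"] = 4
```

Different bodies yield different hashes, so cached step outputs would be recomputed whenever a user edits their hook.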

We'd of course have to warn people that modifying the outputs could create issues in some circumstances, like if you do out_files["epochs"].crop(...) it's probably going to break downstream steps that assume epochs_tmin and epochs_tmax are always the correct time bounds for epochs. But I think it's safe enough as long as we mention this.