No mechanism for channel specific epoch removal

Aaronearlerichardson commented 1 year ago

Describe the new feature or enhancement

One step in my preprocessing workflow is to take the epoch my data, and then remove trial outliers for each channel, defined as epochs where there is a voltage value more than 10 standard deviations above the channel average at any time point. The outliers in this calculation are channel specific, but the epochs.drop method can only remove epochs for all channels.

Describe your proposed implementation

A new info field called 'outliers'. The field stores a boolean array of shape (channels X epochs) indicating whether a given epoch passes a given outlier test.

This field will be checked when creating an evoked object using epoch.average()

Describe possible alternatives

Include outlier function as a 'method' for the epochs averaging function. This does require that the outliers be recalculated at the time of averaging. If you want to determine the outliers for a broadband signal, passband filter that signal, and then find an average evoked response later this method won't work. additionally, it doesn't allow storage of the outlier mask in a epo.fif file.

Additional context

Outliers determination method:

def find_outliers(data: np.ndarray, outliers: float) -> np.ndarray[bool]:
    dat = np.abs(data)  # (trials X channels X (frequency) X time)
    max = np.max(dat, axis=-1)  # (trials X channels X (frequency))
    std = np.std(dat, axis=(-1, 0))  # (channels X (frequency))
    mean = np.mean(dat, axis=(-1, 0))  # (channels X (frequency))
    keep = max < ((outliers * std) + mean)  # (trials X channels X (frequency))
    return keep

larsoner commented 1 year ago

A new info field called 'outliers'.

This part at least won't work. info should really be static and have to do with the acquisition setup itself. Characteristics of the recorded data values (peak to peak, flatness, etc.) shouldn't live there. In this sense it's a bit weird that info['bads'] exists but we should try to avoid making the problem worse :)

In principle we can add attributes like epochs.whatever (whose scope makes more sense than info) but it requires a bit of work and thinking as well.

Include outlier function as a 'method' for the epochs averaging function. This does require that the outliers be recalculated at the time of averaging.

Other ideas:

Some variant of the above could potentially work, but I'm thinking of a boolean epochs.average(..., mask=...) where mask is a bool array of shappe (n_epochs, n_channels).u
An easier way to accomplish the same thing might be an epochs.interpolate_bads(..., method='nan') (which doesn't really do interpolation so a bit of a misnomer) which just takes any bad channels (can be epoch-specific) and sets their data to NaN, then we make sure epochs.average(...) uses np.nanmean under the hood.

One consideration with any such method is that at the end of this operation the data will have a (potentially drastically) different number of epochs included in each channel's average. This will make subsequent interpretation harder in sensor space, and for source space things like dSPM and sLORETA which scale with sqrt(nave) won't be correct anymore. Maybe we can live with these caveats but I'm not sure. So is it a widely used or accepted use case to allow different numbers of epochs averaged per channel? Maybe you need it for clinical data or something?

Aaronearlerichardson commented 1 year ago

info should really be static and have to do with the acquisition setup itself.

Ah I see. Adding as an attribute of the Epochs object works so long as it is saved in the epo.fif file.

So is it a widely used or accepted use case to allow different numbers of epochs averaged per channel? Maybe you need it for clinical data or something?

Channel specific trial outlier removal is typical in intercranial/clinical work. In general with non-invasive methods (e.g. EEG/MEG) noise sources come from outside and affect the channels generally equally (channels are highly correlated). With intracranial data, this is not the case. Information is specific and noise sources can be as well. For example, in a speech production task one electrode might get jostled because of head or jaw movement during one trial, but that doesn't necessarily disqualify all the trials for that channel nor all the channels for that trial

drammock commented 1 year ago

Channel specific trial outlier removal is typical in intercranial/clinical work.

a use case we want to support, so we should try to find a way.

One consideration with any such method is that at the end of this operation the data will have a (potentially drastically) different number of epochs included in each channel's average. This will make subsequent interpretation harder in sensor space, and for source space things like dSPM and sLORETA which scale with sqrt(nave) won't be correct anymore.

This seems at least in principle solvable? I'm imagining something like "effective n_ave" analogous to "effective/estimated degrees of freedom" as is often done in AOV / linear modeling.

marking as NaN and using nanmean / interpolation seems like the right approach to me.

CarinaFo commented 1 year ago

I would like to work on this

mne-tools / mne-python