raw.crop doesn't adjust onset of annotations for eyetracking data leading to interpolation errors

schekroud commented 5 months ago

Description of the problem

specifically for eyetracking data (not sure about other types), the raw.crop method will trim the signal down appropriately, but no adjustment is made to the onset time of any annotations that remain in the data. this is particularly problematic when you use mne.preprocessing.eyetracking.interpolate_blinks where it will just interpolate over random segments of the signal

i am not sure if this is an intended behaviour as I think, in the past, the .crop function has adjusted the onset time of events (before annotations were used). If i'm wrong, please send me away!

Steps to reproduce

import mne
from mne.datasets.eyelink import data_path
mne.viz.set_browser_backend('qt')

dpath = data_path() / "eeg-et" / "sub-01_task-plr_eyetrack.asc"
raw = mne.io.read_raw_eyelink(dpath, create_annotations = ['blinks'])
raw.plot(scalings = dict(eyegaze = 1e3))
raw = raw.crop(tmin = 4)
raw = mne.preprocessing.eyetracking.interpolate_blinks(raw, buffer = 0.15)
raw.plot(scalings = dict(eyegaze = 1e3)) #can see interpolation happens exactly 4s after the actual blink period

Link to data

No response

Expected results

it should interpolate the blinks (removing blink artefact)

Actual results

it interpolates exactly Xs after the blink, where X is the tmin applied in the raw.crop call

Additional information

Platform macOS-14.2.1-arm64-arm-64bit Python 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:37:07) [Clang 15.0.7 ] Executable /Users/sammichekroud/anaconda3/envs/mne/bin/python CPU arm (8 cores) Memory 16.0 GB

Core ├☒ mne 1.6.0 (outdated, release 1.6.1 is available!) ├☑ numpy 1.26.2 (OpenBLAS 0.3.25 with 8 threads) ├☑ scipy 1.11.4 ├☑ matplotlib 3.8.2 (backend=Qt5Agg) ├☑ pooch 1.8.0 └☑ jinja2 3.1.2

Numerical (optional) ├☑ sklearn 1.3.2 ├☑ numba 0.58.1 ├☑ nibabel 5.2.0 ├☑ nilearn 0.10.2 ├☑ dipy 1.7.0 ├☑ openmeeg 2.5.7 ├☑ pandas 2.1.4 └☐ unavailable cupy

Visualization (optional) ├☑ pyvista 0.43.1 (OpenGL 4.1 Metal - 88 via Apple M2) ├☑ pyvistaqt 0.11.0 ├☑ vtk 9.2.6 ├☑ qtpy 2.4.1 (PyQt5=5.15.2) ├☑ pyqtgraph 0.13.3 ├☑ mne-qt-browser 0.6.1 ├☑ ipywidgets 8.1.1 ├☑ trame_client 2.14.1 ├☑ trame_server 2.13.1 ├☑ trame_vtk 2.6.3 ├☑ trame_vuetify 2.3.1 └☐ unavailable ipympl

Ecosystem (optional) └☐ unavailable mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline

welcome[bot] commented 5 months ago

Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴

schekroud commented 5 months ago

from looking at the source code for raw.crop, it looks like this is because the annotations created by mne.io.read_raw_eyelink come with an orig_time feature. With this feature the onsets of the annotations aren't actually adjusted to be relative to the first sample of the signal

i should say that cropping then adjusting the onset of each annotation will make it so that you can interpolate over the correct area of the signal, but there's then a big mismatch in the actual location of the blink:

# raw.plot(scalings = dict(eyegaze = 1e3))
raw = raw.crop(tmin = 2.0)
#adjust the onset of annotations accordingly
for x in range(len(raw.annotations)):
    raw.annotations.onset[x] -= 2.0

raw = mne.preprocessing.eyetracking.interpolate_blinks(raw, buffer = [0.05,0.2])
raw.plot(scalings = dict(eyegaze = 1e3)) #can see interpolation happens exactly 2s after the actual blink period, in the correct spot

the correct portions of the signal will now be interpolated, but when you actually plot the data you see that the annotations no longer line up with the truth of the signal. Something weird is going on for sure

scott-huberty commented 5 months ago

Thanks for the report!

I think we need to account for raw.first_time in the interpolate_blinks function, which will be non-negative after you call raw.crop:

https://github.com/mne-tools/mne-python/blob/65d826d199c3bbed0ee7035dda0f4560282b43ff/mne/preprocessing/eyetracking/_pupillometry.py#L75-L100

starting at line 89 above, something like:

        for annot in blink_annots:
            if "ch_names" not in annot or not annot["ch_names"]:
                msg = f"Blink annotation missing values for 'ch_names' key: {annot}"
                raise ValueError(msg)
            start = annot["onset"] - raw.first_time - pre_buffer   # critical change here
            end = annot["onset"] - raw.first_time + annot["duration"] + post_buffer  # critical change here

After which, running your code above, should still interpolate the blinks as expected:

Screen Shot 2024-02-01 at 2 45 14 PM

schekroud commented 5 months ago

neat! Thanks for the quick fix - would it be ok to change my pupillometry.py to include that fix until it's released?

also, possibly not the right place to mention this as it's less of an issue and more of a feature request - would it be possible to have this function return the non-interpolated, but still nanned signals? this is actually quite useful when you want to see how much data in a trial is missing (after epoching the raw data, it works well with epoched.plot_image) to aid in trial rejection for later analysis. If useful, i am v happy to be pointed to wherever is good for feature requests to add this!

scott-huberty commented 5 months ago

neat! Thanks for the quick fix - would it be ok to change my pupillometry.py to include that fix until it's released?

@schekroud would you be up to submit a PR with the fix? If we get stuck somewhere (our change breaks existing tests, etc.) I'll be happy to help troubleshoot as well.

also, possibly not the right place to mention this as it's less of an issue and more of a feature request - would it be possible to have this function return the non-interpolated, but still nanned signals?

Like return a numpy array of the signals before interpolation? Sorry if I'm misunderstanding, but feel free to open up a new ticket if its something you want to discuss more!

larsoner commented 5 months ago

I think we need to account for raw.first_time in the interpolate_blinks function, which will be non-negative after you call raw.crop

To fix this rather than any new code I think the solution is to use _annotations_starts_stops

https://github.com/mne-tools/mne-python/blob/main/mne/annotations.py#L1049

Internally it uses _sync_onset to deal with all the first_samp stuff the right way

Like return a numpy array of the signals before interpolation?

If this is all you need raw["eyegaze"] or raw.get_data(...) or similar should get you what you want

scott-huberty commented 5 months ago

To fix this rather than any new code I think the solution is to use _annotations_starts_stops

OK, using _annotations_starts_stops an implementation could look like (again starting from line 89 above):

starts, ends = _annotations_starts_stops(raw, "BAD_blink")  # New line
starts = np.divide(starts, raw.info["sfreq"])    # New line
ends = np.divide(ends, raw.info["sfreq"])   # New line
for annot, start, end in zip(blink_annots, starts, ends):   # Edited line
    if "ch_names" not in annot or not annot["ch_names"]:
       msg = f"Blink annotation missing values for 'ch_names' key: {annot}"
       raise ValueError(msg)
       start -= pre_buffer   # Edited line
       end += post_buffer   # Edited line
       ...

schekroud commented 5 months ago

@scott-huberty i'll submit the PR later on, sure

If this is all you need raw["eyegaze"] or raw.get_data(...) or similar should get you what you want

not quite. i'm talking about adding a channel during mne.io.preprocessing.interpolate_blinks where the channel contents are basically the output of

# signal is your trace
# mask is the np.zeros_like(signal) with boolean for if the sample is to be interpolated
signal[np.squeeze(np.where(mask))] = np.nan
#append signal as new channel called e.g. pupil_nan

i'll open a new issue with feature request to explain a bit more - but it is specifically quite useful for eyetracking to identify 'blinky' subjects that you may want to discard as you're interpolating large %s of data

larsoner commented 5 months ago

i'll open a new issue with feature request to explain a bit more - but it is specifically quite useful for eyetracking to identify 'blinky' subjects that you may want to discard as you're interpolating large %s of data

Sure, feel free. Maybe what we really need is a function to quantify how much of a given raw instance is "covered" by annotations of a specific type. In your case it could be something like mne.annotations.quantify_coverage(raw, "BAD_BLINK") or something similar.

scott-huberty commented 5 months ago

@scott-huberty i'll submit the PR later on, sure

awesome!

Maybe what we really need is a function to quantify how much of a given raw instance is "covered" by annotations of a specific type. In your case it could be something like mne.annotations.quantify_coverage(raw, "BAD_BLINK") or something similar.

if so, maybe we can just repurpose some of the code used for logging in mne.preprocessing.annotate_break into a function:

https://github.com/mne-tools/mne-python/blob/65d826d199c3bbed0ee7035dda0f4560282b43ff/mne/preprocessing/artifact_detection.py#L600-L613

mne-tools / mne-python