Index bug when annotating segments in data browser

richardkoehler commented 1 year ago

Description of the problem

Description

When annotating bad segments in the QtBrowser, we noticed that if the bad segment includes the very end of the recording, the last sample of the recording is not discarded, when running: raw, times = raw.get_data(reject_by_annotation="omit", return_times=True)

On the rest of the bad segment it works fine, including the second to last sample (see code below) - it is simply the last sample that does not seem to be discarded. You can observe when plotting the raw data afterwards:

grafik

and when inspecting the last two samples of the times:

grafik

I included a code sample, but to reproduce the bug yourself, you have to manually shift the annotation in the raw browser so that it slightly "depasses" the end of the recording.

Proposal

I suspect that this might be related to indexing in a half-open interval (i.e. the last sample of the annotated segment in the data browser is not included in the raw.annotation, as in numpy.arange etc.). If this is the case, it would not be possible to annotate the last sample of the recording, which I would consider a bug worth fixing, even if it is obviously an edge case.

Discussion

Let me know if you think this would be desirable to fix. If you point me towards where in the source code the annotations in the data browser are translated to the annotations in the raw object, I can implement the changes myself, but unfortunately the plotting code is rather complex so I wasn't able to find the right spot in the source code myself.

Steps to reproduce

from matplotlib import pyplot as plt
from mne_bids import BIDSPath, read_raw_bids
import mne

bids_root = mne.datasets.epilepsy_ecog.data_path()

bids_path = BIDSPath(
    root=bids_root, 
    subject='pt1', 
    session='presurgery',
    task='ictal', 
    datatype='ieeg', 
    extension='.vhdr'
    )

raw: mne.io.Raw = read_raw_bids(bids_path=bids_path, verbose=False)

raw.load_data(verbose=False)

raw.set_annotations(None)

print(f"\n{raw.annotations = }")

last_time = raw.times[-1]

print(f"\nLast time point raw data: {last_time}")

data_bef, times_bef = raw.get_data(
    picks=None, start=0, stop=None, reject_by_annotation="omit", return_times=True
)

print(f"\nLast time point before: {times_bef[-1]}")
print(f"\n{len(times_bef) = }")

annotations = mne.Annotations(
    onset=[last_time - 0.999], duration=[1.1], description=["BAD_SEGMENT"],
    orig_time= raw.info['meas_date'], ch_names=[raw.info["ch_names"]]
    )

raw.set_annotations(annotations)

# Uncomment this to confirm the problem only occurs when manually shifting
# the annotation in a browser and not systematically when the annotation
# exceeds the recording duration
# raw._annotations = annotations

raw.plot(scalings="auto", block=True, verbose=True)

print(f"\n{raw.annotations = }")

data_aft, times_aft = raw.get_data(
    picks=None, start=0, stop=None, reject_by_annotation="omit", return_times=True
)

print(f"\nLast time point after: {times_aft[-1]}")
print(f"\n2nd to last time point after: {times_aft[-2]}")
print(f"\n{len(times_aft) = }")

plt.plot(times_aft[-1000:], data_aft[0,-1000:])
plt.show(block=True)

Link to data

No response

Expected results

Last time point before: 269.079

Last time point after: **268.079**

2nd to last time point after: 268.078

Actual results

Last time point before: 269.079

Last time point after: **269.079**

2nd to last time point after: 268.078

Additional information

Platform: Windows-10-10.0.22621-SP0 Python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:16:33) [MSC v.1929 64 bit (AMD64)] Executable: C:..\anaconda3\envs\mnedev\python.exe
CPU: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel: 8 cores Memory: 31.8 GB

mne: 1.3.dev0 numpy: 1.23.5 {unknown linalg bindings} scipy: 1.9.3 matplotlib: 3.6.2 {backend=QtAgg}

sklearn: 1.2.0 numba: 0.56.4 nibabel: 4.0.2 nilearn: 0.9.2 dipy: 1.5.0 openmeeg: 2.5.5 cupy: Not found pandas: 1.5.2 pyvista: 0.37.0 {OpenGL 4.5.0 - Build 31.0.101.2114 via Intel(R) UHD Graphics} pyvistaqt: 0.9.0 ipyvtklink: 0.2.2 vtk: 9.2.2 qtpy: 2.3.0 {PyQt5=5.15.6} ipympl: 0.9.2 pyqtgraph: 0.13.1 pooch: v1.6.0

mne_bids: 0.12 mne_nirs: Not found mne_features: Not found mne_qt_browser: 0.4.0 mne_connectivity: Not found mne_icalabel: Not found

hoechenberger commented 1 year ago

Hello @richardkoehler, do you observe the same problem when using the Matplotlib-based data browser instead?

richardkoehler commented 1 year ago

@hoechenberger No, it seems not - I wasn't able to reproduce the issue with the matplotlib-based browser.

agramfort commented 1 year ago

@richardkoehler sorry it took me so long to look. Yes I can replicate the problem. Basically in the QT browser you cannot mark the last sample as bad. The annotation excludes the last sample even if you drag to the far right.

richardkoehler commented 1 year ago

@agramfort No worries - thanks for verifying this! I am assuming this might be rather easy to fix? I had briefly looked at the source code but I was not able to pinpoint the piece of code where the annotations are applied. If you let me know where I have too look precisely, I am very happy to help obviously!

marsipu commented 1 year ago

@richardkoehler Thank you for pointing out this issue. You might want to have a look here. Probably there lies a problem that the annotation is one short at the end. It might also have something to do with setting the bounds of the AnnotationRegion to xmax (which is raw.times[-1]). Hope that helps, couldn't test for myself at the moment.

richardkoehler commented 2 weeks ago

@marsipu Sorry, I took quite a while to be able to look into this but I think I was able to track down the issue.

It seems that the issue is actually in the behavior of the parameter reject_by_annotation in the raw.get_data function. Basically when you opt to reject by annotation, the last sample of the annotation (annotation["onset"] + annotation["duration"]) is not rejected. I don't think this is necessarily an unintended bug, but rather the intended behavior.

The reason I wrongly blamed mne-qt-browser is that in fact the "bug" is rather happening in the matplotlib based browser, where it is possible to annotate beyond the actual end of the recording by dragging the annotation to the far right. This is why when you drag it to the far right in matplotlib, the annotation goes beyond the recording end and the last sample is also rejected, but not when you annotate the true last sample in the QT browser, where the last sample of the annotation is the last sample of the recording. I hope this makes sense.

Now the question is whether we have to address the fact that the last sample of the "BAD_" annotation is not rejected in MNE python itself. I would argue that this is not the intuitive behavior, since when I annotate a segment as bad, I would expect the entire segment to be discarded and not the last sample to be kept.

I am grateful for any opinions on this.

mscheltienne commented 2 weeks ago

It seems that the issue is actually in the behavior of the parameter reject_by_annotation in the raw.get_data function. Basically when you opt to reject by annotation, the last sample of the annotation (annotation["onset"] + annotation["duration"]) is not rejected. I don't think this is necessarily an unintended bug, but rather the intended behavior.

I would consider this a bug.

richardkoehler commented 2 weeks ago

@mscheltienne Okay, if you would like for me to look into fixing this, maybe we can move the issue back to the mne-python repo? Or would you like me to open a new issue there?

mscheltienne commented 2 weeks ago

Done, a PR would be most welcome!

mne-tools / mne-python