Annotation and crop - different onset points after reloading the saved cropped data

kolcs commented 3 years ago

Describe the bug

I suspect that this issue is similar to #9383. It only occurs when a raw instance is created with Annotations; data originally labeled as BrainVision files work fine. After cropping the instance, saving it and reloading it, the annotations are shown at their original positions from before the cropping.

Steps to reproduce

import numpy as np
import mne
CH_NUM = 32
FS = 160

data = np.zeros((CH_NUM, FS * 20))
ch_names = [f'EEG{i}' for i in range(CH_NUM)]
ch_types = ['eeg'] * len(ch_names)
onset = [8, 12, 18]
duration = [2] * len(onset)
description = list(range(len(onset)))

info = mne.create_info(ch_names, ch_types=ch_types, sfreq=FS)
raw = mne.io.RawArray(data, info)
annotation = mne.Annotations(onset, duration, description)
raw = raw.set_annotations(annotation)

sess = raw.copy()
sess.crop(5, 18)
file = 'test_file_raw.fif'
sess.save(file, overwrite=True)
sess.plot(block=False)
r = mne.io.read_raw(str(file))
r.plot(block=False)
raw.plot(block=True)

Expected results

I expect the annotated data to look the same before and after saving.

Actual results

The annotations are shifted...

Additional information

Platform: Windows-10-10.0.19041-SP0
Python: 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
Executable: C:\Programs\Miniconda3\envs\bci2\python.exe
CPU: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel: 12 cores
Memory: Unavailable (requires "psutil" package)
mne: 0.23.0
numpy: 1.19.3 {blas=D:\a\1\s\numpy\build\openblas_info, lapack=D:\a\1\s\numpy\build\openblas_lapack_info}
scipy: 1.6.0
matplotlib: 3.3.3 {backend=TkAgg}
sklearn: 0.24.0
numba: Not found
nibabel: Not found
nilearn: Not found
dipy: Not found
cupy: Not found
pandas: 1.2.0
mayavi: Not found
pyvista: Not found
vtk: Not found

kolcs commented 3 years ago

bug

agramfort commented 3 years ago

I confirm the bug. You can work around it by setting a meas_date on the raw, e.g. with:

import numpy as np
import mne
from datetime import datetime, timezone

CH_NUM = 32
FS = 160

data = np.zeros((CH_NUM, FS * 20))
ch_names = [f'EEG{i}' for i in range(CH_NUM)]
ch_types = ['eeg'] * len(ch_names)
onset = [8, 12, 18]
duration = [2] * len(onset)
description = list(range(len(onset)))

info = mne.create_info(ch_names, ch_types=ch_types, sfreq=FS)
raw = mne.io.RawArray(data, info)
annotation = mne.Annotations(onset, duration, description)
raw = raw.set_annotations(annotation)

raw.set_meas_date(datetime.now(tz=timezone.utc))

sess = raw.copy()
sess.crop(5, 18)
file = 'test_file_raw.fif'
sess.save(file, overwrite=True)
sess.plot(block=False)
r = mne.io.read_raw(str(file))
r.plot(block=False)
raw.plot(block=False)

agramfort commented 3 years ago

I think this bug has been reported in the past, and I suggested a temporary fix, which is to set a meas_date so that it's not None.
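A minimal sketch of that workaround as a reusable step, assuming it is called before cropping as in the script above. The name ensure_meas_date is hypothetical and not part of MNE; it only relies on raw.info['meas_date'] and raw.set_meas_date(), which the script above already uses.

```python
from datetime import datetime, timezone


def ensure_meas_date(raw, when=None):
    """Give `raw` a measurement date if it has none (hypothetical helper)."""
    if raw.info["meas_date"] is None:
        # Use a timezone-aware UTC datetime, as in the workaround above.
        stamp = when if when is not None else datetime.now(tz=timezone.utc)
        raw.set_meas_date(stamp)
    return raw
```

An existing measurement date is left untouched, so the helper should be safe to call on recordings that already carry one.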

adam2392 commented 3 years ago

Hi just bumping this issue because it came up today for us @pmyers16

Do we perceive this to be an easy issue to solve?

Why can't we just modify the meas_date in crop() if it is set?

mscheltienne commented 3 years ago

@adam2392 In the post I deleted, to which @agramfort reacted, I was trying to understand how to solve this issue correctly, as I also ran into it. In crop, there is basically an if/else statement that checks whether a meas_date is set. If one is set, the branch taken is correct. If not, it enters the other branch, which creates the issue. I have no idea yet how to solve this properly, as I don't know enough about the intention behind the first_samp attribute (I hope I remember the name of this attribute correctly), which stores the time of the first sample? I don't get why it might be different from 0 and I don't get to what the annotations onsets are related to (i.e. what is the origin).

drammock commented 3 years ago

I don't get why first_samp might be different from 0

This is a "feature" of Neuromag systems, where there is a distinction between "starting the acquisition system" and "starting the recording".

and I don't get to what the annotations onsets are related to (i.e. what is the origin).

The origin is the beginning of the raw object instance, regardless of what its first_samp value is.
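For the script in this issue, a rough sketch of the numbers involved, assuming the RawArray starts with a first_samp of 0 and that crop(5, 18) moves first_samp to the sample index at 5 s. This is plain arithmetic and does not call MNE.

```python
FS = 160   # sampling frequency from the script above, in Hz
TMIN = 5   # crop start, in seconds

# Sample index of the crop start, which is what first_samp should become.
first_samp = int(round(TMIN * FS))

# The same offset expressed in seconds (first_samp / sfreq).
first_time = first_samp / FS

print(first_samp, first_time)
```

So a cropped instance can have a non-zero first_samp even when the data never came from a Neuromag system.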

mscheltienne commented 3 years ago

the origin is the beginning of the raw object instance, regardless of what its first_samp value is.

So would that line be the culprit? It is in the annotations setter, line 693 in mne.io.base.py:

new_annotations.onset += self._first_time

agramfort commented 3 years ago

Give a PR a try and see if this breaks any tests. If not, and it makes the new test case pass, you win!

mscheltienne commented 3 years ago

@agramfort If I'm not opening a PR yet, it's because I know this is not the only culprit, and I can't really figure out what else is messing up the annotations. Let's take your example, with slightly renamed variables.

from datetime import datetime, timezone

import numpy as np
import mne

CH_NUM = 32
FS = 160

data = np.zeros((CH_NUM, FS * 20))
ch_names = [f'EEG{i+1}' for i in range(CH_NUM)]
ch_types = ['eeg'] * len(ch_names)
onset = [8, 12, 18]
duration = [2] * len(onset)
description = list(range(len(onset)))

info = mne.create_info(ch_names, ch_types=ch_types, sfreq=FS)
raw = mne.io.RawArray(data, info)
annotation = mne.Annotations(onset, duration, description)
raw = raw.set_annotations(annotation)

# raw.set_meas_date(datetime.now(tz=timezone.utc))

raw_copied = raw.copy()
raw_copied.crop(5, 18)
file = 'test_file_raw.fif'
raw_copied.save(file, overwrite=True)
raw_loaded = mne.io.read_raw(str(file))

raw.plot(block=False)
raw_copied.plot(block=False)
raw_loaded.plot(block=False)

If the measurement date is set (i.e. with the set_meas_date line uncommented), with the current code we get:

raw.annotations.onset
Out[2]: array([ 8., 12., 18.])

raw_copied.annotations.onset
Out[3]: array([ 8., 12., 18.])

raw_loaded.annotations.onset
Out[4]: array([ 8., 12., 18.])

@drammock Already, this does not make sense to me. If the annotations should be relative to the beginning of the raw object instance, then the onsets for raw_copied and raw_loaded should be shifted by -5 seconds.

Without the measurement date, with the current code we get:

raw.annotations.onset
Out[6]: array([ 8., 12., 18.])

raw_copied.annotations.onset
Out[7]: array([ 8., 12., 18.])

raw_loaded.annotations.onset
Out[8]: array([13., 17.])

The plot for raw_loaded is wrong as well. That tells me that either raw.save() or mne.io.read_raw runs the annotations setter, adding self._first_time (5 seconds) a second time.
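The reported numbers are consistent with that reading. A small sketch in plain NumPy, under two assumptions that are mine and not confirmed from the MNE source: the offset is added exactly once more on reload, and any annotation whose onset then falls past the end of the cropped data (18 s in the original time base) is dropped.

```python
import numpy as np

FIRST_TIME = 5.0   # crop start, in seconds
DATA_END = 18.0    # crop end in the original time base, in seconds

# Onsets as reported for raw_copied before saving.
saved_onset = np.array([8.0, 12.0, 18.0])

# Assumed extra shift: first_time added a second time on the roundtrip.
shifted = saved_onset + FIRST_TIME

# Assumed filtering: onsets beyond the end of the data are discarded.
in_range = shifted[shifted <= DATA_END]

print(shifted, in_range)
```

Under these assumptions the two surviving onsets match the values reported for raw_loaded.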

And finally, if we comment out line 693 in mne.io.base.py and thus don't add self._first_time:

raw.annotations.onset
Out[10]: array([ 8., 12., 18.])

raw_copied.annotations.onset
Out[11]: array([ 3.,  7., 13.])

raw_loaded.annotations.onset
Out[12]: array([ 3.,  7., 13.])

This is actually making sense to me. Both raw_copied and raw_loaded have the same onset, relative to the beginning of the instance. The durations are also similar and make sense:

raw.annotations.duration
Out[14]: array([2., 2., 2.])

raw_copied.annotations.duration
Out[15]: array([2.     , 2.     , 0.00625])

raw_loaded.annotations.duration
Out[16]: array([2.        , 2.        , 0.00625038])
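These values can be reproduced by hand. A sketch in plain NumPy, assuming the onsets are simply shifted by the crop start and that each annotation is clipped to a window that includes the last sample (so the window ends one sample period after 18 s). This models the arithmetic only and is not MNE's implementation.

```python
import numpy as np

FS = 160                  # sampling frequency, in Hz
TMIN, TMAX = 5.0, 18.0    # crop limits, in seconds

onset = np.array([8.0, 12.0, 18.0])
duration = np.array([2.0, 2.0, 2.0])

# Assumed end of the cropped data: tmax plus one sample period.
data_end = TMAX + 1.0 / FS

# Onsets relative to the start of the cropped instance.
rel_onset = onset - TMIN

# Durations after clipping each annotation at the end of the data.
clipped = np.minimum(onset + duration, data_end) - onset

print(rel_onset, clipped)
```

The last duration comes out as one sample period, 1 / 160 = 0.00625 s, which matches the output for raw_copied.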

However, this is the plot obtained (for both raw_copied and raw_loaded):

[Image: Screenshot 2021-09-02 at 10 42 21]

I do believe line 693 in mne.io.base.py is a mistake and should be removed, but something else is also messing up the annotations. Probably something in the plotting code is using first_time, which would explain why the plot shows only the last 2 events, of duration 2 and 0.00625038?

mscheltienne commented 3 years ago

A few words following a Discord discussion. The assumption is that:

Annotation onsets should be relative to the beginning of the data as reflected by first_samp. This differs from @drammock's comment: "the origin is the beginning of the raw object instance, regardless of what its first_samp value is."
With this assumption, the problem would be limited to the I/O roundtrip.

I'll have a look when I have time.

larsoner commented 3 years ago

@mscheltienne I'm inclined to bump the milestone on this from 0.24 to 1.0 so we don't have to rush a fix in the next week that will potentially break other stuff.
