Inconsistent number of epochs after cropping of mne.raw

RobinGuillard commented 4 years ago

You would expect these two snippets to give the same output:

1)

raw  = mne.io.read_raw_edf(filename, preload=False)  # prepare loading
raw  = CreateRaw(raw[picks_chan][0], picks_chan, ch_types=['emg'])        # pick channels and load
raw  = raw.load_data()  # load data into memory 

croptimes=dict(tmin=raw.times[0]+3600*2, tmax=raw.times[-1]-3600)
raw.crop(**croptimes)

sfreq = raw.info["sfreq"]
window_length = 0.25                    # in seconds
duration = int(window_length * sfreq)   # in samples
interval = duration                     # no overlapping
epochs = RawToEpochs_sliding(raw, duration=duration, interval=interval)
print(f"Epochs done, shape {epochs.shape}")

RESPONSE: Epochs done, shape (115700, 2, 50)

2)

raw  = mne.io.read_raw_edf(filename, preload=False)  # prepare loading

croptimes=dict(tmin=raw.times[0]+3600*2, tmax=raw.times[-1]-3600)
raw.crop(**croptimes)

raw  = CreateRaw(raw[picks_chan][0], picks_chan, ch_types=['emg'])        # pick channels and load
raw  = raw.load_data()  # load data into memory 

sfreq = raw.info["sfreq"]
window_length = 0.25                    # in seconds
duration = int(window_length * sfreq)   # in samples
interval = duration                     # no overlapping
epochs = RawToEpochs_sliding(raw, duration=duration, interval=interval)
print(f"Epochs done, shape {epochs.shape}")

RESPONSE: Epochs done, shape (104900, 2, 50)

Odd, isn't it? If you know why, don't hesitate to tell me!

lkorczowski commented 4 years ago

First,

the line raw = raw.load_data() seems useless, because raw = CreateRaw(raw[picks_chan][0], picks_chan, ch_types=['emg']) already loads the data.

The probable explanation lies in raw.times, which the crop depends on. When we use CreateRaw, I believe that raw.times is reset.

Therefore raw.times is not the same in the first and second example.

How could that explain such a big difference?

Option 1: the nominal sample rate does not exactly match the real timing of the samples.

the first example probably uses the time of the recording software (not the sampling rate) to generate the time stamps
the second example assumes that samples / sample_rate = elapsed time, which is obviously not true here

Conclusion:

the sampling rate (sfreq) is wrong; we should trust the time stamps (raw.times) instead. If we need sfreq, we could estimate it.
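A minimal sketch of how sfreq could be estimated from time stamps, assuming per-sample time stamps in seconds are available from the recording device. The function name estimate_sfreq and the synthetic data are hypothetical. Note that, as far as I know, raw.times in MNE is itself computed from the sample index and sfreq, so the time stamps would have to come from another source for this estimate to be informative.

```python
import numpy as np


def estimate_sfreq(timestamps):
    """Estimate the effective sampling rate (Hz) from per-sample time stamps (s)."""
    timestamps = np.asarray(timestamps, dtype=float)
    if timestamps.size < 2:
        raise ValueError("need at least two time stamps")
    # The median inter-sample interval is robust to a few long gaps (holes).
    return 1.0 / np.median(np.diff(timestamps))


# Synthetic example (not real data): a device that actually samples at 250 Hz.
timestamps = np.arange(1000) / 250.0
print(f"estimated sfreq: {estimate_sfreq(timestamps):.2f} Hz")
```

The median is used rather than the mean so that a few missing chunks do not bias the estimate.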

Option 2: there are missing recordings (holes).

the recording device didn't record the whole night. The missing values therefore mean a lot of missing time stamps and, in the end, missing epochs

RobinGuillard commented 4 years ago

I think it is most probably the second option: in fact, when I downloaded the data in EDF format, the download covered only the "period of interest" (sleep onset to awakening), whereas the whole recording was generally longer (we plugged in the device before dinner most of the time).

lkorczowski commented 4 years ago

I don't know; it should be investigated properly. To do so:

compute the inter-sample duration and plot its distribution.

Cases:

the distribution is a Dirac: perfectly stable sampling rate (there should not be any difference between the two examples)
the distribution is a Gaussian whose mean differs slightly from the nominal 1/sfreq with sfreq = 200 Hz (option 1 is true)
the distribution has some outliers (i.e. missing chunks with long delays) (option 2)

Actually both option 1 and option 2 could be right.
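The check described above could be sketched as follows. This is only an illustration on synthetic time stamps: the function summarize_intervals, its gap_factor threshold and the fake data are assumptions, and real per-sample time stamps from the device would be needed. It reports the drift of the median inter-sample duration relative to the nominal 1/sfreq (option 1) and counts abnormally long intervals (option 2).

```python
import numpy as np


def summarize_intervals(timestamps, nominal_sfreq=200.0, gap_factor=5.0):
    """Summarize inter-sample durations to tell clock drift apart from holes."""
    timestamps = np.asarray(timestamps, dtype=float)
    intervals = np.diff(timestamps)
    nominal = 1.0 / nominal_sfreq
    median = float(np.median(intervals))
    # An interval much longer than the typical one is treated as a hole.
    gaps = intervals > gap_factor * median
    return {
        "median_interval": median,
        "relative_drift": median / nominal - 1.0,  # 0 if sfreq is exact
        "n_gaps": int(gaps.sum()),
        "missing_seconds": float((intervals[gaps] - median).sum()),
    }


# Synthetic example: true rate 250 Hz, nominal 200 Hz, one 10 s hole.
t = np.arange(2000) / 250.0
t[1000:] += 10.0
stats = summarize_intervals(t)
print(stats)
```

A non-zero relative_drift with n_gaps equal to 0 would point to option 1, a non-zero n_gaps to option 2, and both can occur together. The intervals array could also be passed to a histogram plot to inspect the distribution visually.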

RobinGuillard commented 4 years ago

Maybe solved:

1) The sample rate is 250 Hz (or almost), not 200 Hz, which explains the initial issue presented here.

2) When aligning sleep stages, the epochs used for sleep staging in the CSV always start at "round" times (e.g. 22h50:00, then 22h50:30, and so on), whereas recordings start at arbitrary times (e.g. 22h50:14). This accounts for an offset between the start of the sleep stage annotations and the start of the recording.

What should we do? We should start the sleep stage annotation process at the offset "time of first epoch in CSV" - "time of first sample in recording" (in raw.info["meas_date"]). For the first partial epoch, we should use the same sleep stage as the first epoch of the CSV file (most probably "awake").
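As a sanity check, the two epoch counts reported at the top of this issue are consistent with the 250 Hz explanation. The sketch below assumes that raw.crop converts seconds to samples with the sfreq of the object it is called on (250 Hz for the EDF raw, 200 Hz after CreateRaw). The total sample count N_SAMPLES is hypothetical, back-computed from the first reported shape, and count_epochs is an illustrative helper, not a function of the project.

```python
def count_epochs(n_samples, sfreq_at_crop, cropped_seconds=3 * 3600, epoch_len=50):
    """Non-overlapping epochs left after cropping `cropped_seconds` of data.

    sfreq_at_crop is the rate used to convert the cropped seconds into samples.
    """
    removed = int(cropped_seconds * sfreq_at_crop)
    return (n_samples - removed) // epoch_len


# Hypothetical total number of samples, back-computed from the first example:
# 115700 epochs * 50 samples + 3 h * 200 Hz.
N_SAMPLES = 115700 * 50 + 3 * 3600 * 200

crop_after_createraw = count_epochs(N_SAMPLES, sfreq_at_crop=200)   # example 1
crop_before_createraw = count_epochs(N_SAMPLES, sfreq_at_crop=250)  # example 2
print(crop_after_createraw, crop_before_createraw)  # 115700 104900
```

Cropping 3 h at 250 Hz removes 540000 more samples than cropping 3 h at 200 Hz, i.e. 10800 epochs of 50 samples, which matches the difference between the two reported shapes.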

lkorczowski / Tinnitus-n-Sleep