Incorrect values produced when epoching a continuous discrete signal using MNE epoch function

balandongiv commented 4 years ago

The idea was to epoch the continuous EEG data of 386.936 s long into non overlapping epoch window, of size 20 s. With a sampling frequency 250 Hz, theoretically each epochs should contain 5000 data points per epoch.

To achieve the objective, the following code was utilised,

epochs = mne.Epochs(raw, events=events, event_id=event_id, baseline=None, verbose=True) MneApproach=epochs.to_data_frame()

To confirm whether the value return from the mne.Epoch was correct or not, I had created a script that can performed the epoching manually. The output from the script has been validated visually and was working as intended. However, I noticed there were different between the script output and the value from dataframe MneApproach. Apart from different values, the MneApproach contained only 176 data points per epoch; which in my opinion is incorrect as the number of data points should be 5000 data points

May I know what did I do wrong while inputting the mne.Epoch function.

The above problem can be reproduced from the following ipynb via Google Colab or from the code below

Really appreciate for any feedback and help.

import urllib.request

print('Beginning file download with urllib2...')

url = 'http://bnci-horizon-2020.eu/database/data-sets/001-2014/A01T.mat'
urllib.request.urlretrieve(url, '/content/A01T.mat')

mat = loadmat("/content/A01T.mat")
# eeg = mat["data"][0, 3]["X"][0, 0] * 10e-6
eeg = mat["data"][0, 3]["X"][0, 0]
ch_names = ["Fz", "FC3", "FC1", "FCz", "FC2", "FC4", "C5", "C3", "C1", "Cz",
            "C2", "C4", "C6", "CP3", "CP1", "CPz", "CP2", "CP4", "P1", "Pz",
            "P2", "POz", "EOG1", "EOG2", "EOG3"]
info = mne.create_info(ch_names, 250, ch_types=["eeg"] * 22 + ["eog"] * 3)
raw = mne.io.RawArray(eeg.T, info)

raw.set_montage("standard_1020")

Manual Approach

raw_b = raw.copy()

values = raw_b.get_data()
# values = numpy.zeros(20, dtype=dtype)
index = ['Row'+str(i) for i in range(1, len(values)+1)]

df = pd.DataFrame(values, index=index).transpose() 
df.columns = raw_b.ch_names[:]
# df.head()
event_id = 1  # This is used to identify the events.
duration = 20. # Unit in second

# create a fixed size events array
# start=0 and stop=None by default
events = mne.make_fixed_length_events(raw_b, event_id,duration=duration)
print(events)

NoEpochs=len(events)-1
Epoch = range(0, len(events)-1, 1)

NameEpochs = pd.DataFrame(NoEpochs*(np.ones((df.shape[0], 1))), columns=['Epochs'])

for f in Epoch:
 NameEpochs.Epochs.iloc[events[f][0]:events[f+1][0]] = Epoch[f]

df = pd.concat([df,  NameEpochs], axis = 1)
df.reset_index(drop=True, inplace=True)

df.set_index([df.Epochs, df.index], inplace=True)
del df['Epochs'] ## Delete extra column
df.loc[0.0] # Extrach only epoch 0

Snap 2019-10-27 at 12 35 50 Values for the first epoch using manual approach

MNE approach

epochs = mne.Epochs(raw, events=events, event_id=event_id,  baseline=None, verbose=True)
MneApproach=epochs.to_data_frame()
MneApproach
MneApproach.xs(0, level='epoch') # Extrach epoch 0

Snap 2019-10-27 at 12 36 10 Values for the first epoch using MNE approach

drammock commented 4 years ago

Please do not use the bug tracker for usage questions. Your question was already answered on the mailing list --- mne.Epochs has arguments tmin and tmax that you need to specify to get the epoch duration that you want. It is not enough just to space the events 20 seconds apart. You also need to specify how much time before and after each event that you want to include in the epoch.

balandongiv commented 4 years ago

Hi @drammock , Thanks for both the response here and in the mailing list. I just notice your explanation there.

drammock commented 4 years ago

(mnedev) [~/Desktop] $ ipython
In [1]: import os 
   ...: import urllib.request 
   ...: import mne 
   ...: from scipy.io import loadmat 
   ...:  
   ...: # download data 
   ...: url = 'http://bnci-horizon-2020.eu/database/data-sets/001-2014/A01T.mat' 
   ...: urllib.request.urlretrieve(url, 'A01T.mat') 
   ...:                                                                                                                                   
Out[1]: ('A01T.mat', <http.client.HTTPMessage at 0x7f46a532f358>)
In [2]: # make Raw 
   ...: mat = loadmat("A01T.mat") 
   ...: eeg = mat["data"][0, 3]["X"][0, 0] 
   ...: ch_names = ["Fz", "FC3", "FC1", "FCz", "FC2", "FC4", "C5", "C3", "C1", "Cz", 
   ...:             "C2", "C4", "C6", "CP3", "CP1", "CPz", "CP2", "CP4", "P1", "Pz", 
   ...:             "P2", "POz", "EOG1", "EOG2", "EOG3"] 
   ...: info = mne.create_info(ch_names, 250, ch_types=["eeg"] * 22 + ["eog"] * 3) 
   ...: raw = mne.io.RawArray(eeg.T, info) 
   ...: raw.set_montage("standard_1020") 
   ...:                                                                                                                                   
Creating RawArray with float64 data, n_channels=25, n_times=96735
    Range : 0 ... 96734 =      0.000 ...   386.936 secs
Ready.
DigMontage is a superset of info. 72 in DigMontage will be ignored. The ignored channels are: {'F5', 'TP8', 'PO1', 'A2', 'T7', 'FC5', 'PO2', 'P4', 'F6', 'T8', 'M1', 'T6', 'Fp1', 'F2', 'PO3', 'PO6', 'O1', 'Oz', 'Iz', 'AF10', 'P9', 'P5', 'AF3', 'T3', 'AF6', 'AF7', 'TP10', 'T4', 'PO7', 'F10', 'FT8', 'Fp2', 'A1', 'AFz', 'F7', 'T9', 'O2', 'F3', 'F8', 'P10', 'F1', 'T10', 'CP6', 'T5', 'AF2', 'TP9', 'CP5', 'AF9', 'AF4', 'F4', 'P6', 'PO9', 'FT7', 'O9', 'M2', 'P8', 'PO4', 'FC6', 'Fpz', 'P3', 'AF1', 'PO10', 'FT9', 'AF5', 'O10', 'TP7', 'FT10', 'PO8', 'AF8', 'P7', 'F9', 'PO5'}
Out[2]: <RawArray  |  None, n_channels x n_times : 25 x 96735 (386.9 sec), ~18.5 MB, data loaded>
In [3]: # make events 
   ...: event_id = 1 
   ...: duration = 20. 
   ...: events = mne.make_fixed_length_events(raw, event_id, duration=duration) 
   ...:                                                                                                                                   
In [4]: # make Epochs 
   ...: epochs = mne.Epochs(raw, events=events, event_id=event_id,  baseline=None, 
   ...:                     verbose=True, tmin=0, tmax=20)  # <- INCLUDE TMIN, TMAX HERE
   ...: print(epochs.get_data().shape) 
   ...:                                                                                                                                   
19 matching events found
No baseline correction applied
Not setting metadata
0 projection items activated
Loading data for 19 events and 5001 original time points ...
0 bad epochs dropped
(19, 25, 5001)

As you can see this yields 5001 time points. To trim off the extra 1 sample, you can either reduce tmax by about one sampling period, or you could try using epochs.crop() with the same tmin and tmax, and use it's argument include_tmax=False.

balandongiv commented 4 years ago

Hi @drammock , Once again thanks for the detail explanation. It really help my understanding. Maybe MNE team can consider to include the following example code in the mne documentation to ease other potential user in understanding better the mne.epoch function.

On separate question, may I know how to retrieve specific epoch. Say I only interested to call the third epoch, or two epochs (e.g., epoch no seven and eleven). I am unable to find related documentation for this particular problem. Currently, I have to do as below to retrieve, say, epoch 16.

MneApproach=epochs.to_data_frame()
print(MneApproach.xs(16, level='epoch').iloc[:,0])

Thanks in advance

drammock commented 4 years ago

Again, please do not use the bug tracker for questions about usage. Use the mailing list or Gitter channel (both are linked here: https://mne.tools/dev/overview/get_help.html).

Discussion of how to select specific epochs is here: https://mne.tools/dev/auto_tutorials/epochs/plot_object_epochs.html

balandongiv commented 4 years ago

Thanks for the reply, really appreciate it @drammock

mne-tools / mne-python

Incorrect values produced when epoching a continuous discrete signal using MNE epoch function #6987