mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

Error reading big BDF file #5685

Closed mkoculak closed 5 years ago

mkoculak commented 5 years ago

Describe the bug

I am trying to read a relatively big .bdf file (~3 GB, 80 channels, 2048 Hz, 6215 s of recording) and get the error below.
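For scale, a quick back-of-the-envelope check (assuming 3 bytes per 24-bit BDF sample and using the figures above) shows that the raw data alone exceeds what a 32-bit byte offset can address:

channels, sfreq, seconds, bytes_per_sample = 80, 2048, 6215, 3

total_bytes = channels * sfreq * seconds * bytes_per_sample
print(total_bytes)              # 3054796800 bytes, roughly 2.8 GiB (~3 GB)
print(total_bytes > 2**31 - 1)  # True: byte offsets into the file no longer fit in an int32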

I successfully read the data with pyedflib:

import numpy as np
import pyedflib

f = pyedflib.EdfReader('0730.bdf')

# preallocate (n_channels, n_samples) and read each signal
array = np.zeros((80, 2048 * 6215))
for i in range(f.signals_in_file):
    array[i, :] = f.readSignal(i)

I have also checked readability with the BDFReader shipped with ActiView by BioSemi, and it reads the data with no problems.

I tried it with a couple of different files of similar size, and the result is the same. All files were recorded with ActiView and not modified afterwards.

Steps and/or code to reproduce

As mentioned, the file is big, but I can share a link to a OneDrive copy if necessary.

import mne

data = mne.io.read_raw_edf('0730.bdf', preload=True)
Extracting EDF parameters from D:/Marcin/OneDrive - Uniwersytet Jagielloński/Datasets/Preludium 12/EEG sen/0730.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 12728319  =      0.000 ...  6215.000 secs...

C:\Users\marcin\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\edf\edf.py:229: RuntimeWarning: overflow encountered in int_scalars
  block_offset = ai * ch_offsets[-1] * dtype_byte

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-7-91ba1d585082> in <module>()
----> 1 data = mne.io.read_raw_edf(path+'EEG sen/0730.bdf', preload=True, stim_channel=None)

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\edf\edf.py in read_raw_edf(input_fname, montage, eog, misc, stim_channel, annot, annotmap, exclude, preload, verbose)
   1237     return RawEDF(input_fname=input_fname, montage=montage, eog=eog, misc=misc,
   1238                   stim_channel=stim_channel, annot=annot, annotmap=annotmap,
-> 1239                   exclude=exclude, preload=preload, verbose=verbose)

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\edf\edf.py in __init__(self, input_fname, montage, eog, misc, stim_channel, annot, annotmap, exclude, preload, verbose)

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\utils.py in verbose(function, *args, **kwargs)
    727         with use_log_level(verbose_level):
    728             return function(*args, **kwargs)
--> 729     return function(*args, **kwargs)
    730 
    731 

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\edf\edf.py in __init__(self, input_fname, montage, eog, misc, stim_channel, annot, annotmap, exclude, preload, verbose)
    168         super(RawEDF, self).__init__(
    169             info, preload, filenames=[input_fname], raw_extras=[edf_info],
--> 170             last_samps=last_samps, orig_format='int', verbose=verbose)
    171 
    172     @verbose

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\base.py in __init__(self, info, preload, first_samps, last_samps, filenames, raw_extras, orig_format, dtype, verbose)

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\utils.py in verbose(function, *args, **kwargs)
    727         with use_log_level(verbose_level):
    728             return function(*args, **kwargs)
--> 729     return function(*args, **kwargs)
    730 
    731 

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\base.py in __init__(self, info, preload, first_samps, last_samps, filenames, raw_extras, orig_format, dtype, verbose)
    366         self._update_times()
    367         if load_from_disk:
--> 368             self._preload_data(preload)
    369 
    370     @verbose

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\base.py in _preload_data(self, preload, verbose)

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\utils.py in verbose(function, *args, **kwargs)
    727         with use_log_level(verbose_level):
    728             return function(*args, **kwargs)
--> 729     return function(*args, **kwargs)
    730 
    731 

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\base.py in _preload_data(self, preload, verbose)
    621         logger.info('Reading %d ... %d  =  %9.3f ... %9.3f secs...' %
    622                     (0, len(self.times) - 1, 0., self.times[-1]))
--> 623         self._data = self._read_segment(data_buffer=data_buffer)
    624         assert len(self._data) == self.info['nchan']
    625         self.preload = True

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\base.py in _read_segment(self, start, stop, sel, data_buffer, projector, verbose)
    518             self._read_segment_file(data[:, this_sl], idx, fi,
    519                                     int(start_file), int(stop_file),
--> 520                                     cals, mult)
    521             offset += n_read
    522         return data

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\edf\edf.py in _read_segment_file(self, data, idx, fi, start, stop, cals, mult)

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\utils.py in verbose(function, *args, **kwargs)
    727         with use_log_level(verbose_level):
    728             return function(*args, **kwargs)
--> 729     return function(*args, **kwargs)
    730 
    731 

~\AppData\Local\conda\conda\envs\mne\lib\site-packages\mne\io\edf\edf.py in _read_segment_file(self, data, idx, fi, start, stop, cals, mult)
    229                 block_offset = ai * ch_offsets[-1] * dtype_byte
    230                 n_read = min(len(r_lims) - ai, n_per)
--> 231                 fid.seek(start_offset + block_offset, 0)
    232                 # Read and reshape to (n_chunks_read, ch0_ch1_ch2_ch3...)
    233                 many_chunk = _read_ch(fid, subtype, ch_offsets[-1] * n_read,

OSError: [Errno 22] Invalid argument

Additional information

mne.sys_info()
Platform:      Windows-10-10.0.17763-SP0
Python:        3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)]
Executable:    C:\Users\marcin\AppData\Local\conda\conda\envs\mne\python.exe
CPU:           Intel64 Family 6 Model 158 Stepping 10, GenuineIntel: 12 cores
Memory:        31.9 GB

mne:           0.16.2
numpy:         1.15.1 {blas=mkl_rt, lapack=mkl_rt}
scipy:         1.1.0
matplotlib:    2.2.2 {backend=Qt5Agg}

sklearn:       0.19.2
nibabel:       2.2.1
mayavi:        Not found
pycuda:        Not found
skcuda:        Not found
pandas:        0.23.4

I looked around for existing issues with big files, but did not find anything on the matter. If this was already addressed somewhere, I would be grateful for a pointer in the right direction.

agramfort commented 5 years ago

Are you using Windows? It can be an integer overflow, since ints are int32 on Windows.
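For illustration, here is a minimal sketch of that wrap-around (the numbers are hypothetical, chosen to roughly match this file; in MNE the operands end up 32-bit because NumPy's default integer is int32 on Windows):

import numpy as np

ch_offset = np.int32(80 * 2048)   # samples per data record across all channels
ai = np.int32(6000)               # record (block) index deep into the recording
dtype_byte = np.int32(3)          # 24-bit BDF samples

block_offset = ai * ch_offset * dtype_byte          # stays int32 and wraps (RuntimeWarning: overflow encountered)
print(block_offset)                                 # a negative offset, which fid.seek() then rejects
print(int(ai) * int(ch_offset) * int(dtype_byte))   # plain Python ints: 2949120000, the intended offset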

Yes, please share the file so we can try to replicate the problem.

thanks

mkoculak commented 5 years ago

Yes, I am using Windows 10. Is there any way to overcome the overflow issue on Windows?

The link to the file (if it does not work, I can try uploading it somewhere else): https://ujchmura-my.sharepoint.com/:u:/g/personal/marcin_koculak_doctoral_uj_edu_pl/ESeDLKjMJEFLtV3t5VO1jecB0nOBVQ-X0Cvwrq7V3ihVUw?e=gyUfWg

agramfort commented 5 years ago

I confirm that I have no problem loading your file on macOS with the current master branch.

I don't have a big enough Windows machine to debug this :(

I would put a breakpoint in the code before the seek crash here: https://github.com/mne-tools/mne-python/blob/master/mne/io/edf/edf.py#L264 and see if you actually end up with negative integers, then track down the origin of the problem.
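A hedged sketch of one way to get to that spot without editing the installed package: let the read crash and open a post-mortem debugger at the failing fid.seek, where the offset variables can be printed (negative values would confirm the overflow).

import pdb
import sys

import mne

try:
    mne.io.read_raw_edf('0730.bdf', preload=True)
except OSError:
    # drops into _read_segment_file at the failing seek; try "p block_offset"
    pdb.post_mortem(sys.exc_info()[2])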

HTH

mkoculak commented 5 years ago

So it seems that the problem is with the variable ch_offsets created a couple of lines earlier (https://github.com/mne-tools/mne-python/blob/master/mne/io/edf/edf.py#L250):

ch_offsets = np.cumsum(np.concatenate([[0], n_samps]))

Because it is created as int32, it causes the overflow of the block_offset variable.

Adding dtype=np.int64 to the NumPy cumsum call made the code run without problems:

ch_offsets = np.cumsum(np.concatenate([[0], n_samps]), dtype=np.int64)
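For illustration, a minimal sketch of the difference (the n_samps values are made up; on Windows, NumPy's default integer type is 32-bit, so without an explicit dtype the offsets can stay int32):

import numpy as np

n_samps = np.full(80, 2048, dtype=np.int32)   # samples per record for each channel

ch_offsets_32 = np.cumsum(np.concatenate([[0], n_samps]))                   # int32 on Windows
ch_offsets_64 = np.cumsum(np.concatenate([[0], n_samps]), dtype=np.int64)   # int64 everywhere

print(ch_offsets_32.dtype, ch_offsets_64.dtype)
print(np.int32(6000) * ch_offsets_64[-1] * 3)   # promoted to int64: 2949120000, no wrap-around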

agramfort commented 5 years ago

Great, can you send us a PR? Thx