holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License

Corrupted file and clipped signals with signals of different lengths #58

Closed eniemela3 closed 1 year ago

eniemela3 commented 5 years ago

If signals of different lengths are written to an EDF file, all signals are read back as if they had been truncated at the end to match the length of the shortest signal. The file also appears to be corrupted when opened in another EDF reader (EDFbrowser). The error message from EDFbrowser reads:

Error, filesize does not match with the calculated filesize based on the parameters in the header. Filesize is 2124 and filesize according to header is 2120. You can fix this problem with the header editor, check the manual for details. File is not a valid EDF or BDF file.

This suggests that the missing data might actually be written to the file, but pyedflib does not read it back. The scenario can be reproduced with the following code:

import pyedflib
import numpy as np

f = pyedflib.EdfWriter("test_different_lengths.edf", 3)
labels = ["1", "2", "3"]
derived_signals = [np.array(list(range(10))),
                   np.array(list(range(7))),
                   np.array(list(range(10)))]
headers = []
for label in labels:
    header = {'label': label, 'dimension': '', 'sample_rate': 1,
              'physical_max': 10, 'physical_min': 0,
              'digital_max': np.iinfo(np.int16).max, 'digital_min': np.iinfo(np.int16).min,
              'transducer': '', 'prefilter': ''}
    headers.append(header)
f.setSignalHeaders(headers)
f.writeSamples(derived_signals)
f.close()

edf_file = pyedflib.EdfReader("test_different_lengths.edf")
edf_samples = edf_file.getNSamples()
print(edf_samples) # [7 7 7]
# All of the following give approx. [0 1 2 3 4 5 6]
print(edf_file.readSignal(0)) # should be 10 long
print(edf_file.readSignal(1)) # should be 7 long - ok
print(edf_file.readSignal(2)) # should be 10 long

BlakeJC94 commented 3 years ago

Thought I was having a similar issue, but I think this is due to the sample rates you set in the channel header.

I updated your code to show how signals of different lengths should (presumably) be handled:

import pyedflib
import numpy as np

f = pyedflib.EdfWriter("test_different_lengths.edf", 3)
labels = ["1", "2", "3"]
srates = [5, 10, 5]  # Hz
duration = 2  # Seconds
derived_signals = [np.arange(srates[0]*duration),
                   np.arange(srates[1]*duration),
                   np.arange(srates[2]*duration)]
headers = []
for label, srate in zip(labels, srates):
    header = {'label': label, 'dimension': '', 'sample_rate': srate,
              'physical_max': 20, 'physical_min': 0,
              'digital_max': np.iinfo(np.int16).max, 'digital_min': np.iinfo(np.int16).min,
              'transducer': '', 'prefilter': ''}
    headers.append(header)
f.setSignalHeaders(headers)
f.writeSamples(derived_signals)
f.close()

edf_file = pyedflib.EdfReader("test_different_lengths.edf")
edf_samples = edf_file.getNSamples()
print(edf_samples) # [10 20 10]
# The signals read back as approx. [0 1 ... 8 9] or [0 1 ... 18 19]; the lengths are printed below
print(len(edf_file.readSignal(0))) # should be 10 long - ok
print(len(edf_file.readSignal(1))) # should be 20 long - ok
print(len(edf_file.readSignal(2))) # should be 10 long - ok

This also fixes the error message that comes up in EDFbrowser.
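
The difference between the two snippets can be checked before anything is written: each signal's sample count divided by its sample rate should give the same duration. The following is a minimal sketch of such a check. `common_duration` is a hypothetical helper written for this illustration and is not part of pyedflib.

```python
from fractions import Fraction


def common_duration(lengths, sample_rates):
    """Return the duration in seconds shared by all signals.

    Hypothetical helper (not part of pyedflib). Raises ValueError when
    the signals do not all cover the same length of time.
    """
    # Exact arithmetic avoids spurious mismatches from float rounding.
    durations = [Fraction(n) / Fraction(rate)
                 for n, rate in zip(lengths, sample_rates)]
    if len(set(durations)) != 1:
        raise ValueError(
            f"signals cover different durations: {[float(d) for d in durations]}"
        )
    return float(durations[0])


# Values from the updated snippet: 10, 20 and 10 samples at 5, 10 and 5 Hz.
print(common_duration([10, 20, 10], [5, 10, 5]))  # 2.0

# Values from the original report: 10, 7 and 10 samples, all at 1 Hz.
try:
    common_duration([10, 7, 10], [1, 1, 1])
except ValueError as err:
    print(err)
```

With the values from the original report, the check fails because the second signal covers 7 seconds while the other two cover 10.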

skjerns commented 3 years ago

Thanks, @BlakeJC94.

To my knowledge, all signals within an EDF+ file need to have the same length in the time dimension. This means that you need to adapt the sample rate accordingly if you have signals with "different lengths". EDF+ was made for biosignals, for which it only makes sense to record signals simultaneously, i.e. all signals have the same length in time.
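
Adapting the sample rate comes down to dividing each signal's sample count by the common recording duration. Below is a minimal sketch of that arithmetic. `rates_for_duration` is a hypothetical helper, not a pyedflib function, and whether a non-integer rate can be written to a file is not covered in this thread, so treat fractional results as something to verify against your pyedflib version.

```python
from fractions import Fraction


def rates_for_duration(lengths, duration_s):
    """Sample rate in Hz that each signal needs to span duration_s seconds.

    Hypothetical helper (not part of pyedflib). Rates are returned as
    exact fractions so that non-integer results are easy to spot.
    """
    return [Fraction(n) / Fraction(duration_s) for n in lengths]


# Signals of 10, 20 and 10 samples recorded over 2 seconds.
print([str(r) for r in rates_for_duration([10, 20, 10], 2)])
# ['5', '10', '5']

# The lengths from the original report, assuming a 10 second recording:
# the 7-sample signal would need a fractional rate of 7/10 Hz.
print([str(r) for r in rates_for_duration([10, 7, 10], 10)])
# ['1', '7/10', '1']
```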