holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License

DatarecordDuration is only a multiple of 1s #189

Closed guigautier closed 9 months ago

guigautier commented 1 year ago

Hi there,

I'm using PyEDFlib to read and write some EDF files, and I'm a bit confused about the DataRecordDuration. Specifically, when I call the writeSamples() method, the EDF file is by default written with a data record (block) duration of 1 second.

Could someone please clarify what the duration argument of the EDF header actually represents? Is there any way to use pyedflib to write EDF files with durations that are not a multiple of a second? I tried looking through the documentation and source code but didn't find anything that could help me.

How can I force the number of samples (NSamples)?

For example:


  import tempfile

  import numpy as np
  import pyedflib
  from pyedflib import highlevel

  sample = np.random.random_sample((1, 1000)) * 100
  with tempfile.TemporaryDirectory() as tmp_dir:
      out_499 = f"{tmp_dir}/out_499.edf"
      pyedflib.highlevel.write_edf(edf_file=out_499, signals=sample, header=highlevel.make_header(),
                                   signal_headers=highlevel.make_signal_headers(list_of_labels=["channel"],
                                                                                sample_frequency=499))
      edf = pyedflib.EdfReader(str(out_499))

      assert edf.getSampleFrequency(0) == 499
      assert edf.getFileDuration() == 3
      assert edf.getNSamples()[0] == 1497  # 499 * 3; why not 998 (2 * 499)?

      # Is the signal padded with zeros up to 1497 samples?

      out_500 = f"{tmp_dir}/out_500.edf"
      pyedflib.highlevel.write_edf(edf_file=out_500, signals=sample, header=highlevel.make_header(),
                                   signal_headers=highlevel.make_signal_headers(list_of_labels=["channel"],
                                                                                sample_frequency=500))

      edf = pyedflib.EdfReader(str(out_500))
      assert edf.getSampleFrequency(0) == 500
      assert edf.getFileDuration() == 2
      assert edf.getNSamples()[0] == 1000 # 500 * 2

Thanks in advance for your help!

skjerns commented 1 year ago

When working with time domain data one often thinks of data in terms of a sample frequency, i.e. how many samples are saved per second (smp/sec). However, in the EDF format, not only the numerator (smp) but also the denominator (sec) can be set, and together they determine how many samples are saved per block (also called a 'record'). So if your signal has a sampling frequency of 50 Hz you could save 50 samples per block and set the DataRecordDuration to 1 second. Alternatively, you could save 100 samples per block and set the DataRecordDuration to 2 seconds, as 100/2 is also 50 Hz. In the end this just determines how much data is saved per block within the file. It also allows for sample frequencies below 1 Hz, e.g. by setting the DataRecordDuration to 2 seconds while saving one sample per block (0.5 Hz).
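To make the ratio concrete, here is a small sketch in plain Python (no pyedflib involved). The function name `sample_frequency` is only illustrative; it computes the frequency implied by a given number of samples per record and a record duration.

```python
from fractions import Fraction


def sample_frequency(samples_per_record, record_duration_s):
    """Effective sampling frequency in Hz implied by one data record."""
    return Fraction(samples_per_record) / Fraction(record_duration_s)


# 50 samples in a 1 s record and 100 samples in a 2 s record are both 50 Hz.
f_a = sample_frequency(50, 1)
f_b = sample_frequency(100, 2)

# One sample in a 2 s record gives 0.5 Hz, i.e. a rate below 1 Hz.
f_c = sample_frequency(1, 2)

print(f_a, f_b, f_c)  # 50 50 1/2
```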

You can manually set the DataRecordDuration by calling f.setDatarecordDuration(100000), where f is an instance of EdfWriter. Note that the value passed to the function needs to be in units of 10 microseconds, i.e. 1 second = 100000.
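As a rough illustration of that unit convention, the conversion from seconds can be sketched as below. The helper `seconds_to_record_units` is hypothetical and not part of pyedflib, and the 10-microsecond unit is taken from the comment above; other releases may expect a different unit, so check the documentation of the version you have installed.

```python
# Assumption from the comment above: 1 unit = 10 microseconds.
UNITS_PER_SECOND = 100_000


def seconds_to_record_units(seconds):
    """Convert a record duration in seconds to units of 10 microseconds."""
    units = round(seconds * UNITS_PER_SECOND)
    if units <= 0:
        raise ValueError("record duration must be positive")
    return units


print(seconds_to_record_units(1))    # 100000
print(seconds_to_record_units(0.5))  # 50000
print(seconds_to_record_units(2))    # 200000
```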

Please note that DataRecordDuration and FileDuration are two different things: the former denotes how long a single block of data is in the time dimension, while the latter denotes how long the entire file is.
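The numbers in the question can be reproduced with a bit of arithmetic. The sketch below assumes that the file only ever contains whole records and that the last record is padded up to full length; the function name `file_layout` is illustrative and not part of pyedflib.

```python
def file_layout(n_samples, sample_frequency, record_duration_s=1):
    """Return (file_duration_s, stored_samples) when only whole records are written.

    Assumes each record holds sample_frequency * record_duration_s samples
    and that a partially filled last record is padded to full length.
    """
    samples_per_record = sample_frequency * record_duration_s
    # Ceiling division: a partially filled record still takes a whole record.
    n_records = -(-n_samples // samples_per_record)
    return n_records * record_duration_s, n_records * samples_per_record


# 1000 samples at 499 Hz need 3 one-second records: 3 s and 1497 samples.
print(file_layout(1000, 499))  # (3, 1497)

# 1000 samples at 500 Hz fit exactly into 2 records: 2 s and 1000 samples.
print(file_layout(1000, 500))  # (2, 1000)
```

Under this assumption, 1000 samples at 499 Hz cannot fit into two records (998 samples), so a third record is needed, which matches the 3 s duration and 1497 samples observed.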

Hope that clarifies that a bit?

The current release unfortunately contains a bug related to this, although in most cases it should still work fine. In your case it may be best to work with the development branch.