holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License
209 stars 121 forks source link

Avoid deprecated sample_rate in our own code #231

Open DimitriPapadopoulos opened 1 year ago

DimitriPapadopoulos commented 1 year ago

Fixes #198.

At least I think it should. Let's wait for feedback.

The idea is that we should be able to read files with sample_rate for backwards compatibility, but not write files with sample_rate. Of course, this means that older versions of pyedflib won't be able to read files produced with newer versions of pyedeflib. Hence, I guess, the CI failure:

            for shead1, shead2 in zip(signal_headers1, signal_headers2):
                # When only 'sample_rate' is present, we use its value to write
                # the file, ignoring 'sample_frequency', which means that when
                # we read it back only the 'sample_rate' value is present.
>               self.assertDictEqual({**shead1, 'sample_frequency': shead1['sample_rate']},
                                     shead2)
E               KeyError: 'sample_rate'

An alternative would be to output warnings when reading files, only when the values of sample_rate and sample_frequency are different.

Thoughts?

cbrnr commented 1 year ago

I still get the same error when running the test. I hope I installed the branch correctly:

pip install -U git+https://github.com/DimitriPapadopoulos/pyedflib@make_signal_header#egg=pyedflib
DimitriPapadopoulos commented 1 year ago

Not sure how to install a specific branch with pip. I installed myself from a repository clone with python setup.py develop.

@skjerns How do you think compatibility should be handled?

  1. Should we write files with both sample_frequency and sample_rate, so that legacy software using older versions of pyedflib can still read new files?
  2. If the answer to the above is yes, when reading files, shouldn't we emit a deprecation warning only when sample_frequency is missing or if the values of sample_frequency and sample_rate are different? Otherwise, we get a deprecation warning when reading our own files we have just written.
skjerns commented 8 months ago
  1. Should we write files with both sample_frequency and sample_rate, so that legacy software using older versions of pyedflib can still read new files?

I think there is a confusion - in all cases, neither sample_rate nor sample_frequency are written to the file. Internally, only a smp_per_record is written and a time-span for the record_duration. If that record_duration is 1 seconds, the resulting smp_per_record equals to Hz. The usage of sample_frequency is only used to avoid confusion, as the definition of smp_per_record/record_duration is not intuitive (if the record_duration is not 1s, the sample_rate is no longer in Hz, which you would intuitively expect). However, afaik, previously, sample_rate was treated as Hz, even though it was no longer the case when record_duration was not 1.

  1. If the answer to the above is yes, when reading files, shouldn't we emit a deprecation warning only when sample_frequency is missing or if the values of sample_frequency and sample_rate are different? Otherwise, we get a deprecation warning when reading our own files we have just written.

I think we should only give this warning if the user explicitly sets sample_rate. If subfunctions write sample_rate, that should be no problem.

That means:

I'll try something in a different branch

DimitriPapadopoulos commented 8 months ago

Indeed, an EDF file contains:

8 ascii : duration of a data record, in seconds 4 ascii : number of signals (ns) in data record

ns 8 ascii : ns nr of samples in each data record

Not sure I understand how it works out for several non-contiguous recordings. Any way, I will let you figure out an elegant solution, unless you want help/input on the subject.