holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License
214 stars 121 forks source link

`'sample_rate'` is deprecated, but pyedflib still uses it internally #198

Open cbrnr opened 1 year ago

cbrnr commented 1 year ago

I am using highlevel.write_edf() in my tests, and even though I am passing sample_frequency instead of sample_rate, I still get a DeprecationWarning:

    def test_read_shhs(tmp_path):
        """Basic sanity checks for records read via read_shhs."""
        durations = [0.1, 0.2]  # hours
        valid_stages = {int(s) for s in SleepStage}

>       _create_dummy_shhs(data_dir=tmp_path, durations=durations)

sleepecg/test/test_sleep_readers.py:157: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sleepecg/test/test_sleep_readers.py:124: in _create_dummy_shhs
    _dummy_nsrr_edf(f"{edf_dir}/{record_id}.edf", hours, ecg_channel="ECG")
sleepecg/test/test_sleep_readers.py:29: in _dummy_nsrr_edf
    highlevel.write_edf(filename, ecg, signal_headers)
.direnv/python-3.10.10/lib/python3.10/site-packages/pyedflib/highlevel.py:486: in write_edf
    f.setSignalHeaders(signal_headers)
.direnv/python-3.10.10/lib/python3.10/site-packages/pyedflib/edfwriter.py:351: in setSignalHeaders
    self.update_header()
.direnv/python-3.10.10/lib/python3.10/site-packages/pyedflib/edfwriter.py:249: in update_header
    self._calculate_optimal_record_duration()
.direnv/python-3.10.10/lib/python3.10/site-packages/pyedflib/edfwriter.py:922: in _calculate_optimal_record_duration
    all_fs = [self._get_sample_frequency(i) for i,_ in enumerate(self.channels)]
.direnv/python-3.10.10/lib/python3.10/site-packages/pyedflib/edfwriter.py:922: in <listcomp>
    all_fs = [self._get_sample_frequency(i) for i,_ in enumerate(self.channels)]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyedflib.edfwriter.EdfWriter object at 0x7ff705cb0370>, channelIndex = 0

    def _get_sample_frequency(self, channelIndex):
        # Temporary conditional assignment while we deprecate 'sample_rate' as a channel attribute
        # in favor of 'sample_frequency', supporting the use of either to give
        # users time to switch to the new interface.
        if 'sample_rate' in self.channels[channelIndex]:
>           warnings.warn("`sample_rate` is deprecated and will be removed in a future release. \
                          Please use `sample_frequency` instead", DeprecationWarning)
E           DeprecationWarning: `sample_rate` is deprecated and will be removed in a future release.                           Please use `sample_frequency` instead

Currently, users cannot really prevent that warning even if they have already switched to the new sample_frequency.

DimitriPapadopoulos commented 1 year ago

How to reproduce this warning? I tried with the attached sample_rate.py without success.

https://gist.github.com/DimitriPapadopoulos/599ccc16b294d22e833e566176ed3caf

cbrnr commented 1 year ago

I cannot reproduce it either, I just tried on Linux. Let me try if this was specific to macOS later today, otherwise I'll close this issue.

cbrnr commented 1 year ago

I still get the error by running test_read_shhs(); I forgot that I silenced this DeprecationWarning in pyproject.toml, so that's why I didn't see it yesterday.

DimitriPapadopoulos commented 1 year ago

So that would be test_read_shhs, which calls read_shhs:

def test_read_shhs(tmp_path):
    """Basic sanity checks for records read via read_shhs."""
    durations = [0.1, 0.2]  # hours
    valid_stages = {int(s) for s in SleepStage}

    _create_dummy_shhs(data_dir=tmp_path, durations=durations)
    records = list(read_shhs(data_dir=tmp_path, heartbeats_source="ecg", offline=True))

    assert len(records) == 4

    for rec in records:
        assert rec.sleep_stage_duration == 30
        assert set(rec.sleep_stages) - valid_stages == set()

Function read_shhs currently calls read_raw_edf:

            rec = read_raw_edf(edf_filepath, verbose=False)

But read_raw_edf is imported from mne.io.

Which version of sleepecg did you use to reproduce the problem?

cbrnr commented 1 year ago

I'm using the dev version of SleepECG, but that should be pretty much identical to the latest release v0.5.5.

The problem is in the previous line, which calls _dummy_nsrr_edf(), which in turn calls pyedflib.highlevel.write_edf() (see listing in my initial comment).

cbrnr commented 1 year ago

If you want to reproduce this, don't forget to comment out this line first.

cbrnr commented 1 year ago

What's weird is that I cannot reproduce this outside of this SleepECG test, for example with this snippet:

from pyedflib.highlevel import make_signal_headers, write_edf
from scipy.datasets import electrocardiogram

ecg = electrocardiogram()
headers = make_signal_headers(["ECG"], sample_frequency=360)
write_edf("test.edf", [ecg], headers)

@hofaflo do you have any ideas? Are you seeing the DeprecationWarning when disabling the warning filter in pyproject.toml?

DimitriPapadopoulos commented 1 year ago

Could it depend on the version of pyedflib, with a different version used in either cases?

DimitriPapadopoulos commented 1 year ago

Perhaps the problem is here is in make_signal_header:

    signal_header = {'label': label,
               'dimension': dimension,
               'sample_rate': sample_rate,
               'sample_frequency': sample_frequency,
               'physical_min': physical_min,
               'physical_max': physical_max,
               'digital_min':  digital_min,
               'digital_max':  digital_max,
               'transducer': transducer,
               'prefilter': prefiler}
    return signal_header

I guess sample_rate shouldn't be added to the dictionary. But then I see a default value for sample_rate and not sample_frequency:

def make_signal_header(label, dimension='uV', sample_rate=256, sample_frequency=None,
cbrnr commented 1 year ago

Could it depend on the version of pyedflib, with a different version used in either cases?

I just checked, I'm using the latest version 0.1.33.

Perhaps the problem is here is in make_signal_header:

I tried popping that dict key before writing the EDF, this didn't have any effect on the warning.

DimitriPapadopoulos commented 1 year ago

Additionally, I have discovered that the documentation of pyedflib.highlevel.make_signal_header and pyedflib.highlevel.make_signal_headers claims that "The default is 256" for sample_rate, instead I feel it should be None, and that the "The default is 256" for sample_frequency, but it is actually None in the code.

DimitriPapadopoulos commented 1 year ago

Perhaps the problem is here is in make_signal_header:

I tried popping that dict key before writing the EDF, this didn't have any effect on the warning.

That's the first point of failure. There must be a second one, perhaps in EdfWriter.__init__:

        for i in np.arange(self.n_channels):
            if self.file_type == FILETYPE_BDFPLUS or self.file_type == FILETYPE_BDF:
                self.channels.append({'label': f'ch{i}', 'dimension': 'mV', 'sample_rate': 100,
                                      'sample_frequency': None, 'physical_max': 1.0, 'physical_min': -1.0,
                                      'digital_max': 8388607,'digital_min': -8388608,
                                      'prefilter': '', 'transducer': ''})
            elif self.file_type == FILETYPE_EDFPLUS or self.file_type == FILETYPE_EDF:
                self.channels.append({'label': f'ch{i}', 'dimension': 'mV', 'sample_rate': 100,
                                      'sample_frequency': None, 'physical_max': 1.0, 'physical_min': -1.0,
                                      'digital_max': 32767, 'digital_min': -32768,
                                      'prefilter': '', 'transducer': ''})

                self.sample_buffer.append([])
DimitriPapadopoulos commented 1 year ago

Could you give #231 a try?

cbrnr commented 1 year ago

I am a bit concerned that we cannot come up with a minimal reproducible example, and that the warning only occurs in the SleepECG test...

DimitriPapadopoulos commented 1 year ago

@cbrnr I have installed sleepecg in dev mode. Should I see the warning when running pytest?

cbrnr commented 1 year ago

Yes. But make sure to disable the warning filter in pyproject.toml (comment out that line).

DimitriPapadopoulos commented 1 year ago

Branch :make_signal_header does work for me. Commits 35e5ed773dd8f0542dafcdece288ebcd78b860d3 and 5aebc8e410a24c2a08d24df010185b6d73656900 are sufficient to fix this particular warning.

DimitriPapadopoulos commented 1 year ago

As for why you see the warning with pytest only:

  1. DeprecationWarning is ignored by default:

    exception DeprecationWarning

    Base class for warnings about deprecated features when those warnings are intended for other Python developers.

    Ignored by the default warning filters, except in the __main__ module (PEP 565). Enabling the Python Development Mode shows this warning.

  2. By default pytest will display DeprecationWarning:

    By default pytest will display DeprecationWarning and PendingDeprecationWarning warnings from user code and third-party libraries, as recommended by PEP 565. This helps users keep their code modern and avoid breakages when deprecated warnings are effectively removed.

    Sometimes it is useful to hide some specific deprecation warnings that happen in code that you have no control over (such as third-party libraries), in which case you might use the warning filters options (ini or marks) to ignore those warnings.

cbrnr commented 1 year ago

But do you see the warning?

DimitriPapadopoulos commented 1 year ago

With pytest:

cbrnr commented 1 year ago

Yes, I can confirm that your branch fixes the issue!

DimitriPapadopoulos commented 1 year ago

We will first need to decide how we want to handle sample_rate. In any case, we should probably avoid warnings when reading sample_rate except when the value is inconsistent with sample_frequency. The question is whether we should always write sample_rate for compatibility with older versions of the module, or not.

Any way, in the short term, you should probably ignore that warning in your tests.

cbrnr commented 1 year ago

IMO the problem is that even if users write an EDF and only pass sample_frequency, they will still get the deprecation warning saying that they should not use sample_rate. That's confusing! I suggest moving that warning up into a public function, and issue it only when users pass a sample_rate.

DimitriPapadopoulos commented 1 year ago

Indeed, the warning should be emitted when writing and explicitly passing sample_rate as an argument.

It should not be emitted when reading, unless of course sample_rate and sample_frequency are different.

The question remains whether we should write sample_rate in addition to sample_frequency or not.

cbrnr commented 1 year ago

Agreed! Just to be clear, the warning we're seeing here is emitted when writing. The question of whether or not to write sample_rate in addition to sample_frequency is then already solved, no? I'd write both fields, and only emit the warning if the user passes sample_rate (which is not the case in my test since I'm passing sample_frequency).