Resolution issue with EDF export

cbrnr commented 4 months ago

Description of the problem

When exporting this BDF file to EDF, the amplitude resolution seems to decrease a lot (amplitude values are clearly discretized). I suspect that maybe this has to do with some physical to digital min/max setting, which by default is set to "auto". Or maybe the range of the original data is too large and cannot be accurately represented with 16 bits (BDF uses 24 bits).

>>> raw.describe()
<RawEDF | S01.bdf, 73 x 1939456 (3788.0 s), ~1.05 GB, data loaded>
ch  name    type  unit        min         Q1     median         Q3        max
 0  Fp1     EEG   µV    -28733.96  -28283.90  -28180.43  -27993.25  -26932.31
 1  AF7     EEG   µV    -22318.10  -21375.51  -20833.26  -20595.17  -19586.39
 2  AF3     EEG   µV    -11227.59  -10846.75  -10493.43   -9618.47   -8873.50
 3  F1      EEG   µV    -19847.79  -19601.29  -19297.20  -18904.51  -18461.83
...
>>> raw_edf.describe()
<RawEDF | S01.edf, 72 x 1939456 (3788.0 s), ~59 kB, data not loaded>
ch  name  type  unit        min         Q1     median         Q3        max
 0  Fp1   EEG   µV    -28733.76  -28283.63  -28182.47  -27995.34  -26933.23
 1  AF7   EEG   µV    -22315.59  -21374.87  -20833.70  -20595.99  -19584.46
 2  AF3   EEG   µV    -11229.21  -10844.83  -10495.85   -9620.87   -8872.34
 3  F1    EEG   µV    -19847.46  -19599.63  -19296.17  -18906.73  -18461.66
...

For channel 0, the range is approximately 52000µV, therefore the resolutions are:

>>> 52000/2**16  # EDF
0.79345703125
>>> 52000/2**24  # BDF
0.0030994415283203125

So yes, it looks like the range is too large for EDF. Should we maybe issue a warning in such cases?

Edit: I actually didn't see the minus sign for the max value, so the range for channel 0 is 1800µV. However, the way we're currently exporting to EDF is to use the global physical min and max across all channels, which can yield a large range if the channels vary wildly in their offsets. So the conclusion is still correct, EDF cannot represent such a large range with sufficient accuracy (only 2¹⁶ values as compared to 2²⁴).

Steps to reproduce

The data set S01.bdf is linked in the first paragraph.

import mne

raw = mne.io.read_raw("S01.bdf")
raw.plot(duration=1, n_channels=1)  # plot 1
raw.export("S01.edf", overwrite=True)
raw_edf = mne.io.read_raw("S01.edf")
raw_edf.plot(duration=1, n_channels=1)  # plot 2

Expected results

The data in the exported EDF file should be identical to the original BDF file.

Actual results

This is a plot of a portion of the original BDF data (plot 1 in the MWE): plot1 And here's the plot of the exported EDF file (plot 2 in the MWE): plot2

@hofaflo what's your take on this issue?

Additional information

Platform             macOS-14.3.1-arm64-arm-64bit
Python               3.12.2 (main, Feb 14 2024, 07:24:33) [Clang 15.0.0 (clang-1500.1.0.2.5)]
Executable           /Users/clemens/.pyenv/versions/3.12.2/envs/mne/bin/python
CPU                  arm (12 cores)
Memory               32.0 GB

Core
├☑ mne               1.7.0.dev26+g7bf1b4ab7 (devel, latest release is 1.6.1)
├☑ numpy             1.26.4 (OpenBLAS 0.3.23.dev with 12 threads)
├☑ scipy             1.12.0
└☑ matplotlib        3.8.3 (backend=MacOSX)

Numerical (optional)
├☑ sklearn           1.4.1.post1
└☐ unavailable       numba, nibabel, nilearn, dipy, openmeeg, cupy, pandas

Visualization (optional)
└☐ unavailable       pyvista, pyvistaqt, vtk, qtpy, ipympl, pyqtgraph, mne-qt-browser, ipywidgets, trame_client, trame_server, trame_vtk, trame_vuetify

Ecosystem (optional)
└☐ unavailable       mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline, neo

Teuniz commented 4 months ago

Here's how EDFbrowser is doing it:

Remove the DC-offset using a highpass filter.
Set the new digital max and min to 32767 and -32768.
Calculate the new physical max and min based on the unit/lsb.
For every digital sample:
- clip to the new digital max and min
- use the 16 least significant bits as the new sample

You can try and view the result using the BDF to EDF converter in EDFbrowser.

cbrnr commented 4 months ago

Thanks @Teuniz! I think this would be a good option, although I'd not do it automatically. We could either document it or introduce a new parameter.

hofaflo commented 4 months ago

For channel 0, the range is approximately 52000µV

With the values below, shouldn't that be 28734 - 26932 = 1802?

ch  name    type  unit        min         Q1     median         Q3        max
 0  Fp1     EEG   µV    -28733.96  -28283.90  -28180.43  -27993.25  -26932.31

I think the main issue here is that the export uses the same physical range per channel type. This makes sense for data that was initially recorded into an EDF file, with physical min/max set to the actual amplifier limits. However, I'd argue that it's not the best idea for data coming from a format supporting higher resolution (like BDF), as EDF's resolution might not be enough to cover large DC offsets across all signals.

I suggest to allow setting the physical range individually per signal (in the current (unreleased) version of the export code, pass physical_range=None here). Additionally, if the current behavior is to be kept as the default, it might make sense to issue a warning in case the resulting resolution is below some threshold deemed "too low".

cbrnr commented 4 months ago

Ah yes, I missed the minus sign for the maximum 😄. But still, the underlying issue is the signal range being too wide due to offsets in the channels.

A combination of your suggestions might be the best way to proceed. Setting physical_range=None can be done in any case, right? Can you let us know when the feature will be available in edfio? In addition, if the range in individual channels is still too large for 16 bits, I'd issue a warning (the threshold has to be decided, but I'd say we need at least 0.01µV?).

Teuniz commented 4 months ago

If you are not willing to get rid of the DC-offset, then forget about converting BDF to EDF. It's simply not possible without loosing resolution. If, for whatever reason, you need to keep the DC-offset, then you need to export to BDF instead. Most of the time, there's a reason why a recording is made with a resolution >16bit and, in general, it's better to avoid the conversion from BDF to EDF.

cbrnr commented 4 months ago

I agree, but technically the offset is irrelevant if the range of a channel can be represented accurately with 16 bits. It's all about the range.

Teuniz commented 4 months ago

I agree, but technically the offset is irrelevant if the range of a channel can be represented accurately with 16 bits. It's all about the range.

It cannot, that's why for example Biosemi uses 24 bit adc's. How do you think Biosemi's BDF to EDF converter works? But feel free to invent the wheel again.

cbrnr commented 4 months ago

OK, I'm not going to argue with you.

cbrnr commented 4 months ago

@hofaflo indeed setting physical_range=None fixes the issue with my example data. Why did we not do this in the first place and instead set all channels to the global physical min and max?

hofaflo commented 4 months ago

Can you let us know when the feature will be available in edfio?

Already is, with "current (unreleased) version of the export code" I meant to refer to the code in mne/export/_edf.py.

Why did we not do this in the first place and instead set all channels to the global physical min and max?

This seems to have been initially introduced in 847f6de as a result of the discussion in #9643. I did not change it in #12218 to avoid breaking existing behavior.

cbrnr commented 4 months ago

Thanks, let's see if the tests still pass if I change it!

cbrnr commented 4 months ago

@hofaflo before changing the scaling behavior, I would like to make sure that we don't overlook potential issues. This EEGLAB issue discusses errors that occur when channels are stored with different resolutions (i.e. individual physical min and max), such as not being able to compute a bipolar derivation. However, I think this is due to the subsequent processing steps and not caused by having different resolutions. Furthermore, @Teuniz mentions that the EDF standard mandates that all channels have the same physical min/max, but I did not find this anywhere in the specs.

So I'd really appreciate some insights, and especially maybe a concrete example where channel-specific resolutions can be problematic.

I mean, one argument against channel-specific resolutions is that if other software (such as EEGLAB) cannot handle this, we should probably not do it, even though it would be perfectly OK from a signal processing and EDF standards point of view.

mne-tools / mne-python