Nice-to-have: Exporting during EDF should check the distribution of data values

adam2392 commented 3 years ago

When exporting an EDF file, right now the physical ranges are set such that they are the min/max of each channel groups.

However, if you accidentally set a non-EEG channel as EEG, then it can potentially throw off the physical ranges for the EEG channel group, thus resulting in a badly exported dataset.

Say channel $A1 has values -1000 to 1000, and is obviously not EEG, but should be "misc".

But say all EEG channels have physical range ~0.01 to 0.03. Then the divisor is 5 orders of magnitude larger because the EEG channel group physical range will be (-1000, 1000), thus making the data un-interpretable when reading it back.

This requires the user to explicitly make sure they're not setting some of their channel types incorrectly.

Proposed Fix

We need to make sure none of the data is significantly different range then others.

I'm making this an issue becuz don't know how to solve it for now, and maybe it won't become an issue?

_Originally posted by @adam2392 in https://github.com/mne-tools/mne-python/pull/9694#discussion_r696099981_

drammock commented 3 years ago

One way of dealing with this is to just trust the user to not make a mistake. Manually changing channel types is not too frequent of an operation, is it? In other words, how likely is this to ever happen? My intuition is YAGNI.

adam2392 commented 3 years ago

Well for many read operations, such as EDF, Nihon, Persyst, etc. the default channel types are not always correct. This issue is circumvented if they "correctly" formatted their data according to BIDS.

But yeah overall I agree, problem will be rare, but if it occurs it took me a long night to find :p. Just logging this here though for future sake.

Teuniz commented 3 years ago

Well for many read operations, such as EDF, Nihon, Persyst, etc. the default channel types are not always correct. This issue is circumvented if they "correctly" formatted their data according to BIDS.

When the input file is EDF or Nihon Kohden, why not simply storing the phys max/min for each signal in the metadata
and use those values when exporting to EDF? There's nothing that can go wrong with that.

adam2392 commented 3 years ago

So I believe that's actually what is done. The issue is actually round-trip that is an artifact of the need to "assume channel groups"

Here is the issue laid out in a hypothetical workflow:

Start with a correctly formatted EDF/Nihon/etc.

You have say 3 channels:

A1 = EEG with range (-1 uV to 1 uV) A2 = EEG with range (-1.2 uV to 1 uV) EG1 = EMG with range (-1 V to 1 V)

When MNE reads this in, it won't have any way of determining that EG1 is EMG. Rn there are checks based on the character patterns, but that is not 100% foolproof. There is no consistent way of storing channel types in EDF/Nihon that can translate back to the reader. So these are all assumed to be EEG type.

That's fine though because the data is scaled properly.

Do an EDF export

Now, when doing an EDF export because EG1 is grouped with A1/A2, the EEG channel group has physical range (-1 V, 1 V), which is 6 times larger then the actual physical range it should be.

Re-read the data in

Now, when you re-read the data in the A1/A2 channels are incorrectly scaled.

Fixes

User sets channel type of EG1 to 'emg'
There is a convergence of storing different channel types in EDF/Nihon/Persyst/etc.
There is a runtime check in MNE raw.export for EDF files that the channels in each group (i.e. EEG, ecog, seeg, emg, etc.) are "close together". That is, one channel like EG1 shouldn't be an order of magnitude larger in range than the rest.

If 3 can be a robust check in MNE, then the user will be alerted and set the channel type. However, this issue was made because I'm not sure how to do 3.

mne-tools / mne-python