Closed skjerns closed 1 year ago
oh waouh.... to me resampling by buffer is really a bad idea. I have not realized it was done like that.
I don't we can address this issue very rapidly. Can you go around the pb with your analysis eg by exclude the channels with a higher sampling rate?
Message ID: @.***>
We should definitely resample the whole signal to avoid edge artifacts. Maybe the blockwise implementation is used to support on-demand reading of specific data segments?
oh waouh.... to me resampling by buffer is really a bad idea. I have not realized it was done like that. I don't we can address this issue very rapidly. Can you go around the pb with your analysis eg by exclude the channels with a higher sampling rate?
Unfortunately this is not possible in my case, as by accident Pz
and M1/M2
were recorded with different sampling frequencies. However, I can resort to loading everything with pyedflib
and putting it manually into a RawArray
btw: what do you abbreviate with pb? never seen it before and google doesnt help me
Workaround:
from pyedflib import highlevel
import numpy as np
import mne
def resample(data, o_sfreq, t_sfreq):
"""
resample a signal using MNE resample functions
This automatically is optimized for EEG applying filters etc
:param raw: a 1D data array
:param o_sfreq: the original sampling frequency
:param t_sfreq: the target sampling frequency
:returns: the resampled signal
"""
if o_sfreq==t_sfreq: return data
raw = np.atleast_2d(data)
ch_names=['ch{}'.format(i) for i in range(len(raw))]
info = mne.create_info(ch_names=ch_names, sfreq=o_sfreq, ch_types=['eeg'])
raw_mne = mne.io.RawArray(raw, info, verbose='WARNING')
resampled = raw_mne.resample(t_sfreq, n_jobs=-1, verbose='WARNING')
new_raw = resampled.get_data().squeeze()
return new_raw.astype(raw.dtype, copy=False)
def load_edf(edf_file, channels):
"""
Parameters
----------
edf_file : str
edf file name.
channels : list of str
list of channel names to load.
Returns
-------
raw : TYPE
DESCRIPTION.
convenience function to load an EDF+ without MNE.
Workaround for https://github.com/mne-tools/mne-python/issues/10635"""
if not isinstance(channels, list):
channels = [channels]
sigs, sigheads, header = highlevel.read_edf(edf_file,
ch_names=channels)
sfreqs = [s['sample_rate'] for s in sigheads]
labels = [s['label'] for s in sigheads]
if len(set(sfreqs))>1:
sigs = [resample(x, sfreqs[i], min(sfreqs)) for i, x in enumerate(sigs)]
info = mne.create_info(labels, min(sfreqs), ch_types='eeg')
raw = mne.io.RawArray(sigs, info)
return raw
@skjerns I think it would be great if you added this code snippet to our docs. Maybe a short note about the problem and your proposed workaround would fit here?
Oh, and pb = problem.
I can do that.
@skjerns would you still be up to implement a proper solution, i.e. resampling of the entire signal? The current block-wise resampling is highly problematic to the point where I'd rather not do it at all (and let users only load channel groups with the same sampling frequency to avoid block-wise resampling altogether).
I can probably have a look at it after some deadlines have past (~end the end of february)
@cbrnr I'm struggeling a bit to implement it in a proper way.
Currently I have a working implementation that loads the data and upsamples to the highest sampling frequency in the file of all requested channels by loading all data into lists and upsampling once in the end. _mult_cal_one
is called all in the end after all data has been upsampled. However, this process also resamples when accessing only a single channel when not preloading e.g.
raw = mne.io.read_raw_edf('diff-sfreq.edf') # file containing 10, 20 and 30 Hz, no preloading
data = raw.get_data(f'10hz}') # result will be upsampled to 30 Hz
raw = mne.io.read_raw_edf('diff-sfreq.edf', exclude=['30hz']) # file containing 10, 20 and 30 Hz, no preloading
data = raw.get_data(f'10hz}') # result will be upsampled to 20 Hz
The problem is that data
is given to the _read_segment_file
as a view and is already expecting 30 Hz, there is not much I can do about that without changing BaseRaw
, as far as I can see?
How would you say to proceed?
Solutions:
excluded
or included
? Besides that it seems to work, just some tests throw errors for small deviations in values when loaded on the fly or the entire signal at once when compared to loading the data from smaller segments. I need to look into how to fix that (i.e. test_edf_others
). It might be that the "new" values are correct, and the old ones are wrong due to edge artifacts. Any idea how to address this if it happens, without tempering with other files in the repository too much (the test is defined in base/tests
)?
@cbrnr https://github.com/cbrnr I'm struggeling a bit to implement it in a proper way.
Currently I have a working implementation that loads the data and upsamples to the highest sampling frequency in the file. However, this also resamples when accessing only a single channel e.g.
raw = mne.io.read_raw_edf('diff-sfreq.edf') # file containing 10, 20 and 30 Hzdata = raw.get_data(f'10hz}') # result will be upsampled to 30 Hz
yes it's the expected behavior.
you avoid this you need to pick the channels you want in the init. Once you created the raw instance you have defined the info["sfreq"] which is not channel specific.
Message ID: @.***>
@cbrnr https://github.com/cbrnr I'm struggeling a bit to implement it in a proper way. Currently I have a working implementation that loads the data and upsamples to the highest sampling frequency in the file. However, this also resamples when accessing only a single channel e.g. raw = mne.io.read_raw_edf('diff-sfreq.edf') # file containing 10, 20 and 30 Hzdata = raw.get_data(f'10hz}') # result will be upsampled to 30 Hz yes it's the expected behavior. you avoid this you need to pick the channels you want in the init. Once you created the raw instance you have defined the info["sfreq"] which is not channel specific.
ok, thanks!
Now the only problem remaining is one tests fails when preload=False
and specific timeframes are accessed. I presume this is due to edge artifacts, as it only happens for channels which were resampled:
which calls this test in io/tests/test_raw.py
https://github.com/mne-tools/mne-python/blob/bf2502166eb15626c1205accc2d2d467535b8d93/mne/io/tests/test_raw.py#L120-L125
Plotting differences between the loaded signals reveals A9 having clear edge artefacts, as it is indeed the only resampled channel
However, some other time indices look weird and not like clear edge artifacts:
Any idea what to do with this? Simply remove the test?
@skjerns where can I see your code? Did I miss the pull request?
@agramfort sure: https://github.com/mne-tools/mne-python/pull/11549
Did I understand you correctly that with preload=True
everything already works as expected? If so, preload=False
triggers resampling when data is accessed on-the-fly, so it shouldn't really change that much (except if the previous block size is different from the current one). I mean, the idea to perform resampling on the whole signal only makes sense for preload=True
, so do we even have to support the preload=False
case? What if we leave that as is and output a warning if some (block-wise) resampling is going on?
Describe the new feature or enhancement
It is possible to load EDF+ files with different sampling frequencies. To my understanding, internally, the channels with differing sampling frequencies are resampled. I understand that the underlying data format
BaseRAW
assumes that all channels have the same sampling frequency, so resampling the signals to a common denominator is strictly necessary.Describe your proposed implementation
However, this resampling is happening per block, i.e. every 10 MB of data, and therefore introduces edge artefacts (as also noted in the comments). These are mostly minor, however, in some of my calculations they do indeed change the results of our analysis.
https://github.com/mne-tools/mne-python/blob/549de120926465a514ac5d7b0b8950f49ec2ade5/mne/io/edf/edf.py#L330=
Describe possible alternatives
I think it would be good to load all channels into different data buffers, and only resample them after loading the entire file. However, maybe there was a reasoning why the resampling takes place on a per-block basis (e.g. efficiency?), so please let me know if it would be worth implementing such a change, or if I could give it a try to refactor the function to resample the data after loading everything.