mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.67k stars 1.31k forks source link

Highpass filter and NaNs result in long NaN segments #12375

Open cbrnr opened 8 months ago

cbrnr commented 8 months ago

Description of the problem

I'm not sure if highpass filters play nicely with (raw) data containing NaNs. If I filter such a signal, the NaN segment seems to get extremely long, and I can't explain why. This does not happen with lowpass filters.

Steps to reproduce

The following MWE creates a raw object with 3 channels and a duration of 25s. The second channel (channel 1) has missing data (NaNs) in the first second. After highpass-filtering, the initial 14s are missing, which seems very long given the filter specs.

import numpy as np
from numpy.random import default_rng

from mne import create_info
from mne.io import RawArray

def create_toy_data(n_channels=3, duration=25, sfreq=250, seed=None):
    """Create toy data."""
    rng = default_rng(seed)
    data = rng.standard_normal(size=(n_channels, duration * sfreq)) * 5e-6
    info = create_info(n_channels, sfreq, "eeg")
    return RawArray(data, info)

raw = create_toy_data()
raw._data[1, :250] = np.nan
raw.plot(duration=25)
raw.filter(l_freq=5, h_freq=None)  # highpass
raw.plot(duration=25)

Expected results

I'd expect the filtered signal to contain fewer missing values.

Actual results

Original signals:

Screenshot 2024-01-22 at 15 56 41

Filtered signals:

Screenshot 2024-01-22 at 15 57 03

Filter design logging messages:

Filtering raw data in 1 contiguous segment
Setting up high-pass filter at 5 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 5.00
- Lower transition bandwidth: 2.00 Hz (-6 dB cutoff frequency: 4.00 Hz)
- Filter length: 413 samples (1.652 s)

Additional information

Platform             macOS-14.2.1-arm64-arm-64bit
Python               3.11.6 (v3.11.6:8b6ee5ba3b, Oct  2 2023, 11:18:21) [Clang 13.0.0 (clang-1300.0.29.30)]
Executable           /Users/clemens/.local/pipx/venvs/ipython/bin/python
CPU                  arm (12 cores)
Memory               32.0 GB
Core
├☑ mne               1.7.0.dev6+g6c6e6ec6d (unable to check for latest version on GitHub, SSL error)
├☑ numpy             1.26.3 (OpenBLAS 0.3.23.dev with 12 threads)
├☑ scipy             1.11.4
├☑ matplotlib        3.8.2 (backend=MacOSX)
├☑ pooch             1.8.0
└☑ jinja2            3.1.2

Numerical (optional)
├☑ pandas            2.1.4
└☐ unavailable       sklearn, numba, nibabel, nilearn, dipy, openmeeg, cupy

Visualization (optional)
└☐ unavailable       pyvista, pyvistaqt, vtk, qtpy, ipympl, pyqtgraph, mne-qt-browser, ipywidgets, trame_client, trame_server, trame_vtk, trame_vuetify

Ecosystem (optional)
└☐ unavailable       mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline, neo
larsoner commented 8 months ago

This is probably because we use overlap-add filtering using FFTs. As soon as you take the FFT you get all NaNs so that whole chunk will be NaN. A potential fix would be to if not np.isfinite(x).all(): and in that case use slower time-domain convolution. We could do this just for the given segment of data that has the issue, i.e., within the overlap-add loop. I don't think it would be too painful performance wise since np.isfinite should be much faster than the actual FFTs/IFFTs and triggering the slower time-domain path should be rare.

cbrnr commented 8 months ago

This would be great! Do you know why this only happens with high-pass and not with low-pass filters?

cbrnr commented 8 months ago

Also, I tried using raw.filter(l_freq=5, h_freq=None, method="iir"), which should not use overlap-add (https://github.com/mne-tools/mne-python/blob/main/mne/filter.py#L1109-L1112), the entire channels gets zapped.

larsoner commented 8 months ago

Do you know why this only happens with high-pass and not with low-pass filters?

The high-pass is likely a longer filter. Then somewhere we decide what length FFTs to use in overlap-add filtering. It's probably some combination of those two things.

Also, I tried using [IIR filtering], the entire channels gets zapped.

Yeah that's a consequence of the "infinite" in "infinite impulse response". One NaN value will spread to all other values when forward-backward filtering.

cbrnr commented 8 months ago

But do you think we can/should make IIR filtering work with missings?

larsoner commented 8 months ago

No I think for IIR the right thing to do is for people to annotate_nan then segments can/should be processed separately by raw.filter by setting skip_by_annotation appropriately