sappelhoff / pyprep

A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
https://pyprep.readthedocs.io/en/latest/
MIT License
128 stars 30 forks source link

pyprep.utils._filter_design throws error for large sampling rates #109

Open john-veillette opened 2 years ago

john-veillette commented 2 years ago

A minimal reproducible example for version 0.4.0, meant to mimic the usage in pyprep.find_noisy_channels._get_filtered_data, is as follows:

import numpy as np
from pyprep.utils import _filter_design

sample_rate = 8000.
bandpass_filter = _filter_design(
    N_order = 100,
    amp = np.array([1, 1, 0, 0]),
    freq = np.array([0, 90 / sample_rate, 100 / sample_rate, 1]),
)

This code chunk throws the following error:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_18594/550440648.py in <module>
      3 sample_rate = 8000.
      4 
----> 5 bandpass_filter = _filter_design(
      6     N_order = 100,
      7     amp = np.array([1, 1, 0, 0]),

~/anaconda3/envs/glottis/lib/python3.9/site-packages/pyprep/utils.py in _filter_design(N_order, amp, freq)
    524         ),
    525     )
--> 526     pchip_interpolate = scipy.interpolate.PchipInterpolator(
    527         np.round(np.multiply(nfft, freq)), amp
    528     )

~/anaconda3/envs/glottis/lib/python3.9/site-packages/scipy/interpolate/_cubic.py in __init__(self, x, y, axis, extrapolate)
    230     """
    231     def __init__(self, x, y, axis=0, extrapolate=None):
--> 232         x, _, y, axis, _ = prepare_input(x, y, axis)
    233         xp = x.reshape((x.shape[0],) + (1,)*(y.ndim-1))
    234         dk = self._find_derivatives(xp, y)

~/anaconda3/envs/glottis/lib/python3.9/site-packages/scipy/interpolate/_cubic.py in prepare_input(x, y, axis, dydx)
     59     dx = np.diff(x)
     60     if np.any(dx <= 0):
---> 61         raise ValueError("`x` must be strictly increasing sequence.")
     62 
     63     y = np.rollaxis(y, axis)

ValueError: `x` must be strictly increasing sequence.

Clearly the input to PchipInterpolator is the problem, so let's see what that looks like.

import numpy as np
import math

sample_rate = 8000
nfft = np.maximum(512, 2 ** (np.ceil(math.log(100) / math.log(2))))
freq = np.array([0, 90 / sample_rate, 100 / sample_rate, 1])
x = np.round(np.multiply(nfft, freq))

The output is x == array([ 0., 6., 6., 512.]), which becomes the x argument to PchipInterpolator Once sample rates are sufficiently large, the np.round operation causes consecutive values of x to be equal, which causes the error.

It would be nice to find a more robust way to filter, no? This isn't a problem until very large sampling rates (somewhere between 6 and 7 kHz), but nonetheless frustrating for those who don't want to downsample until after they've epoched their data (e.g. to minimize jitter vs. their event markers for very precise latency measurements).

I'm sure there's been some discussion on filter design in the past that lead to the use of a custom filter here instead of just using MNE's bandpass, so maybe it's not worth changing if it causes too much of a departure from the Matlab implementation -- but from this user's perspective, this would be a helpful change.

yjmantilla commented 2 years ago

Hmm indeed I would think is possible to automatically use another bandpass implementation if it fails with the default one (as in put a try clause); then if it uses that other bandpass print something to the user as a warning. What do you think @sappelhoff

sappelhoff commented 2 years ago

I think that'd be okay :+1: I actually think this might be related to #107 - maybe @a-hurst can weigh in.

a-hurst commented 2 years ago

Hi all, apologies for the slow reply! @sappelhoff, re-reading the docs it looks like PyPREP already defaults to using MNE's filtering re: #107 unless MATLAB equivalence is requested. The filter throwing the exception here is the custom filter used for creating the 1-50 Hz bandpass-filtered EEG copy used by the HF noise, correlation, and RANSAC bad channel detectors, which existed in PyPREP before I joined and is used regardless of MATLAB-equivalence.

We can certainly make this filter code another matlab_strict thing and swap in MNE filtering code in its place. If we do, we should also try adding a more informative exception in case someone does try matlab_strict mode with high sample-rate data.

sappelhoff commented 2 years ago

We can certainly make this filter code another matlab_strict thing and swap in MNE filtering code in its place. If we do, we should also try adding a more informative exception in case someone does try matlab_strict mode with high sample-rate data.

:+1: agreed, I would go that route