Problems with Savitzky–Golay filter in detection of alpha suppressions

mattbit commented 3 years ago

There seems to be some problems with non converging linear fit when applying the Savitzky–Golay filter on some particularly ill-behaved timeseries.

mattbit commented 3 years ago

@MartinToth can you provide more details on this?

MartinToth commented 3 years ago

Hi Matteo,

I did some retest to check and it appears that when the signal is obviously erratic, the savgol filter doesnot converge : Here the message error

File "/home/martin/neurokit/neurokit/analysis/suppressions.py", line 67, in _detect_suppressions envelope = savgol_filter(envelope, 3, 1) File "/home/martin/anaconda3/envs/eeg/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 348, in savgol_filter _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y) File "/home/martin/anaconda3/envs/eeg/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 220, in _fit_edges_polyfit _fit_edge(x, 0, window_length, 0, halflen, axis, File "/home/martin/anaconda3/envs/eeg/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 190, in _fit_edge poly_coeffs = np.polyfit(np.arange(0, window_stop - window_start), File "<__array_function__ internals>", line 5, in polyfit File "/home/martin/anaconda3/envs/eeg/lib/python3.8/site-packages/numpy/lib/polynomial.py", line 629, in polyfit c, resids, rank, s = lstsq(lhs, rhs, rcond) File "<__array_function__ internals>", line 5, in lstsq File "/home/martin/anaconda3/envs/eeg/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 2306, in lstsq x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj) File "/home/martin/anaconda3/envs/eeg/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 100, in _raise_linalgerror_lstsq raise LinAlgError("SVD did not converge in Linear Least Squares") numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares

I checked the EEG which is clearly flawed :

the issue comes from the preprocessing that should put the problematic part to nan

speaking of which What is the intended behaviour when there are nan ? should it be already put to 0 or the filter will just ignore those values ?

If I find another error for non-problematic segment I will update but right now I think that there is no problem in fact coming for the filter !

mattbit commented 3 years ago

Well, I don't think np.savgol works if you have NaN values. Also, to find alpha suppressions the signal is bandpass filtered with an IIR filter, and that alone would break everything if you have a single NaN value in your series. A warning is printed if you try to filter a timeseries containing NaN values. Is this your case?

vloison commented 3 years ago

In detect_suppressions, when artefacts are detected in the recording, the corresponding values are switched to nan (function artefacts_to_nan()). And the Savitzky–Golay filter does not work with signals that contain nan values. I checked what happens when we don't use the Savitzky-Golay filter : the code works well.

mattbit commented 3 years ago

@vloison can you please elaborate on your proposal?

vloison commented 3 years ago

My proposal is to replace the two following lines in the function detect_suppressions of suppressions.py :

envelope = rec.data.loc[:, channels].abs().values.max(axis=1)
envelope = savgol_filter(envelope, 3, 1)

by :

envelope = np.nanmax(rec.data.loc[:, channels].abs().values, axis=1)

When these lines are replaced, the code runs correctly, and no suppressions are detected in the intervals where artefacts are detected.

mattbit commented 3 years ago

Okay, good. Please open a pull request with the implementation and a couple of tests. Feel free to ping me if you need help.

mattbit commented 3 years ago

Okay @vloison, I have a problem: I cannot reproduce.

import numpy as np
from scipy.signal import savgol_filter

savgol_filter([1, 2, 3, np.nan, 4, 5, 6], 3, 1)  # array([ 1.,  2., nan, nan, nan,  5.,  6.])

It works in my case (scipy 1.6.3, numpy 1.20.3). Can you check?

vloison commented 3 years ago

savgol_filter([1, 2, 3, np.nan, 4, 5, 6], 3, 1)

-> This line also works for me.

Here is some code if you want to reproduce on EEG data. It uses HFS80 dataset. The code for loading and merging the recordings is not clean, it's temporary.

import numpy as np
import os
from glob import glob
import mne
from neurokit.analysis.suppressions import SuppressionAnalyzer
from neurokit.io._mne import _recording_from_mne_raw
from neurokit.preprocessing import detect_artifacts, HighAmplitudeDetector, ConstantSignalDetector, ClippedSignalDetector

data_dir = 'C:/Users/viniv/OneDrive/Documents/IBENS/Holcman/datasets/ANESTHESIA_Lariboisière/HFS80_formatted'
patient = 'DB22'

_raws = []
dir = os.path.join(data_dir, "DB22")
files = sorted(glob(dir + '/edf/*.edf'))
for f in files:
    #_r = mne.io.read_raw_edf(f, preload=True, verbose=False)
    #idx = np.where(np.abs(_r.get_data() > 1e-7))[1].max()
    #_r.crop(tmax=_r.times[idx])
    _raws.append(mne.io.read_raw_edf(f, preload=True))

raw = mne.concatenate_raws(_raws)
_mask = np.zeros(raw.get_data().shape[1], dtype=bool)
print(raw)
rec = _recording_from_mne_raw(raw)
# Detect artifacts
detectors = [HighAmplitudeDetector(low=10, high=150),
                     ClippedSignalDetector(),
                     ConstantSignalDetector(interval=int(rec.frequency))]
_artifacts_data = detect_artifacts(rec, detectors=detectors, pad=int(rec.frequency))
artifacts = nk.EventSeries(_artifacts_data, name='artifacts')
artifacts.data['code'] = 'ARTIFACT'
rec.es.add(artifacts)

# Detect suppressions
suppressions = SuppressionAnalyzer(rec)
suppressions.detect_ies(channels=['EEG R1(Fp2)', 'EEG L1(Fp1)'],
                                     threshold=8,
                                     min_duration=1,
                                     min_gap=0.5)

When I run thiswith a Savitzky-Golay filter in detect_suppressions, , I get this error message :

Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD. Traceback (most recent call last): File "C:/Users/viniv/PycharmProjects/IBENS_code/alphatracking.py", line 520, in suppressions.detect_ies(channels=['EEG R1(Fp2)', 'EEG L1(Fp1)'], File "c:\users\viniv\onedrive\documents\ibens\holcman\neurokit\neurokit\analysis\suppressions.py", line 159, in detect_ies self.iesmask = _detect_suppressions(self.recording, **kwargs) File "c:\users\viniv\onedrive\documents\ibens\holcman\neurokit\neurokit\analysis\suppressions.py", line 73, in _detect_suppressions envelope = savgol_filter(envelope, 3, 1) File "C:\Users\viniv\Anaconda3\envs\IBENS_code\lib\site-packages\scipy\signal_savitzky_golay.py", line 348, in savgol_filter _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y) File "C:\Users\viniv\Anaconda3\envs\IBENS_code\lib\site-packages\scipy\signal_savitzky_golay.py", line 223, in _fit_edges_polyfit _fit_edge(x, n - window_length, n, n - halflen, n, axis, File "C:\Users\viniv\Anaconda3\envs\IBENS_code\lib\site-packages\scipy\signal_savitzky_golay.py", line 190, in _fit_edge poly_coeffs = np.polyfit(np.arange(0, window_stop - window_start), File "<__array_function__ internals>", line 5, in polyfit File "C:\Users\viniv\Anaconda3\envs\IBENS_code\lib\site-packages\numpy\lib\polynomial.py", line 660, in polyfit c, resids, rank, s = lstsq(lhs, rhs, rcond) File "<__array_function__ internals>", line 5, in lstsq File "C:\Users\viniv\Anaconda3\envs\IBENS_code\lib\site-packages\numpy\linalg\linalg.py", line 2305, in lstsq x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj) File "C:\Users\viniv\Anaconda3\envs\IBENS_code\lib\site-packages\numpy\linalg\linalg.py", line 100, in _raise_linalgerror_lstsq raise LinAlgError("SVD did not converge in Linear Least Squares") numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares

Do you also get this ?

mattbit commented 3 years ago

Can you check what exactly is causing the error?

vloison commented 3 years ago

It is the fact that we replace artifact values by a fixed value in detect_ies function.

See line 60-63 of suppressions.py :

if recording.es.has('artifacts'):
        rec = recording.artifacts_to_nan()
        #rec.data.fillna(0)
    else:
        rec = recording.copy()

Settings artifact to Nan makes the Savitzky-Golay filter crash, I tried setting artifacts to fixed values like 0, 10, 100 : the Savitzky-Golay filter also crashes in this case.

Could you see if it also crashes in your case ?

mattbit commented 3 years ago

Could you see if it also crashes in your case ?

No, it doesn't.

Under which conditions does the Savitzky–Golay filter crash?

I now read your traceback and observe the following:

the error is raised in the Intel math library, maybe there is some little bug in scipy
the problem comes from the interpolation done at the boundaries _fit_edges_polyfit (source code here[https://github.com/scipy/scipy/blob/f758938bfa8a17fd980b32aad7233ba436f4c2b3/scipy/signal/_savitzky_golay.py#L210]
this is not a problem of the Savitzky–Golay filter but of the boundary mode.

Please check this. Doing interpolation to extend the boundaries probably does not make sense in our case. Using mode='nearest' can be more appropriate, avoiding completely the interpolation code causing the issue.

vloison commented 3 years ago

I added mode='nearest' and now the code runs. I'm curious, why do we need to apply a Savitzky-Golay filter to the envelope ?

mattbit commented 3 years ago

I'm curious, why do we need to apply a Savitzky-Golay filter to the envelope ?

It's just a way to smoothen the envelope. Code was tested with that filter, so I see no reason to change it if it's not the source of the problem.

mattbit / neurokit

Problems with Savitzky–Golay filter in detection of alpha suppressions #79