Documentation of rsp_clean() seems inconsistent with implementation

danibene commented 5 months ago

Hello! I am a bit confused about the default method for cleaning respiration signals.

In the docstring, it is written that there is linear detrending followed by a 5th order lowpass Butterworth filter with a cutoff of 2Hz: https://github.com/neuropsychology/NeuroKit/blob/3d004b4777c3e59b288f46b9f0f812ac1e04b5d4/neurokit2/rsp/rsp_clean.py#L18-L19

But the implementation seems to just be a 2nd order bandpass Butterworth filter with cutoffs of 0.5 Hz and 3 Hz: https://github.com/neuropsychology/NeuroKit/blob/3d004b4777c3e59b288f46b9f0f812ac1e04b5d4/neurokit2/rsp/rsp_clean.py#L121-L143

Also, in the cited paper, Khodadad et al. (2018) wrote "However, further improvement is also obtained by pre-processing the data using a digital high-pass filter to remove the dominating low frequency contents. Here, a second order high-pass Butterworth filter has been used with a cut-off frequency of 15 breaths/min." Wouldn't that correspond to 0.25 Hz?

Should we update the documentation to reflect that it is a 2nd order bandpass Butterworth filter with cutoffs of 0.5 Hz and 3 Hz?

DominiqueMakowski commented 5 months ago

I think we should adjust the method to match the paper if there's a discrepancy no?

danibene commented 5 months ago

Seems like the discrepancy between the docstring and the implementation started here:

https://github.com/neuropsychology/NeuroKit/commit/d9470093b1a4ae8f8fdd5f57b5f801821171c366

Any idea why that was done? Maybe another option could be to have a default "neurokit" method (in case this implementation does work better) as well as the original implementation based on the one from the paper

DominiqueMakowski commented 5 months ago

Any idea why that was done?

indeed that's strange and unfortunately the committer disappeared from GH it seems. I think the idea was to replace the detrending by the highpass, but...

in case this implementation does work better

The problem is how to benchmark that, I ran a quick search for annotated RSP data just to have some empirical evidence and found this dataset:

https://www.physionet.org/content/bidmc/1.0.0/

That means we could maybe try to compare the time of peaks&troughts after various cleaning procedures vs. their annotations?

And then, if the current version is better, we move it to a new neurokit default method. If not, we drop it. What do you think?

Also they describe another method here based on Lu2006 that I don't think we implement:

https://physiodatatoolbox.leidenuniv.nl/docs/user-guide/physioanalyzer-modules/resp-module.html

danibene commented 5 months ago

Making the decision based on its performance on that dataset sounds good! I just don't know when I would be able to do that (though you or any kind stranger lurking here is welcome to), so my proposal would be:

PR changing the docstring to reflect the current implementation, mentioning this issue & that it currently doesn't reflect the implementation in the paper (so that the default functionality isn't changed before we check its performance)
PR downloading and processing the open-access dataset
PR adding study comparing methods
PR changing or adding implementation depending on results

What do you think?

DominiqueMakowski commented 5 months ago

sounds good! 👌

neuropsychology / NeuroKit

Documentation of rsp_clean() seems inconsistent with implementation #950