neuropsychology / NeuroKit

NeuroKit2: The Python Toolbox for Neurophysiological Signal Processing
https://neuropsychology.github.io/NeuroKit
MIT License

signal_interpolate(): dealing with physiologically implausible values caused by under- or overshooting interpolation #243

Closed DominiqueMakowski closed 4 years ago

DominiqueMakowski commented 4 years ago

Following #180 and #242 interpolation can create crazy values, extending far beyond the original range.

We could add a checker for that, throwing a warning for instance if the new range is extended by more than 25% (or anything else): "Warning: the " + interpolation_method + " method for interpolation created some implausible values. Proceed with caution, visualize the resulting interpolated signal and consider switching to linear interpolation."

Additionally, we could add a safe or cap argument that would, for instance, replace all interpolated values outside this "safe" zone (e.g., 125% of range) by either the max/min or by NaNs?
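A rough sketch of the proposed check (the helper name, the warning text, and the 25% tolerance are illustrative, not part of the NeuroKit2 API):

```python
import warnings

import numpy as np


def cap_interpolated(original, interpolated, tolerance=0.25, replace_with="clip"):
    """Flag and optionally cap interpolated values that leave a "safe" zone.

    Hypothetical helper illustrating the proposed `safe`/`cap` argument:
    values outside the original range extended by `tolerance` (25% by
    default) are either clipped to the boundaries or replaced by NaN.
    """
    lo, hi = np.min(original), np.max(original)
    margin = tolerance * (hi - lo)
    lower, upper = lo - margin, hi + margin

    out = np.asarray(interpolated, dtype=float).copy()
    outside = (out < lower) | (out > upper)
    if outside.any():
        warnings.warn(
            "The interpolation created some implausible values. Proceed with "
            "caution, visualize the result, and consider linear interpolation."
        )
        if replace_with == "clip":
            out = np.clip(out, lower, upper)
        else:  # replace offending values with NaN instead
            out[outside] = np.nan
    return out
```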

@JanCBrammer thoughts?

JanCBrammer commented 4 years ago

@DominiqueMakowski, I noticed this behavior of the interpolation while implementing ppg_simulate(). I had difficulty finding words for it, but now that it has come up again I did some research. What we see is referred to as "overshooting" or "undershooting" of the interpolated values in the y direction. It happens when the interpolation method is not monotone.

These three resources help explain and visualize this.

The solution is to use methods that preserve monotonicity (i.e., that do not overshoot or undershoot the y values): linear interpolation, Akima interpolation (I use this one in ppg_simulate()), or monotone cubic spline interpolation. I suggest we use these methods instead of trying to correct or warn against over-/undershooting. The latter could unnecessarily confuse users and clutter the code.
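A minimal demonstration of the difference, using scipy on step-like data (the data are made up for illustration): the standard cubic spline "rings" around the jump and leaves the original y range, while the shape-preserving PCHIP interpolator does not; Akima behaves similarly here.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator, CubicSpline, PchipInterpolator

# Step-like data: all original y values lie in [0, 1].
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
x_new = np.linspace(0.0, 5.0, 101)

cubic = CubicSpline(x, y)(x_new)          # oscillates around the jump
pchip = PchipInterpolator(x, y)(x_new)    # monotone: stays within [0, 1]
akima = Akima1DInterpolator(x, y)(x_new)  # also avoids the ringing here

print("cubic leaves the data range:", cubic.min() < 0.0 or cubic.max() > 1.0)
print("pchip stays inside the range:", pchip.min() >= 0.0 and pchip.max() <= 1.0)
```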

DominiqueMakowski commented 4 years ago

Thanks for the resources! Do you think we should replace the cubic default with one of these, then? Or do they, on average, not perform better? (The goal of the default method is to have something that works best on average, in the majority of cases, without being too dangerous.)

JanCBrammer commented 4 years ago

Yes, I suggest we simply replace the default cubic spline with monotone cubic spline. How would you quantify "better performance" / what are your criteria for "good"?

DominiqueMakowski commented 4 years ago

I think the goal of interpolation is to create realistic values (from a biological standpoint). From there, a result can be "unrealistic" if 1) the pattern is not biologically plausible (linear interpolation fails on this count, as the body doesn't produce consecutive linear changes but rather smooth transitions; note that for rate this is a genuinely complex issue, since the true "continuous" process behind it is hidden), or 2) it contains impossible values (e.g., negative values).

So I guess a good default method should aim at the sweet spot between mimicking a realistic "smooth" biological process and being safe and robust to artefacts (i.e., not creating crazy solutions)

Yes, I suggest we simply replace the default cubic spline with monotone cubic spline.

It seems like we don't have these options in signal_interpolate()

JanCBrammer commented 4 years ago

good default method should aim at the sweet spot between mimicking a realistic "smooth" biological process and being safe and robust to artefacts (i.e., not creating crazy solutions)

Both Akima and monotonic cubic spline do a good job in that sense. I find the issue of "biologically plausible" interpolation interesting (i.e., which "continuous" assumptions are we willing to make about a physiological process that is, at least empirically, discrete?). Do you have literature about that?

It seems like we don't have these options in signal_interpolate()

We can wrap the corresponding methods from the scipy.interpolate module. I could look into this tomorrow.
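A sketch of how such a wrapper could dispatch to scipy (the function name, method names, and signature are illustrative, not the actual signal_interpolate() implementation):

```python
import numpy as np
import scipy.interpolate


def signal_interpolate_sketch(x_values, y_values, x_new, method="monotone_cubic"):
    """Interpolate y_values at the positions x_new.

    Illustrative wrapper around scipy.interpolate: "monotone_cubic" and
    "akima" avoid the under-/overshooting of the plain cubic spline.
    """
    if method == "linear":
        return np.interp(x_new, x_values, y_values)
    interpolators = {
        "monotone_cubic": scipy.interpolate.PchipInterpolator,  # monotone cubic
        "akima": scipy.interpolate.Akima1DInterpolator,
        "cubic": scipy.interpolate.CubicSpline,  # may over-/undershoot
    }
    return interpolators[method](x_values, y_values)(x_new)
```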

DominiqueMakowski commented 4 years ago

Do you have literature about that?

Not off the top of my head. I remember reading several times "we interpolated with cubic as it is biologically plausible" or something like that, but I don't recall any specific investigation of it. As I said, it's a tricky issue; I reckon one would have to look at the neural correlates of heart/RSP rate modulations and track them there to get the higher temporal resolution. I suppose physiologists/cardiologists must have looked into it, I'll take a look.

JanCBrammer commented 4 years ago

Ok, we have now solved the problem of the under- or overshooting interpolation by using monotone cubic spline interpolation. One small problem remains: the method cannot extrapolate constant values, which results in some crazy extrapolated values outside the interpolation range (< x_values[0], > x_values[-1]). See the figure below. But I don't think it's too difficult to set the extrapolated values to something else manually (e.g., all values at x < x_values[0] to y_values[0]).

[Figure 1: monotone cubic spline extrapolation diverging outside the interpolation range]

Tam-Pham commented 4 years ago

I agree that the extrapolation gives rather crazy values.

Below I tried interpolating with different methods:

[image: comparison of several interpolation methods and their extrapolated values]

We can manually extrapolate with y_values[0] and y_values[-1] for x < x_values[0] and x > x_values[-1], as you say.
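The boundary fix could look like this (variable names follow the thread; the heart-rate-like data are made up for illustration):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical samples at irregular positions (e.g., heart rate at R-peaks).
x_values = np.array([10.0, 20.0, 30.0, 40.0])
y_values = np.array([60.0, 62.0, 61.0, 63.0])
x_new = np.arange(0, 51, dtype=float)

# Monotone cubic spline; its polynomial extrapolation diverges outside
# [x_values[0], x_values[-1]] ...
interpolated = PchipInterpolator(x_values, y_values, extrapolate=True)(x_new)

# ... so hold the boundary values constant outside the interpolation range.
interpolated[x_new < x_values[0]] = y_values[0]
interpolated[x_new > x_values[-1]] = y_values[-1]
```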

JanCBrammer commented 4 years ago

Thanks for visualizing the options! That makes it really clear how misleading the extrapolation could be: imagine someone averages this time series (point estimate) without visualizing it first :scream: I'll try to find some time to add a manual fill_value for the monotone cubic spline method.

JanCBrammer commented 4 years ago

We can manually extrapolate with y_values[0] and y_values[-1] for x < x_values[0] and x > x_values[-1], as you say

@Tam-Pham, implemented in https://github.com/neuropsychology/NeuroKit/pull/244/commits/dbdb4025aaadeee0e901ceb49e3f328e59caa388.

I'll close this issue as soon as #244 is merged.