Closed: DominiqueMakowski closed this issue 4 years ago
@DominiqueMakowski, I noticed this behavior of the interpolation while implementing `ppg_simulate()`. I had difficulty finding words for this behavior, but now that it comes up again I did some research. The behavior we see is referred to as "overshooting" or "undershooting" of the interpolated values in the y direction. This happens when the interpolation method is not monotone.
These three resources help explain and visualize this.
The solution is to use methods that preserve monotonicity (i.e., that do not overshoot or undershoot the y values): linear interpolation, Akima interpolation (I use this one in `ppg_simulate()`), or monotone cubic spline interpolation. I suggest we use these methods instead of trying to correct or warn against over-/undershooting. The latter could unnecessarily confuse users and clutter the code.
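To make the over-/undershooting concrete, here is a minimal sketch (not NeuroKit code) comparing a plain cubic spline with the monotone PCHIP interpolator on step-like data whose true values lie in [0, 1]:

```python
import numpy as np
import scipy.interpolate

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # step-like; data stays in [0, 1]
x_new = np.linspace(0.0, 5.0, 101)

cubic = scipy.interpolate.interp1d(x, y, kind="cubic")(x_new)
monotone = scipy.interpolate.PchipInterpolator(x, y)(x_new)

# The cubic spline over-/undershoots around the step...
print(cubic.min() < 0 or cubic.max() > 1)  # True
# ...while the monotone method stays within the range of the data.
print(monotone.min() >= -1e-12 and monotone.max() <= 1 + 1e-12)  # True
```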
Thanks for the resources! Do you think we should replace the cubic default by one of these then? Or do they, on average, not perform better? (the goal of the default method being to have something that works the best on average, in the majority of cases and that is not too dangerous)
Yes, I suggest we simply replace the default cubic spline with monotone cubic spline. How would you quantify "better performance" / what are your criteria for "good"?
I think the goal of interpolation is to create realistic values (from a biological standpoint). From there, a result can be "unrealistic" if 1) the pattern is not biologically plausible (for instance, linear interpolation fails on this count, as the body doesn't produce consecutive linear changes but rather smooth transitions; note that for rate this is a very complex issue, as the true "continuous" process behind it is hidden), or 2) it has impossible values (e.g., negative values).
So I guess a good default method should aim at the sweet spot between mimicking a realistic "smooth" biological process and being safe and robust to artefacts (i.e., not creating crazy solutions)
> Yes, I suggest we simply replace the default cubic spline with monotone cubic spline.
It seems like we don't have these options in `signal_interpolate()`.
> A good default method should aim at the sweet spot between mimicking a realistic "smooth" biological process and being safe and robust to artefacts (i.e., not creating crazy solutions)
Both Akima and monotonic cubic spline do a good job in that sense. I find the issue of "biologically plausible" interpolation interesting (i.e., which "continuous" assumptions are we willing to make about a physiological process that is, at least empirically, discrete). Do you have literature about that?
> It seems like we don't have these options in `signal_interpolate()`
We can wrap the corresponding methods from the `scipy.interpolate` module. I could look into this tomorrow.
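A sketch of what such a wrapper could look like (the function name and the `method` values here are illustrative assumptions, not the actual `signal_interpolate()` API):

```python
import numpy as np
import scipy.interpolate

def interpolate_sketch(x_values, y_values, x_new, method="monotone_cubic"):
    # Hypothetical dispatcher wrapping scipy.interpolate.
    if method == "monotone_cubic":
        interpolation_function = scipy.interpolate.PchipInterpolator(
            x_values, y_values, extrapolate=True)
    elif method == "akima":
        interpolation_function = scipy.interpolate.Akima1DInterpolator(
            x_values, y_values)
    else:  # fall back to interp1d kinds ("linear", "quadratic", "cubic", ...)
        interpolation_function = scipy.interpolate.interp1d(
            x_values, y_values, kind=method, bounds_error=False,
            fill_value=([y_values[0]], [y_values[-1]]))
    return interpolation_function(x_new)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
x_new = np.linspace(0.0, 4.0, 41)
print(interpolate_sketch(x, y, x_new, method="akima").shape)  # (41,)
```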
> Do you have literature about that?
Not really, off the top of my head. I remember reading several times "we interpolated with cubic as it is biologically plausible" or something like that, but I don't remember any specific investigation of it. As I said, it's a tricky issue; I reckon one would have to look at the neural correlates of heart/rsp rate modulations and track them there to get the higher temporal resolution. I suppose physiologists/cardiologists must have looked into it, I'll take a look.
Ok, we now solved the problem of the under- or overshooting interpolation by using monotone cubic spline interpolation. One small problem remains with this. The method cannot extrapolate constant values, which results in some crazy extrapolated values outside the interpolation range (< x_values[0], > x_values[-1]). See picture. But I don't think it's too difficult to set the extrapolated values to something else manually (e.g., all values < x_values[0] = y_values[0]).
I agree that the extrapolation gives rather crazy values.
Below I tried to interpolate with different methods:

`mo` - interpolate with monotone cubic spline

```python
interpolation_function = scipy.interpolate.PchipInterpolator(x_values, y_values, extrapolate=True)
```

`cubic` - interpolate with cubic spline (`fill_value=([y_values[0]], [y_values[-1]])`)

```python
interpolation_function = scipy.interpolate.interp1d(
    x_values, y_values, kind=method, bounds_error=False,
    fill_value=([y_values[0]], [y_values[-1]]))
```

`cubic_extrapolate` - interpolate with cubic spline (`fill_value="extrapolate"`)

```python
interpolation_function = scipy.interpolate.interp1d(
    x_values, y_values, kind=method, bounds_error=False, fill_value="extrapolate")
```
We can manually extrapolate with `y_values[0]` and `y_values[-1]` for x < x_values[0] and x > x_values[-1], like you say.
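A sketch of that manual fix (the data here is made up for illustration): interpolate with the monotone method, then overwrite the extrapolated regions with the first/last known samples.

```python
import numpy as np
import scipy.interpolate

x_values = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
y_values = np.array([1.0, 2.0, 1.5, 3.0, 2.5])
x_new = np.linspace(0.0, 8.0, 81)  # extends beyond [x_values[0], x_values[-1]]

interpolated = scipy.interpolate.PchipInterpolator(
    x_values, y_values, extrapolate=True)(x_new)

# Overwrite the (potentially crazy) extrapolated values with the
# first/last known samples, leaving the interpolation range intact.
interpolated[x_new < x_values[0]] = y_values[0]
interpolated[x_new > x_values[-1]] = y_values[-1]
```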
Thanks for visualizing the options! That makes it really clear how misleading the extrapolation could be: imagine someone averages this time series (point estimate) without visualizing it first :scream: I'll try to find some time to add a manual `fill_value` for the monotone cubic spline method.
> We can manually extrapolate with `y_values[0]` and `y_values[-1]` for x < x_values[0] and x > x_values[-1], like you say
@Tam-Pham, implemented in https://github.com/neuropsychology/NeuroKit/pull/244/commits/dbdb4025aaadeee0e901ceb49e3f328e59caa388.
I'll close this issue as soon as #244 is merged.
Following #180 and #242, interpolation can create crazy values that extend far beyond the original range.
We could add a checker for that, throwing a warning, for instance, if the new range is extended by more than 25% (or any other threshold):

```
"Warning: the " + interpolation_method + " method for interpolation created some implausible values. Proceed with caution, visualize the resulting interpolated signal and consider switching to linear interpolation."
```
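A sketch of such a checker (the function name and the 25% threshold are illustrative, not an existing NeuroKit API; the message is the one suggested in this thread):

```python
import warnings
import numpy as np

def check_interpolation_range(original, interpolated,
                              interpolation_method="quadratic", tolerance=0.25):
    # Warn if the interpolated signal's range exceeds the original range
    # by more than `tolerance` (e.g., 25%); returns True if it does.
    original_range = np.max(original) - np.min(original)
    interpolated_range = np.max(interpolated) - np.min(interpolated)
    if interpolated_range > (1 + tolerance) * original_range:
        warnings.warn(
            "Warning: the " + interpolation_method + " method for interpolation "
            "created some implausible values. Proceed with caution, visualize "
            "the resulting interpolated signal and consider switching to "
            "linear interpolation.")
        return True
    return False

original = np.array([0.0, 1.0, 0.0])
crazy = np.array([0.0, 1.5, -0.5, 0.0])  # range 2.0 > 1.25 * 1.0, so it warns
print(check_interpolation_range(original, crazy))  # True
```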
Additionally, we could add a `safe` or `cap` argument that would, for instance, replace all interpolated values outside this "safe" zone (e.g., 125% of range) by either the max/min or by NaNs? @JanCBrammer thoughts?
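For the `cap` idea, a minimal sketch (the argument name and the 125% zone are just the suggestions from this comment, not an implemented API); this variant clips to the safe zone, and a NaN-based variant could assign `np.nan` instead:

```python
import numpy as np

def cap_interpolated(interpolated, original, tolerance=0.25):
    # Clip interpolated values to the original range extended by `tolerance`
    # on each side (i.e., a "safe" zone of 125% of the range by default).
    lo, hi = np.min(original), np.max(original)
    margin = tolerance * (hi - lo)
    return np.clip(interpolated, lo - margin, hi + margin)

original = np.array([0.0, 1.0, 0.0])
interpolated = np.array([0.0, 2.0, -1.0, 0.5])
print(cap_interpolated(interpolated, original))  # values clipped to [-0.25, 1.25]
```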