markrogoyski / math-php

Interpolation NaturalCubicSpline issue #239

Open fox91 opened 7 years ago

fox91 commented 7 years ago

Using timestamps as the X coordinates of the point set gives me back unexpected results (tested on version 0.31.0, PHP 7.1.6).

How to reproduce:

<?php
require_once(__DIR__ . '/vendor/autoload.php');
use MathPHP\NumericalAnalysis\Interpolation;

$points = [
    [1499173200, 1.07],
    [1499176800, 1.6],
    [1499180400, 1.0],
    [1499184000, 0.31],
];
$p = Interpolation\NaturalCubicSpline::interpolate($points);

$p(1499173200); // 0.0 (1.07 expected)
$p(1499176800); // 0.0 (1.6  expected)
$p(1499180400); // 4.0 (1.0  expected)

I'm using this workaround, but it is not a long-term solution:

<?php
require_once(__DIR__ . '/vendor/autoload.php');
use MathPHP\NumericalAnalysis\Interpolation;

$b = 1499173200;
$points = [
    [(1499173200 - $b)/60, 1.07],
    [(1499176800 - $b)/60, 1.6],
    [(1499180400 - $b)/60, 1.0],
    [(1499184000 - $b)/60, 0.31],
];
$p = Interpolation\NaturalCubicSpline::interpolate($points);

$p((1499173200 - $b)/60); // 1.07
$p((1499176800 - $b)/60); // 1.6
// ...
Beakerboy commented 7 years ago

I'm guessing it's due to floating-point errors on the rather large x values you have. I copied the algorithm from the natural cubic spline function into Excel and used your values, and got zeroes for everything. When I subtracted 1499173200 from all the x values, it worked fine. In fact, just dividing x by 10 allowed everything to work in MS Excel. Instead of using the Unix timestamp as-is, or subtracting a "zero" and dividing by 60, can you try just dividing by 100?

Edit: Never mind... that's not it. I was using an offset that was 10 times too small, not dividing the value of x by ten. It looks like it might be an issue with the interval being too narrow, maybe? In Excel, if the "zero offset" is kept at 1499173200 but the data points are separated by 36000 instead of 3600, it works.
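
That narrow-interval observation fits catastrophic cancellation: subtracting cubes of nearby large x values discards several significant digits, and the narrower the interval, the more digits are lost. A minimal PHP sketch (an illustration using the values from this issue, not code from the library):

<?php
$a = 1499173200.0;
$b = $a + 3600;  // neighboring knots, one hour apart

// Naive difference of cubes: both operands are ~3.37e27, so the
// subtraction loses roughly 5 of their ~16 significant digits.
$naive = $b ** 3 - $a ** 3;

// The algebraically equivalent factored form avoids the cancellation:
// b^3 - a^3 = (b - a)(b^2 + ab + a^2)
$factored = ($b - $a) * ($b * $b + $a * $b + $a * $a);

printf("naive:    %.17g\n", $naive);
printf("factored: %.17g\n", $factored);
// The two disagree after about the 11th digit; the factored form
// is the accurate one.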

markrogoyski commented 7 years ago

Hi @fox91. Thank you for bringing this issue to our attention, and thank you for the test data to reproduce it. I'm sorry the interpolate method is not working as expected. We'll try to get to the bottom of this.

@Beakerboy Thanks for looking into this. There are a lot of touch points: NaturalCubicSpline, Polynomial, Piecewise. I'll look into writing some more unit tests for NaturalCubicSpline and Piecewise to see if I can narrow down what the issue is, or at least when it does and does not occur.

@jakobsandberg Do you have any insight into why invoking the Piecewise function at the original points does not seem to return the original y values in this case? Thanks.

markrogoyski commented 7 years ago

I need to do more investigation, but I think the issue may be a mix of not handling zero-like numbers properly and doing arithmetic on really big numbers. At one point in the natural cubic spline algorithm the x point value gets cubed, so you have calculations like 1499173200**3, which results in floating-point values like 3.3694221756269E+27.
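
At that magnitude the gap between adjacent representable doubles (one ULP) is 2^39 ≈ 5.5e11, so any term smaller than about half of that simply vanishes when added. A quick PHP illustration (a sketch, not the library's code):

<?php
$cube = 1499173200 ** 3;           // ~3.3694221756269E+27, a float

// Near 3.37e27 one ULP is 2^39 ≈ 5.5e11, so adding 1e11 is rounded
// away entirely, while 1e12 survives:
var_dump($cube + 1e11 === $cube);  // bool(true)
var_dump($cube + 1e12 === $cube);  // bool(false)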

Beakerboy commented 7 years ago

Would it make sense to scale the input in the constructor to minimize loss of precision when the splines are created? For example, create a transformation matrix that scales the inputs to range only from -1000 to +1000 on the x and y axes, and create the spline from the scaled points. Then any input can be scaled down, evaluated against the spline, and transformed back to the original scale. The only hitch is, we'd have to verify that the spline fit to the scaled data, transformed back, is the same curve as the spline fit to the original data.

I would imagine it does... but we would have to prove it. Since a cubic spline is the result of tridiagonal matrix algebra, it might not be hard to prove it... and the object might be easier to manage if it were refactored to use tridiagonal matrices anyway.
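
For what it's worth, here is a rough user-side sketch of that scaling idea (scaledSpline() is a hypothetical helper, not an API MathPHP provides). It rescales only x, since x is the value that gets cubed:

<?php
require_once(__DIR__ . '/vendor/autoload.php');
use MathPHP\NumericalAnalysis\Interpolation;

function scaledSpline(array $points): callable
{
    // Map x linearly onto roughly [-1000, 1000] before fitting.
    $xs    = array_column($points, 0);
    $mid   = (max($xs) + min($xs)) / 2;
    $scale = (max($xs) - min($xs)) / 2000;

    $scaled = [];
    foreach ($points as [$x, $y]) {
        $scaled[] = [($x - $mid) / $scale, $y];
    }
    $spline = Interpolation\NaturalCubicSpline::interpolate($scaled);

    // Apply the same transform to every query point.
    return function (float $x) use ($spline, $mid, $scale) {
        return $spline(($x - $mid) / $scale);
    };
}

$p = scaledSpline([
    [1499173200, 1.07],
    [1499176800, 1.6],
    [1499180400, 1.0],
    [1499184000, 0.31],
]);
$p(1499173200); // ≈ 1.07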

markrogoyski commented 7 years ago

If it turns out there is no way to fix this issue, then scaling when you have large x values seems like a good idea.

If it ends up being difficult to prove, another option is to simply gain confidence that it works with an abundance of unit tests.
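
As one such test, a regression check built from the data in this issue could assert the defining property of interpolation: the spline must reproduce its input points. A PHPUnit sketch (assuming a PHPUnit version that has assertEqualsWithDelta):

<?php
use MathPHP\NumericalAnalysis\Interpolation;
use PHPUnit\Framework\TestCase;

class NaturalCubicSplineLargeXTest extends TestCase
{
    public function testSplineReproducesInputPointsWithLargeX()
    {
        $points = [
            [1499173200, 1.07],
            [1499176800, 1.6],
            [1499180400, 1.0],
            [1499184000, 0.31],
        ];
        $p = Interpolation\NaturalCubicSpline::interpolate($points);

        // An interpolating spline must pass through every input point.
        foreach ($points as [$x, $y]) {
            $this->assertEqualsWithDelta($y, $p($x), 1e-6);
        }
    }
}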