signaflo / java-timeseries

Time series analysis in Java
MIT License
195 stars 49 forks source link

NPE #7

Closed XciD closed 2 years ago

XciD commented 6 years ago

Got NPE on simple code:

public static void main(String[] args) {
        ArimaOrder arimaOrder = ArimaOrder.order(0, 1, 1, Arima.Constant.INCLUDE);
        Arima model = Arima.model(TimeSeries.from(new double[]{
            100.1,
            200.4,
            300.2,
            400.5,
            500.1,
            600.7,
            700.7,
            800.7
        }), arimaOrder);
        Forecast forecast = model.forecast(5);
        System.out.println(forecast);
    }

After searching: https://github.com/signaflo/java-timeseries/blob/master/timeseries/src/main/java/com/github/signaflo/data/regression/MultipleLinearRegressionModel.java#L254

You don't check the return boolean value, that can be false...

signaflo commented 6 years ago

@XciD , thank you for the reproducible example and the research into the root cause. I'll get on this one asap.

signaflo commented 6 years ago

Dev note: R Arima has no problem fitting this dataset and model order combination.

XciD commented 6 years ago

Indeed :) I'm trying to reproduce a r code into java for better performance. I've implemented a simplified autoarima, do you know if I can found ma and ar roots ?

Also, I don't know why I haven't the exact same result with the java code and the r code. I will provide you an exemple on monday.

signaflo commented 6 years ago

@XciD , I realized that the underlying issue is that when you tell the R forecast package to fit an Arima model with one difference and a constant, it interprets the constant as a drift term. The base arima method in the stats package just ignores you and doesn't fit a constant at all. The problem is that I'm interpreting it as the model "mean", which is like a global level, which doesn't make sense for a nonstationary model.

In the current release, I don't allow clients to build an ArimaCoefficients object with both a differencing term and a mean term. I throw an exception and tell the client to specify that they want a drift term instead.

I'm considering two options for this situation. The first is to throw an exception when you try to build an ArimaOrder with a differencing term and a constant with a corrective action message to specify a drift term instead, just as the ArimaCoefficients object does.

The second option is to do what R does and assume the client wants a drift term. I can log a warning and then set the drift to include and exclude the constant.

Let me know if you have an opinion. I'm leaning towards doing what R does and just assuming the client wants that constant to be interpreted as a drift term. Then if you try to include a drift term, a constant, and a degree of differencing, I'll throw an exception to let you know that's an impossible model combination.

signaflo commented 6 years ago

@XciD , yes, I can compute the roots and check for invertibility, stationarity, and causality. I did this for the spark-timeseries project here: https://github.com/sryza/spark-timeseries/pull/163/commits

XciD commented 6 years ago

IMO you should try to have the same behaviour. A lot of people concerning for performances will find your library, and they may just want have a result with the "same algorithm" So I think the log warn may be the best solution.

For your information, I've implemented an python with arima, and autoarima algorithm: one request was about 400ms I played with your lib (0.2.1), and a was near 15ms for the same result. I just break an infinite loop and found the update :) and now I'm stuck with the NPE. That I could catch, but I will wait for your fix :)

If I can help you, don't hesitate

signaflo commented 6 years ago

That's actually really surprising. I wouldn't expect that much of a difference. Thank you for letting me know.

And I agree completely. The behavior should be as consistent as possible to avoid too much confusion.

XciD commented 6 years ago
@Test
public void testRModel() {
    double[] serie = {
        0.4,
        1.03,
        1.14,
        1.25,
        2.74,
        2.62,
        2.42,
        7.64,
        11.73,
        18.36,
        27,
        42.19,
        32.13,
        0.95,
        1.28,
        3.64,
        3.476,
        2.033,
        0.635,
        0.62,
        0.679
    };

    Arima model = Arima.model(
        TimeSeries.from(serie),
        ArimaOrder.order(0, 1, 1, Arima.Constant.EXCLUDE, Arima.Drift.EXCLUDE)
    );

    System.out.println(model.aic());
    List<Double> forecast = model.forecast(5).pointEstimates().asList();

    assertEquals(142.7845474966764D, model.aic(), 1e-6);

    assertEquals(Lists.newArrayList(
        0.9788615,
        0.9788615,
        0.9788615,
        0.9788615,
        0.9788615
    ), forecast);
}

There is a test, where I don't have the same forecast. By the way I find the same aic...

signaflo commented 6 years ago

@XciD , I see what's happening. R has some way of automatically forcing parameters to stay within the invertibility range. The MA parameter java-timeseries is giving is 1.5678, which isn't invertible. R forces it to stay below 1.

edit: Notice that R outputs 0.6376 as the MA parameter, which happens to be about 1/1.5678. An MA(1) model with parameter c is asymptotically equivalent to an MA(1) with parameter 1/c, but in the short term they yield different forecasts. The good news is that any non-invertible MA model can rewritten as an invertible one, but I need to figure out the exact process of how to do that.

signaflo commented 6 years ago

@XciD , obviously, that's functionality that we don't have yet so we'll need to add it.