Closed XciD closed 2 years ago
@XciD , thank you for the reproducible example and the research into the root cause. I'll get on this one asap.
Dev note: R Arima has no problem fitting this dataset and model order combination.
Indeed :) I'm trying to reproduce a r code into java for better performance. I've implemented a simplified autoarima, do you know if I can found ma and ar roots ?
Also, I don't know why I haven't the exact same result with the java code and the r code. I will provide you an exemple on monday.
@XciD , I realized that the underlying issue is that when you tell the R forecast package to fit an Arima model with one difference and a constant, it interprets the constant as a drift term. The base arima method in the stats package just ignores you and doesn't fit a constant at all. The problem is that I'm interpreting it as the model "mean", which is like a global level, which doesn't make sense for a nonstationary model.
In the current release, I don't allow clients to build an ArimaCoefficients object with both a differencing term and a mean term. I throw an exception and tell the client to specify that they want a drift term instead.
I'm considering two options for this situation. The first is to throw an exception when you try to build an ArimaOrder with a differencing term and a constant with a corrective action message to specify a drift term instead, just as the ArimaCoefficients object does.
The second option is to do what R does and assume the client wants a drift term. I can log a warning and then set the drift to include and exclude the constant.
Let me know if you have an opinion. I'm leaning towards doing what R does and just assuming the client wants that constant to be interpreted as a drift term. Then if you try to include a drift term, a constant, and a degree of differencing, I'll throw an exception to let you know that's an impossible model combination.
@XciD , yes, I can compute the roots and check for invertibility, stationarity, and causality. I did this for the spark-timeseries project here: https://github.com/sryza/spark-timeseries/pull/163/commits
IMO you should try to have the same behaviour. A lot of people concerning for performances will find your library, and they may just want have a result with the "same algorithm" So I think the log warn may be the best solution.
For your information, I've implemented an python with arima, and autoarima algorithm: one request was about 400ms I played with your lib (0.2.1), and a was near 15ms for the same result. I just break an infinite loop and found the update :) and now I'm stuck with the NPE. That I could catch, but I will wait for your fix :)
If I can help you, don't hesitate
That's actually really surprising. I wouldn't expect that much of a difference. Thank you for letting me know.
And I agree completely. The behavior should be as consistent as possible to avoid too much confusion.
@Test
public void testRModel() {
double[] serie = {
0.4,
1.03,
1.14,
1.25,
2.74,
2.62,
2.42,
7.64,
11.73,
18.36,
27,
42.19,
32.13,
0.95,
1.28,
3.64,
3.476,
2.033,
0.635,
0.62,
0.679
};
Arima model = Arima.model(
TimeSeries.from(serie),
ArimaOrder.order(0, 1, 1, Arima.Constant.EXCLUDE, Arima.Drift.EXCLUDE)
);
System.out.println(model.aic());
List<Double> forecast = model.forecast(5).pointEstimates().asList();
assertEquals(142.7845474966764D, model.aic(), 1e-6);
assertEquals(Lists.newArrayList(
0.9788615,
0.9788615,
0.9788615,
0.9788615,
0.9788615
), forecast);
}
There is a test, where I don't have the same forecast. By the way I find the same aic...
@XciD , I see what's happening. R has some way of automatically forcing parameters to stay within the invertibility range. The MA parameter java-timeseries is giving is 1.5678, which isn't invertible. R forces it to stay below 1.
edit: Notice that R outputs 0.6376 as the MA parameter, which happens to be about 1/1.5678. An MA(1) model with parameter c is asymptotically equivalent to an MA(1) with parameter 1/c, but in the short term they yield different forecasts. The good news is that any non-invertible MA model can rewritten as an invertible one, but I need to figure out the exact process of how to do that.
@XciD , obviously, that's functionality that we don't have yet so we'll need to add it.
Got NPE on simple code:
After searching: https://github.com/signaflo/java-timeseries/blob/master/timeseries/src/main/java/com/github/signaflo/data/regression/MultipleLinearRegressionModel.java#L254
You don't check the return boolean value, that can be false...