Open Apolsus opened 7 months ago
After several attempts, I determined that the problem lies in auto-arima's search. Even for the same sequence, the search can return a different best model depending on which other series are in the batch. Here is a minimal example.
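import cupy
from cuml.tsa.auto_arima import AutoARIMA

# Case 1: two series fitted together as one batch
ts = [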
[128, 88, 228, 357, 376, 448, 558, 521, 395, 211, 294, 253, 314, 272, 314, 295, 432, 565, 583, 473, 343, 195, 200, 255, 371, 189, 269, 272, 352, 374, 433, 296, 182, 187, 136, 82, 113, 105, 217, 203, 416, 504, 488, 519, 424, 263, 263, 220, 237, 160],
[214, 1351, 965, 848, 362, 273, 176, 187, 163, 201, 207, 173, 239, 137, 233, 231, 248, 160, 200, 222, 160,219, 173, 170, 225, 212, 216, 279, 197, 174, 163, 184, 149, 169, 178, 181, 212, 177, 172, 224, 234, 225, 204, 165, 178, 163, 140, 138, 125, 175],
]
ys = cupy.array(ts)
ys = ys.T
model = AutoARIMA(ys)
model.search(s = 0, d=(0, 1, 2), p=range(7), q=range(4), method="auto")
model.fit()
model.forecast(2)
# Case 2: the second series from the batch above, fitted on its own
ts = [
[214, 1351, 965, 848, 362, 273, 176, 187, 163, 201, 207, 173, 239, 137, 233, 231, 248, 160, 200, 222, 160,219, 173, 170, 225, 212, 216, 279, 197, 174, 163, 184, 149, 169, 178, 181, 212, 177, 172, 224, 234, 225, 204, 165, 178, 163, 140, 138, 125, 175],
]
ys = cupy.array(ts)
ys = ys.T
model = AutoARIMA(ys)
model.search(s = 0, d=(0, 1, 2), p=range(7), q=range(4), method="auto")
model.fit()
model.forecast(2)
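To put a number on the discrepancy, the two runs above can be compared series by series. The following is only a minimal sketch: `batch_vs_single_gap` and `mean_forecaster` are hypothetical names, and the stand-in forecaster is not cuML. In practice `forecast_fn` would be a small wrapper around the `AutoARIMA` search/fit/forecast calls shown above.

```python
from typing import Callable, List, Sequence

Series = Sequence[float]
# Hypothetical interface: takes a batch of series, returns one forecast list per series.
ForecastFn = Callable[[List[Series]], List[List[float]]]


def batch_vs_single_gap(forecast_fn: ForecastFn, batch: List[Series]) -> List[float]:
    """Largest absolute forecast difference per series, batched vs. fitted alone."""
    batched = forecast_fn(list(batch))
    gaps = []
    for i, series in enumerate(batch):
        alone = forecast_fn([series])[0]
        gaps.append(max(abs(a - b) for a, b in zip(batched[i], alone)))
    return gaps


# Stand-in forecaster (NOT cuML): repeats each series' mean twice.
def mean_forecaster(batch: List[Series]) -> List[List[float]]:
    return [[sum(s) / len(s)] * 2 for s in batch]


gaps = batch_vs_single_gap(mean_forecaster, [[1.0, 2.0, 3.0], [10.0, 20.0]])
print(gaps)
```

A forecaster that treats each series independently should give gaps at or near zero; a large gap for some index points to the batch-dependent behavior described here.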
@Nyrio I noticed that you are the main contributor to this code. Could you provide some help and mark this as a feature? Alternatively, I can try to fix this bug myself.
Further testing confirms that with ARIMA(method='ml'), the fitting results for batched input and individual input differ, whereas method='css' does not show this problem.
@Apolsus thanks for the issue and reproducer! Would using the css method suffice for you for now? We will look into the bug in ml, but we are not sure of an ETA for a fix.
No, the css method also shows this problem at a much larger batch size (10000). I checked the source code. In theory, the parameters are optimized independently for each sequence, yet some sequences fail to converge when fitted as part of a batch while converging when fitted individually. One possible culprit is the parameter initialization step, but I haven't had time to check it yet.
After looking into it (and with help from @Nyrio), this seems to stem from numerical stability issues, particularly around the different code paths taken for different batch sizes. It might take some time to create workarounds or fixes in general, but we will try to look into it as soon as we can.
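As a general illustration of why different code paths can produce different numbers, floating-point addition is not associative, so the same values reduced in a different order need not give the same result. This is only a generic sketch of that effect; it is not confirmed to be the specific cause inside cuML.

```python
# The same three values, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

Differences this small are normally harmless, but an iterative optimizer can amplify them when a series sits close to a convergence boundary.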
Describe the bug
I'm having some issues using cuML's Auto ARIMA model for large-scale time series forecasting. Specifically, when I ran a batch forecast on about 50,000 time series, some of the forecast values were abnormally high. However, when I pick one of the affected sequences out of the data and forecast it on its own, I get normal results.
Steps/Code to reproduce bug
Hard to describe here, since the data is private and large. One of the sequences is: [224, 69, 115, 94, 59, 63, 60, 52, 87, 118, 132, 149, 139, 89, 97, 115, 98, 82, 55, 77, 96, 133, 112, 92, 170, 128, 94, 84, 63, 75, 56, 77, 85, 121, 126, 101, 197, 98, 89, 71, 72, 30, 47, 73, 69, 106, 110, 128]
batch prediction gives '7405687891.374923'
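Until the root cause is fixed, one way to find affected series in a large batch is a simple plausibility check on the forecasts. The sketch below is an assumption-laden example: `flag_suspect_forecasts` is a hypothetical helper and the factor of 10 is an arbitrary threshold, not something taken from cuML. Flagged series could then be refitted individually.

```python
from typing import List, Sequence


def flag_suspect_forecasts(
    history: Sequence[Sequence[float]],
    forecasts: Sequence[Sequence[float]],
    factor: float = 10.0,  # arbitrary threshold chosen for illustration
) -> List[int]:
    """Indices of series whose forecast exceeds factor * max(|history|) or is NaN."""
    suspects = []
    for i, (past, future) in enumerate(zip(history, forecasts)):
        bound = factor * max(abs(v) for v in past)
        # v != v is True only for NaN
        if any(v != v or abs(v) > bound for v in future):
            suspects.append(i)
    return suspects


# The sequence and the batch forecast value reported in this issue.
history = [[224, 69, 115, 94, 59, 63, 60, 52, 87, 118, 132, 149, 139, 89, 97,
            115, 98, 82, 55, 77, 96, 133, 112, 92, 170, 128, 94, 84, 63, 75,
            56, 77, 85, 121, 126, 101, 197, 98, 89, 71, 72, 30, 47, 73, 69,
            106, 110, 128]]
forecasts = [[7405687891.374923]]

suspects = flag_suspect_forecasts(history, forecasts)
print(suspects)  # [0]
```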
Expected behavior
The results of individual predictions and batch predictions should be the same.
Environment details (please complete the following information):