Closed — kestlermai closed this issue 2 months ago
Re statsmodels: even if the docs say it is maximum likelihood, there are many variations. R uses a state space representation with a diffuse prior, as explained in the documentation for stats::arima(): https://rdrr.io/r/stats/arima.html. Other objective functions may yield different results. See https://robjhyndman.com/hyndsight/estimation/

StatsForecast: https://nixtlaverse.nixtla.io/statsforecast/src/core/models.html#arima

Thank you very much for your reply. When I tried to use StatsForecast to build an ARIMA model, the results still differed significantly from those obtained in R. With the same parameters (order=(0, 1, 1), season_length=12, seasonal_order=(0, 1, 1)), the MAPE is 4.922 in R and 14.463 in Python. Could this be due to the packages using different estimation algorithms? Anyway, thank you very much for your help.
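For reference, here is how MAPE is typically computed; this is a generic sketch of my own, not the exact code either package uses, and differences in how the forecasts are aligned with the test set could also inflate the Python number:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes `actual` contains no zeros and both inputs are 1-D arrays
    aligned on the same forecast horizon.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Toy illustration: a forecast that is off by 5% everywhere has MAPE ≈ 5.
y_true = np.array([100.0, 200.0, 400.0])
y_hat = y_true * 1.05
print(mape(y_true, y_hat))  # prints a value ≈ 5.0
```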
A MAPE difference that large suggests something's gone wrong in the Python model.
R 4.2.1, forecast 8.22.0:
arima <- arima(train_data, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
summary(arima)
Series: train_data
ARIMA(0,1,1)(0,1,1)[12]

Coefficients:
         ma1    sma1
      -0.193  -0.791
s.e.   0.091   0.084

sigma^2 = 181:  log likelihood = 37.83
AIC=-69.66   AICc=-69.45   BIC=-61.32
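One sanity check worth doing before comparing packages: both report AIC under the same standard formula, AIC = 2k − 2·logLik, with k = 3 estimated parameters here (ma1, sma1, sigma^2). So the gap between the two AICs comes entirely from the log-likelihood values, not from a different AIC definition. A quick arithmetic check of my own, using the numbers quoted in this thread:

```python
def aic(loglik, k):
    # Standard definition: AIC = 2k - 2 * log-likelihood.
    return 2 * k - 2 * loglik

# Values reported above; k = 3 (ma1, sma1, sigma^2).
r_aic = aic(37.83, 3)     # R reports AIC = -69.66
py_aic = aic(-99.484, 3)  # statsmodels reports AIC = 204.969

print(r_aic)   # ≈ -69.66, matching R's output
print(py_aic)  # ≈ 204.968, matching statsmodels up to rounding of the llf
```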
Python 3.11, statsmodels 0.14.1:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(train_data['incidence'], order=(0,1,1), seasonal_order=(0,1,1,12))
result = model.fit()
print(result.summary())
                                      SARIMAX Results
============================================================================================
Dep. Variable:                  incidence   No. Observations:                  132
Model:     SARIMAX(0, 1, 1)x(0, 1, 1, 12)   Log Likelihood                 -99.484
Date:                    Mon, 29 Apr 2024   AIC                            204.969
Time:                            23:46:06   BIC                            213.306
Sample:                                 0   HQIC                           208.354
Covariance Type:                      opg
============================================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
ma.L1         -0.6900      0.048    -14.322      0.000      -0.784      -0.596
ma.S.L12      -0.8250      0.102     -8.081      0.000      -1.025      -0.625
sigma2         0.2766      0.019     14.838      0.000       0.240       0.313
--------------------------------------------------------------------------------------------
Ljung-Box (L1) (Q):          0.73   Jarque-Bera (JB):        438.41
Prob(Q):                     0.39   Prob(JB):                  0.00
Heteroskedasticity (H):      1.21   Skew:                     -0.82
Prob(H) (two-sided):         0.56   Kurtosis:                 12.26
Using the same parameters in the two packages gives drastically different reported fits: in R, log likelihood = 37.83 and AIC = -69.66, while in Python, log likelihood = -99.484 and AIC = 204.969.
Can you help me?