chintanr97 opened this issue 3 years ago (status: Open)
@ChadFulton replied, quoting the issue:
"I can observe that the Jarque-Bera test output is close to the approximations from the model."
The model is not reporting an approximation. In a state space model, diagnostic tests should be performed on the standardized residuals (model.standardized_forecasts_error), and this is why you are getting a different result when you run the diagnostics on model.resid.
"I did not understand why we are using the lag=1"
The reason is purely pragmatic: we have limited space in the summary output, and we have to show something. Still, this is a good point, and lag=1 is not the best thing to show. Harvey (1989, section 5.4.2) suggests using log(T) or sqrt(T) lags, so maybe we should report lag=int(np.log(self.nobs_effective)).
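To make the lag choice concrete, here is a small numpy sketch of the Ljung-Box statistic, Q = T(T+2) * sum over k = 1..h of r_k^2 / (T - k), evaluated at h = int(log T) and h = int(sqrt T) instead of h = 1. The helpers ljung_box_q and harvey_lags are hypothetical names for illustration, not statsmodels functions; statsmodels.stats.diagnostic.acorr_ljungbox computes the same statistic. T stands for the effective number of observations, i.e. after any diffuse periods are dropped.

```python
import numpy as np


def ljung_box_q(resid, lags):
    """Ljung-Box Q statistic pooling autocorrelations at lags 1..lags."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    T = x.size
    denom = np.dot(x, x)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = np.dot(x[k:], x[:-k]) / denom  # sample autocorrelation at lag k
        total += r_k ** 2 / (T - k)
    return T * (T + 2) * total


def harvey_lags(T):
    """Candidate lag counts from the log(T) and sqrt(T) rules of thumb."""
    return int(np.log(T)), int(np.sqrt(T))


# Usage on simulated white noise (an assumption; use standardized residuals).
rng = np.random.default_rng(0)
e = rng.standard_normal(200)
h_log, h_sqrt = harvey_lags(e.size)  # (5, 14) for T = 200
for h in (1, h_log, h_sqrt):
    print(h, ljung_box_q(e, h))
```

Pooling several lags makes the test sensitive to autocorrelation that lag 1 alone would miss, at the cost of a longer row in the summary table.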
"Also, I am confused about the theoretical understanding of the nobs_diffuse value used in the test_serial_correlation function in the same class. Any pointers would be really helpful!"
Here is an intuitive way to understand nobs_diffuse: when you have an integrated model (i.e. ARIMA(p, d, q) with d > 0), a typical way to proceed is to difference the data d times. When you do this, you lose the first d observations from the original series, and so the residuals from these observations obviously cannot be included in statistical tests. When put in state space form, the model does not literally difference the data, but the states associated with the first few observations still have a "diffuse" distribution (see Durbin and Koopman [2012], sections 5.1-5.2, for rigorous details about the diffuse observations and how nobs_diffuse is computed), and so they should not be included in statistical tests (see ibid., section 7.5, for details of diagnostic tests in state space models). So nobs_diffuse gives the number of observations with diffuse state distributions.
Thanks a lot @ChadFulton, this was really helpful! I will read over the referenced articles too!
I am using an ARIMA model to predict time series values. I identify the best-fit ARIMA model by its AIC value, and for all the different orders I tried, the best AIC is obtained for ARIMA order (4, 0, 1). As I am using the statsmodels.tsa.arima.model.ARIMA class, I can see a description of the fitted model with model.summary(), and I observed the following results for the tests:
I also manually ran the tests using statsmodels.stats.diagnostic.acorr_ljungbox and statsmodels.stats.stattools.jarque_bera, respectively, with the following code snippet:
where model is my best-fit ARIMA model and p_ and q_ are the respective orders of the model. The output from the snippet:
I can observe that the Jarque-Bera test output is close to the approximations from the model. However, the Ljung-Box test output is much smaller and does not match the output from the model. I am not sure whether I am misinterpreting the output of the model description versus the output of the acorr_ljungbox function.
When I instead changed my code to use lbTest = acorr_ljungbox(model.resid, lags=[1], boxpierce=False), I got values very close to the estimates from the model: LBTest: (array([0.000103]), array([0.9919025])). I did not understand why we are using lag=1 as found in the class here.
Also, I am confused about the theoretical understanding of the nobs_diffuse value used in the test_serial_correlation function in the same class. Any pointers would be really helpful!
I am using statsmodels version 0.12.2.