Use statsmodels for VAR estimation

cbrnr commented 5 years ago

Maybe we should use http://www.statsmodels.org/dev/vector_ar.html instead of rolling our own implementation.

mbillingr commented 5 years ago

I tentatively agree :) At the time we wrote scot, this was not yet available in statsmodels, but now it would certainly make sense to have a known and trusted VAR estimator under the hood.

I have two concerns, though:

1. In my work I used regularization rather than order selection during fitting, which is mirrored in the design of the library. A quick text search of the statsmodels documentation revealed no feature of that sort. Scot could certainly work without regularization, but would we lose an important feature?
2. The statsmodels documentation states the assumption that the VAR process is driven by Normally distributed noise. MVARICA assumes that the VAR residuals are not Normally distributed (for the ICA decomposition to make sense). These two assumptions are essentially incompatible. I don't think it matters in practice (least squares does not care about the distribution), but any statistics lawyer would have fits (yes, I'm looking at you, reviewer#3).

cbrnr commented 5 years ago

Good points. It's been quite some time since I last looked at the code, but whereas standard VAR estimation uses OLS, we use regularized least squares (RLS: ridge, lasso, elastic net). I have found neither anything on VAR estimation using RLS in the literature nor an implementation in a software toolbox, so this might really be a unique feature. Regarding the assumption of normality, OLS (and therefore also RLS) indeed doesn't require it, but VAR models usually do. But we have the same problem in our own implementation, don't we?
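To make "regularized least squares" concrete, here is a rough sketch of a ridge-penalized VAR(p) fit in NumPy. It is not scot's implementation, which (per the comment above) also supports lasso and elastic net. The helper names `lagged_design` and `fit_var_ridge`, the zero-mean/no-intercept assumption, and the penalty values are all illustrative. With `delta=0` the fit reduces to ordinary least squares.

```python
import numpy as np


def lagged_design(x, p):
    """Stack lagged samples: row t of z is [x[t-1], ..., x[t-p]]; y is x[p:]."""
    n = x.shape[0]
    z = np.hstack([x[p - i:n - i] for i in range(1, p + 1)])
    return z, x[p:]


def fit_var_ridge(x, p, delta=0.0):
    """Ridge-regularized least-squares VAR(p) fit (hypothetical helper).

    x: (n_samples, k) array, assumed zero-mean (no intercept is fitted).
    Returns A of shape (k, k*p) with x[t] ~ A @ [x[t-1]; ...; x[t-p]].
    """
    z, y = lagged_design(x, p)
    gram = z.T @ z + delta * np.eye(z.shape[1])   # delta = 0 gives plain OLS
    b = np.linalg.solve(gram, z.T @ y)
    return b.T


# Usage on a simulated (hypothetical) bivariate VAR(1)
rng = np.random.default_rng(0)
a_true = np.array([[0.5, 0.1], [0.0, 0.3]])
x = np.zeros((5000, 2))
for t in range(1, len(x)):
    x[t] = a_true @ x[t - 1] + rng.standard_normal(2)

a_ols = fit_var_ridge(x, p=1, delta=0.0)
a_ridge = fit_var_ridge(x, p=1, delta=1e4)   # strong shrinkage toward zero
```

Ridge has a closed-form solution, which is why it fits in a few lines. Lasso and elastic net have no closed form and need an iterative solver such as coordinate descent.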

mbillingr commented 5 years ago

I think the assumption of normality mostly serves to make the math simpler when theorizing about (V)AR models. In addition, it provides another criterion for model validation: if the residuals are not normally distributed after fitting, the model is not adequate.
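One way to use residual normality as a validation check is sketched below. It assumes that a per-channel Jarque-Bera test is acceptable; the thread doesn't name a specific test, and `scipy.stats.jarque_bera` is a ready-made alternative. The sketch computes the skewness S and kurtosis K of each residual channel and compares the statistic n/6 · (S² + (K − 3)²/4) against the χ²(2) 95% quantile, about 5.991. The helper names are hypothetical.

```python
import numpy as np


def jarque_bera(r):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4) of a 1-D sample."""
    r = np.asarray(r, dtype=float)
    c = r - r.mean()
    m2 = np.mean(c ** 2)
    skew = np.mean(c ** 3) / m2 ** 1.5
    kurt = np.mean(c ** 4) / m2 ** 2
    return r.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def residuals_look_normal(resid, crit=5.991):
    """Per-channel check of (n_samples, k) residuals against the chi^2(2) 95% quantile."""
    return np.array([jarque_bera(resid[:, j]) < crit for j in range(resid.shape[1])])


rng = np.random.default_rng(1)
gaussian_resid = rng.standard_normal((5000, 2))
laplace_resid = rng.laplace(size=(5000, 2))
print(residuals_look_normal(gaussian_resid))
print(residuals_look_normal(laplace_resid))   # [False False]
```

At the 5% level, a Gaussian channel will still be flagged about one time in twenty, so this check is a heuristic rather than proof of adequacy.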

Other statistical inferences on (V)AR models might depend on the normality assumption (or may only have been proved under normality), but I was never particularly interested in those :) Anyway, I don't think we have a problem in our implementation. We simply do not assume normality in the residuals (but we do assume independence). Therefore, we are forbidden to do anything with the model that depends on this assumption but we gain the ability to decompose the residuals with ICA.
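As a rough illustration of "decomposing the residuals with ICA", the sketch below mixes two hypothetical Laplace-distributed (non-Gaussian) sources, which stand in for VAR residuals. It then unmixes them with a small hand-rolled symmetric FastICA (tanh contrast) in NumPy. This is not how scot or MVARICA are implemented; in practice one would use a library routine such as scikit-learn's `FastICA`. Separation works because the sources are non-Gaussian. With Gaussian residuals, the mixing would only be identifiable up to a rotation.

```python
import numpy as np


def whiten(r):
    """Center r (n_samples, k) and transform it to identity covariance."""
    c = r - r.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(c, rowvar=False))
    w_white = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T  # symmetric whitening matrix
    return c @ w_white.T, w_white


def fastica(r, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with the tanh (log-cosh) nonlinearity (illustrative only).

    Returns (sources, unmix) with sources = (r - r.mean(axis=0)) @ unmix.T.
    """
    z, w_white = whiten(r)
    n, k = z.shape
    rng = np.random.default_rng(seed)
    w, _ = np.linalg.qr(rng.standard_normal((k, k)))   # random orthogonal start
    for _ in range(n_iter):
        g = np.tanh(z @ w.T)                           # nonlinearity on current estimates
        w_new = (g.T @ z) / n - np.diag((1.0 - g ** 2).mean(axis=0)) @ w
        u, _, vt = np.linalg.svd(w_new)                # symmetric decorrelation:
        w_new = u @ vt                                 # w <- (w w^T)^(-1/2) w
        done = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if done:
            break
    return z @ w.T, w @ w_white


# Hypothetical "residuals": two Laplace sources mixed by an arbitrary matrix
rng = np.random.default_rng(3)
sources = rng.laplace(size=(5000, 2))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
residuals = sources @ mixing.T

estimated, unmix = fastica(residuals)
# |correlation| between true sources (rows) and estimated components (columns)
match = np.abs(np.corrcoef(sources, estimated, rowvar=False)[:2, 2:])
```

ICA recovers components only up to order and sign, which is why the comparison uses absolute correlations.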

I'm not saying statsmodels would not work for us. On the contrary, I fervently hope their implementation behaves similarly to ours. However, they state in the API documentation that the model has normal residuals, so "legally" we cannot use ICA on such a model.

cbrnr commented 5 years ago

> Anyway, I don't think we have a problem in our implementation. We simply do not assume normality in the residuals (but we do assume independence). Therefore, we are forbidden to do anything with the model that depends on this assumption but we gain the ability to decompose the residuals with ICA.

I don't really understand where exactly the assumption of normal (or non-normal) residuals appears in the estimator. IMO non-normal residuals are the result of a non-optimal model fit, which is what happens in the case of MVARICA. AFAIK, the estimator doesn't specify either assumption anywhere. The VAR estimator we use and the one from statsmodels are probably both based on least squares, which doesn't assume normal residuals. According to Lütkepohl, the assumption of normal noise is required to construct forecast intervals, which we don't need (and non-normal noise does not necessarily invalidate a VAR model).

I don't think we have a problem in our implementation either, but I also think that we could use the statsmodels estimator. However, since we would lose regularization, it is probably not such a great idea, and I guess we should keep our own implementation.

mbillingr commented 5 years ago

> IMO non-normal residuals are the result of a non-optimal model fit, which is what happens in the case of MVARICA.

I would avoid the "non-optimal fit" phrasing, but otherwise I agree. We have a VAR model driven by a non-Gaussian excitation process, so even an optimal fit will not produce Normal residuals.
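Here is a small simulation of this point, assuming NumPy. A hypothetical VAR(1) is driven by Laplace noise, fitted by plain least squares, and its residuals are inspected. Least squares recovers the coefficients without any normality assumption, yet the residuals keep the excess kurtosis of the excitation (3 for a Laplace distribution, 0 for a Gaussian). The coefficient matrix and sample size are made up for illustration.

```python
import numpy as np

# Hypothetical VAR(1) driven by Laplace (non-Gaussian) excitation
rng = np.random.default_rng(42)
a_true = np.array([[0.5, 0.1], [0.0, 0.3]])
n = 20000
x = np.zeros((n, 2))
for t in range(1, n):
    x[t] = a_true @ x[t - 1] + rng.laplace(size=2)

# Plain least squares: regress x[t] on x[t-1] (no distributional assumption used)
z, y = x[:-1], x[1:]
b, *_ = np.linalg.lstsq(z, y, rcond=None)
a_hat = b.T
resid = y - z @ b


def excess_kurtosis(r):
    """Per-column excess kurtosis: 0 for a Gaussian, 3 for a Laplace distribution."""
    c = r - r.mean(axis=0)
    return np.mean(c ** 4, axis=0) / np.mean(c ** 2, axis=0) ** 2 - 3.0


print(np.round(a_hat, 2))                   # close to a_true
print(np.round(excess_kurtosis(resid), 1))  # clearly non-zero: residuals stay non-Gaussian
```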

> According to Lütkepohl, the assumption of normal noise is required to construct forecast intervals, which we don't need (and non-normal noise does not necessarily invalidate a VAR model).

Yes, I totally agree! Do you still have the book around? :)

scot-dev / scot

Use statsmodels for VAR estimation #198