scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

Heteroskedasticity-Aware Covariance estimators #21319

Open aleksejs-fomins opened 2 years ago

aleksejs-fomins commented 2 years ago

Describe the workflow you want to enable

scikit-learn provides multiple covariance estimators useful for different purposes. Heteroskedasticity- and autocorrelation-consistent (HAC) estimators are designed to correct the bias that autocorrelation in the data introduces, and are therefore very useful when estimating the correlation between time series. To the best of my understanding, no such estimator is currently available in scikit-learn.

Describe your proposed solution

Implement the Andrews and Newey-West covariance estimators: https://www.jstor.org/stable/pdf/2938229.pdf https://en.wikipedia.org/wiki/Newey%E2%80%93West_estimator
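
To make the request concrete, here is a minimal NumPy sketch of a Newey-West style long-run covariance with Bartlett weights. The function name `newey_west_cov` and the `max_lag` parameter are hypothetical names chosen for illustration; this is not an existing scikit-learn API, and a real implementation would also need a bandwidth selection rule.

```python
import numpy as np


def newey_west_cov(X, max_lag):
    """Newey-West (Bartlett kernel) long-run covariance of the columns of X.

    X has shape (n_samples, n_features); rows are ordered in time.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    Xc = X - X.mean(axis=0)

    # Lag-0 term: the ordinary (biased, 1/n) sample covariance.
    cov = Xc.T @ Xc / n_samples

    for lag in range(1, max_lag + 1):
        # Bartlett weights decay linearly and keep the result
        # positive semi-definite.
        weight = 1.0 - lag / (max_lag + 1.0)
        # Sample autocovariance at this lag (also with a 1/n divisor).
        gamma = Xc[lag:].T @ Xc[:-lag] / n_samples
        cov += weight * (gamma + gamma.T)
    return cov


# Usage on two synthetic AR(1) series (made-up data).
rng = np.random.default_rng(0)
shocks = rng.normal(size=(1000, 2))
series = np.zeros_like(shocks)
for t in range(1, len(series)):
    series[t] = 0.8 * series[t - 1] + shocks[t]

print(newey_west_cov(series, max_lag=10))
```

With `max_lag=0` the sketch reduces to the plain sample covariance; larger values add weighted autocovariance terms.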

Describe alternatives you've considered, if relevant

To some extent, these methods are implemented in the statsmodels package.

Additional context

No response

GaelVaroquaux commented 2 years ago

I'm sorry, I do fear that this is going to be considered as out of scope. We are focusing more and more on objects that can be useful in a machine-learning pipeline.

aleksejs-fomins commented 2 years ago

@GaelVaroquaux I don't follow. How is the proposed HAC covariance estimator more out of scope than already implemented estimators such as Robust Covariance? Or do you suggest that time series analysis is out of scope for machine learning?

GaelVaroquaux commented 2 years ago

> How is the proposed HAC covariance estimator more out-of-scope than the already implemented metrics such as Robust Covariance.

I'm not sure that we would include them today :).

> Or do you suggest that dealing with time series analysis is out of scope of machine learning?

Yes, this is indeed the case.

aleksejs-fomins commented 2 years ago

I see. I guess the definition of ML I was assuming is somewhat different, but that is beside the point. Your answer clarifies the direction that scikit-learn wants to take, which is already very helpful.

Thanks for your reply.

yuvalyitz commented 2 years ago

For what it's worth, @aleksejs-fomins's feature suggestion is relevant to me. An estimate of a dependent variance could be very useful in risk management. For now I use a two-step workaround as in [1]. I hope this feature is considered soon!

[1] Most likely heteroscedastic Gaussian process regression. ICML '07: Proceedings of the 24th International Conference on Machine Learning, June 2007, pages 393–400. https://doi.org/10.1145/1273496.1273546