skforecast / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

Improve how prediction intervals are estimated in forecasters using the direct strategy #817

Open JoaquinAmatRodrigo opened 2 weeks ago

JoaquinAmatRodrigo commented 2 weeks ago

Discussed in https://github.com/JoaquinAmatRodrigo/skforecast/discussions/815

Originally posted by **MarcosBelver** October 8, 2024

I saw in the documentation that the n_boot parameter of the predict_interval function in the DirectForecaster is set to 500 by default. I was wondering what the mathematical/statistical explanation for this default is. For example, if my set of residuals has ~150 values (the training size is equal to that length), does it make sense to oversample the residuals to obtain a collection of 500 and take the corresponding quantile from there? Or is this size of 500 aimed at longer training sets, where the number of residuals is much bigger? In that case, how should the proper n_boot be selected for a smaller sample of residuals? Thank you in advance.
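To make the question concrete, here is a minimal NumPy sketch of residual bootstrapping for a single forecast step. It is not skforecast's implementation: the helper `bootstrap_interval`, the simulated residuals, the point forecast of 10.0, and the 10/90 percentiles are all assumptions chosen for illustration. It resamples 150 residuals with replacement `n_boot` times and measures how much the upper bound varies across random seeds.

```python
import numpy as np


def bootstrap_interval(point_forecast, residuals, n_boot=500,
                       interval=(10, 90), seed=0):
    """Hypothetical helper: single-step interval from resampled residuals."""
    rng = np.random.default_rng(seed)
    # Draw n_boot residuals with replacement from the observed residuals.
    sampled = rng.choice(residuals, size=n_boot, replace=True)
    # Each simulated value is the point forecast plus one sampled residual.
    simulated = point_forecast + sampled
    lower, upper = np.percentile(simulated, interval)
    return lower, upper


# Assumed data: 150 residuals, matching the training size in the question.
data_rng = np.random.default_rng(123)
residuals = data_rng.normal(loc=0.0, scale=1.0, size=150)

# Spread of the upper bound across 200 seeds, for several n_boot values.
spread = {}
for n_boot in (50, 500, 5000):
    uppers = [
        bootstrap_interval(10.0, residuals, n_boot=n_boot, seed=s)[1]
        for s in range(200)
    ]
    spread[n_boot] = float(np.std(uppers))
    print(f"n_boot={n_boot}: std of upper bound = {spread[n_boot]:.3f}")
```

In this simplified setting, a larger `n_boot` only reduces the Monte Carlo noise of the quantile estimate. It does not add information beyond what the 150 residuals contain, so the bounds converge to the empirical percentiles of those residuals.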