anne-devries opened 1 year ago
@dennisbader @madtoinou as discussed, I will work on this!
Hi @anne-devries, and thanks for taking this on.
We should instead have dedicated minimum length requirements for the target, past, and future covariates.
Otherwise, we can end up requiring more target time steps than are actually needed (for example with lags=[-1], lags_past_covariates=[-1, -2]).
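To make the difference concrete, here is a small sketch (illustrative helper names, not the darts API) comparing a single global minimum length against dedicated per-series minimums for that exact example:

```python
# Hypothetical helpers (not darts internals) contrasting a single global
# minimum length with dedicated minimums per input series.

def global_min_length(target_lags, past_lags, output_chunk_length):
    # one minimum driven by the deepest lag across all inputs
    deepest_lag = max(-min(target_lags), -min(past_lags))
    return deepest_lag + output_chunk_length

def dedicated_min_lengths(target_lags, past_lags, output_chunk_length):
    # separate requirements: the target must also cover the forecast
    # horizon, past covariates only need to cover their own lags
    min_target = -min(target_lags) + output_chunk_length
    min_past = -min(past_lags)
    return min_target, min_past

# lags=[-1], lags_past_covariates=[-1, -2], output_chunk_length=1
print(global_min_length([-1], [-1, -2], 1))      # demands 3 target steps
print(dedicated_min_lengths([-1], [-1, -2], 1))  # (2, 2): target only needs 2
```

With a global minimum, the deeper past-covariate lag inflates the target requirement to 3 steps, even though 2 target steps suffice.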
Hi @dennisbader, I don't really get why. When you instantiate a model with e.g. output_chunk_length = 2, lags = 2 and lags_past_covariates = 5, then the min_train_series_length would become 5, while to actually train the model we need 8. Also, I couldn't figure out where in the darts library this min_train_series_length property is used (as far as I could find, it isn't used as a check), could you clarify that for me? Thanks!
Hi @anne-devries, sure, let me explain: Darts models handle target and past/future covariates slicing under the hood, based on input_chunk_length and output_chunk_length, and by convention we expect them to share the same time index. Let's look at an example:
Ex1: lags = [-1], lags_past_covariates = [-1], output_chunk_length = 1
min_len_past_covariates = min_len_target - 1 + 0 = min_len_target - 1
Ex2: lags = [-1], lags_past_covariates = [-2], output_chunk_length = 1
min_len_past_covariates = min_len_target + 1 - 1 = min_len_target
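The two examples above can be reproduced with a short sketch (assumed formulas for illustration, not the darts source), where the target must cover its deepest lag plus the forecast horizon, while past covariates only need to cover their own deepest lag:

```python
# Sketch (assumed formulas, not darts internals): minimum lengths when
# target and past covariates share the same time index.

def min_lengths(lags, lags_past_covariates, output_chunk_length):
    # target: deepest target lag plus the forecast horizon
    min_len_target = -min(lags) + output_chunk_length
    # past covariates: just their own deepest lag
    min_len_past = -min(lags_past_covariates)
    return min_len_target, min_len_past

# Ex1: past covariates can be one step shorter than the target
print(min_lengths([-1], [-1], 1))  # (2, 1) -> min_len_target - 1
# Ex2: past covariates must be as long as the target
print(min_lengths([-1], [-2], 1))  # (2, 2) -> min_len_target
```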
We want to get the minimum required time spans per target/covariates rather than a global minimum, because sometimes covariates are only available up to a specific point, and we want to allow for a maximum trainable time window.
Regarding where it's used: for example in the fit methods as a sanity check that the series is long enough (we could do this check for covariates as well), and in ForecastingModel.residuals(), ... . If you have an IDE, you can look for all occurrences of the attribute in the code.
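As a rough sketch of the kind of fit-time sanity check this property enables (illustrative class and names, not the actual darts code):

```python
# Hypothetical sketch of a fit-time length check driven by
# min_train_series_length (illustrative, not the darts source).

class DummyModel:
    def __init__(self, lags, output_chunk_length):
        self.lags = lags
        self.output_chunk_length = output_chunk_length

    @property
    def min_train_series_length(self):
        # deepest target lag plus the forecast horizon
        return -min(self.lags) + self.output_chunk_length

    def fit(self, series):
        # the same idea would extend to covariates once dedicated
        # minimum lengths exist per input series
        if len(series) < self.min_train_series_length:
            raise ValueError(
                f"series needs at least {self.min_train_series_length} "
                f"time steps, got {len(series)}"
            )
        return self

model = DummyModel(lags=[-2, -1], output_chunk_length=2)
model.fit([1.0] * 4)      # ok: exactly the minimum of 4 steps
try:
    model.fit([1.0] * 3)  # too short -> raises ValueError
except ValueError as e:
    print(e)
```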
Hi Dennis, I think I now understand what you mean. So I will have a look and try to implement it for those 3 separately.
Describe the bug: the min_train_series_length for lgbm, catboost, xgboost and regression_model currently only considers the target lags and the output chunk length. However, this definition should also include the past covariate lags, e.g. max(-self.lags["target"][0], -self.lags["past_covariates"][0]) + self.output_chunk_length instead of -self.lags["target"][0] + self.output_chunk_length
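A runnable sketch of the reported behaviour versus the proposed fix (example lag values are assumptions for illustration; this is not the actual darts source):

```python
# Sketch of the reported bug: the current minimum ignores past
# covariate lags; the proposed one takes the deepest lag overall.

lags = {"target": [-2, -1], "past_covariates": [-5, -1]}  # example values
output_chunk_length = 2

# current behaviour: only target lags are considered
current = -lags["target"][0] + output_chunk_length                 # 4

# proposed: deepest lag across target and past covariates
proposed = max(-lags["target"][0],
               -lags["past_covariates"][0]) + output_chunk_length  # 7

print(current, proposed)
```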
Additional context link to gitter conversation: https://matrix.to/#/!uumevxjBaNJhovFYgj:gitter.im/$w4d7QoL4FaF3wXfxx0HeG9_iK5Rey5AfpY0kgazi9Ac?via=gitter.im&via=matrix.org&via=matrix.thegolem.cz