[FEATURE] Non-continuous index in predict

MichalChromcak commented 3 years ago

Is your feature request related to a problem? Please describe. Sometimes it might be handy to be able to pass an index for predict methods, which would have gaps. E.g. wanting only forecasting horizon of 1d, 3d, and 7d ahead.

Describe the solution you'd like

X_pred = pd.DataFrame(index=pd.DatetimeIndex(["2020-01-01", "2020-01-03", "2020-01-07"]))
preds = model.predict(X_pred)
assert X_pred.index.to_list() == preds.index.to_list()

Some models support that natively, for some models, the only way might be predicting for full range and later select from the results only X_pred values.

full_ind = pd.DatetimeIndex(X_pred.index.min(), X_pred.index.max())
preds = model.predict(pd.DataFrame(index=full_ind).merge(X_pred, left_index=True, right_index=True))
return preds.loc[X_pred.index.to_list()]

SklearnWrapper might just pick exact things (thus support list of concrete lags) - especially handy when having optimize_for_horizon set on True

Describe alternatives you've considered Make it uniform for all models, while "wasting" computational time, but having just one implementation.

Additional context Request partially raised in order to be compliant with sktime, but we could also discuss its usefulness in general.

MichalChromcak commented 3 years ago

@pavelkrizek Could you provide your opinion on the feature? The result of our discussion will determine the next steps for HCrystalBallForecaster in sktime implemented in this PR

pavelkrizek commented 3 years ago

@MichalChromcak Is this necessary for sktime wrapper? In principle, somebody who cares just about some particular horizon could achieve the same thing by just passing custom metrics to CV, which will i.e. exponentially weigh the error terms based on time - this way gets the best model just for a particular horizon and they could filter the result themselves.

I also see bigger changes needed than just filtering out the index in the predict method. In order to make it useful, we would also need to change things in cross-validation, i.e. our Splitter is not supporting right now returning just one sample with the arbitrary horizon (horizon 10, gives you 10 samples for predict), also our CV plotting will not be probably very meaningful with these settings, so having a better picture of how many things needs to be changed to make it usable would be a better starting point (would propose to change 1 wrapper and try to use it in different contexts to see what breaks...). Some scouting of methods that are already supporting this out-of-box would be also useful to see if there are some performance benefits of passing the filtered data to predict or just post-filtering in order to decide if one uniform solution "fits all".

MichalChromcak commented 3 years ago

@pavelkrizek At least from the test suite, there are some cases when data is passed with indices including gaps. We might also raise NotImplementedError in there, or maybe better directly in HCrystalBall if such things occur.

@mloning Would be acceptable to raise NotImplementedError in such case for sktime?

mloning commented 3 years ago

Hi @MichalChromcak, yes that would be okay, we may have ignore any failing unit test in sktime for the HCrystalball wrapper.

heidelbergcement / hcrystalball

[FEATURE] Non-continuous index in predict #50