Closed corneliagru closed 4 years ago
I think we should include arguments for the prediction function as well, to ensure flexibility for the user, e.g. `h`, `level`, and so on for `forecast()`.
During the $predict call forecasts are made starting from the last timestamp in the training set. If the model was trained on the whole dataset, it is not clear what exactly should happen when calling $predict. Do we want to return the fitted values of the model?
Right now the forecast horizon for the prediction is set by the `row_ids` argument in the predict method. Confidence intervals can be computed from the standard errors with a helper function. Are there more useful arguments for the prediction?
All the arguments for the `forecast` function are documented at https://cran.r-project.org/web/packages/forecast/forecast.pdf on page 47.
I think it should be possible to adjust all the arguments
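As a base-R sketch of how `h` and `level` could map onto a prediction, the interval can be built from the standard errors that `predict()` returns for an ARIMA fit (this uses `stats::arima` for illustration, not the forecast package itself; the model order is an arbitrary choice):

```r
# Base-R analogue of forecast(mod, h = 12, level = 95): predict n.ahead
# steps and construct the interval from the returned standard errors.
fit <- arima(mdeaths, order = c(1, 0, 0))  # illustrative model choice
pr  <- predict(fit, n.ahead = 12)          # pr$pred (mean), pr$se (std. errors)
level <- 0.95
z <- qnorm(1 - (1 - level) / 2)
lower <- pr$pred - z * pr$se
upper <- pr$pred + z * pr$se
```

This is essentially what a helper that turns standard errors into confidence intervals would do internally.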
> During the $predict call forecasts are made starting from the last timestamp in the training set. If the model was trained on the whole dataset, it is not clear what exactly should happen when calling $predict. Do we want to return the fitted values of the model?
So the general problem is that when using the packages directly, it is possible to do something like this:

```r
library(forecast)
library(tsbox)

mdeaths                     # monthly deaths data set shipped with R
mod = auto.arima(mdeaths)   # train on all observations
forecast(mod)               # forecasting is still possible afterwards
```

So you train your model on all observations, and it is still possible to predict values afterwards.
> Do we want to return the fitted values of the model?
I think this would be very cool! It would allow us to see how well our model fits the training data.
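In base R the fitted values can be recovered as observed minus residuals (the forecast package also provides a `fitted()` method for its models); a quick sketch with `stats::arima` and an arbitrary model order:

```r
# Fitted values = observed series minus one-step-ahead residuals.
fit <- arima(AirPassengers, order = c(1, 1, 0))
fv  <- AirPassengers - residuals(fit)
```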
Open questions:

- Fitted values?
- mlr3 resampling scheme, i.e. this is the analogon to predicting on train data in tabular ML.

So the difference is that `forecast` just forecasts starting from the last time-point in the training data, while we obtain this info from the test data. Through `$predict` we basically only support predicting data we already have, as `$predict` always expects there to be data.
An experimental idea we might want to do is the following:
```r
.$forecast = function(horizon = 5L) {
  # 1. Get the last training time-point in the data
  # 2. Create "artificial" data that has observations for `last_train_time` + horizon rows
  # 3. Call self$predict_internal on this.
}
```
Open question: how would that look with exogenous variables etc.?
This would allow me to call `lrn$forecast(5)` and get predictions for, e.g., 5 days in the future.
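Steps 1 and 2 of the sketch above could look like the following in plain R (the helper name and the regular-spacing assumption are illustrative, not mlr3 API):

```r
# Hypothetical helper: extend the time index `horizon` steps beyond the
# last training time-point, assuming a regularly spaced index.
make_future_times <- function(train_times, horizon = 5L) {
  step <- diff(tail(train_times, 2))  # spacing between the last two points
  seq(from = max(train_times) + step, by = step, length.out = horizon)
}

future <- make_future_times(as.Date("2020-01-01") + 0:9, horizon = 5L)
```

The resulting timestamps would then back the "artificial" data passed to the internal predict call.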
For actual forecasting (beyond the last timestamp in the training data): does it make sense to store the forecasts in a Prediction object with "fake" data? The truth for those forecasts is not available, so `prediction$score()` would be misleading.
Well, we cannot score it, but I guess the purpose of any forecasting model is to eventually forecast unseen data. As a result, I guess it is ok to create fake data, in the sense that we only extend the date column.
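A minimal sketch of "only extend the date column": append `horizon` fake rows whose date continues the series and whose target is `NA` (the column names and daily spacing are illustrative):

```r
df <- data.frame(date = as.Date("2020-01-01") + 0:2, y = c(1, 2, 3))
h  <- 2L
# Fake rows: the dates continue, the target is unknown (NA).
fake <- data.frame(date = max(df$date) + seq_len(h), y = NA_real_)
extended <- rbind(df, fake)
```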
Fixed in PR #47 by Johannes
When training the learner on all rows, prediction is not possible anymore: `Error in learner$predict(task) : No timesteps left for prediction`
```r
# auto.arima
learner = LearnerRegrForecastAutoArima$new()
tsk = mlr_tasks$get("airpassengers")
learner$train(tsk)
learner$predict(tsk)

# VAR
task = tsk("petrol")
learner = LearnerRegrForecastVAR$new()
learner$train(task)
p = learner$predict(task)
```