Open EraylsonGaldino opened 4 years ago
Historical target values (e.g. y(t-1), y(t-2)) are used as features to predict the future target value. For example, if you build NARX(RandomForestRegressor(), auto_order=2, exog_order=[1], exog_delay=[0]), the prediction is y(t + 1) = f(y(t), y(t - 1), x(t)), and the target series y is passed in to supply y(t) and y(t - 1). The prediction does not cheat by using any future values. I know it looks a bit odd to input the target values, but it is necessary.
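To make the relation y(t + 1) = f(y(t), y(t - 1), x(t)) concrete, here is a minimal NumPy sketch of how the lagged rows could be laid out for auto_order=2, exog_order=[1], exog_delay=[0]. The helper name make_narx_rows is hypothetical and this is only an illustration of the idea, not fireTS's internal implementation.

```python
import numpy as np

def make_narx_rows(x, y):
    """Hypothetical helper: build rows for y(t+1) = f(y(t), y(t-1), x(t))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    features, targets = [], []
    # Start at t=1 so that y(t-1) exists; stop early so that y(t+1) exists.
    for t in range(1, len(y) - 1):
        features.append([y[t], y[t - 1], x[t]])  # only past and present values
        targets.append(y[t + 1])                 # the future value to predict
    return np.array(features), np.array(targets)

x = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
features, targets = make_narx_rows(x, y)
print(features)
print(targets)
```

Each feature row contains only values at time t or earlier, while the matching target is one step ahead. That is why the target series has to be supplied at prediction time: it provides the lagged inputs, not the answer.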
Please see this FAQ https://github.com/jxx123/fireTS#faq for a more detailed explanation.
Hi @jxx123, I still do not understand the concept of using the target variable in the predict function. At time t, I want to make a prediction/forecast for time t+1, but it does not do that. Instead it predicts for time t, using only the value at time t and at earlier times (t-1, t-2, etc.), which is not useful from a time series forecasting point of view: at the present time I want to forecast the future, not what is happening at time t itself.
For example, here is the output of predict from the model. The last timestamp of ypred is the same as that of ytest, which makes sense. What confuses me is that the prediction at time t matches ytest(t), not ytest(t+1) or ytest(t+6). That is what any supervised model does: an "on-point" prediction at time t. Unless I convert the time series data to a supervised learning format (that is, shift the target variable by the selected lag period so that x(t-1), x(t), y(t-1) are the inputs used to predict y(t+1); see https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/), it does not make sense to me. Does the model expect the input and output to be prepared in the format described in that link? If not, the example notebooks do not explain it, beyond the model parameters supposedly learning from lagged input:
2020-10-06 11:15:00   -0.002697
2020-10-06 11:30:00   -0.003967
2020-10-06 11:45:00    0.000830
2020-10-06 12:00:00    0.002199
2020-10-06 12:15:00    0.002574
Essentially, the prediction at 2020-10-06 11:15:00 should equal the actual value at 2020-10-06 11:30:00, and the prediction at 2020-10-06 12:15:00 should therefore be for 2020-10-06 12:30:00, but that is not the case. If it is not, then this is really NOT time series forecasting but simple supervised machine learning prediction. If we need to modify the data format to create the lags ourselves, I am having a hard time understanding the point.
Can you provide more clarification? I am using this model from your example notebook with grid search:
tsmdl = NARX(auto_order=6, base_estimator=SVR(C=100, epsilon=0.015, gamma=0.003), exog_delay=[0, 0], exog_order=[3, 3])
tsmdl.fit(Xtrain, ytrain)
ypred = tsmdl.predict(Xtest, ytest, step=6)
ypred = pd.Series(ypred, index=ytest.index)
Thanks!
@neerajnj10 sorry for the late reply. The prediction is actually what you expect. For example, if the prediction step is 6, it predicts, say, y(10) based on y(4), y(3) (if the auto_order is 2). I just aligned the predicted value with the actual value (so that it is easier to compute the MSE score etc.). For example, the output ypred has the same shape as ytest, and ypred[10] is the predicted value for ytest[10]. Note that ypred[10] is based only on ytest[4], ytest[3].
Since I aligned the prediction with the actual value, you will notice that the first pred_step + max(auto_order - 1, max(exog_order + exog_delay) - 1) values of the output ypred are NaN, because the first several steps of prediction are not available due to missing information.
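As a quick check of the alignment described above, the following sketch evaluates the NaN-count formula and lists which target indices feed a given prediction. Both helper names are hypothetical, and reading exog_order + exog_delay as an element-wise sum over the exogenous inputs is an assumption on my part, not something confirmed by the library's source.

```python
def leading_nan_count(pred_step, auto_order, exog_order, exog_delay):
    """Hypothetical helper: number of leading NaN values in the aligned output."""
    # Assumption: exog_order + exog_delay is summed element-wise per input.
    exog_reach = max(o + d for o, d in zip(exog_order, exog_delay))
    return pred_step + max(auto_order - 1, exog_reach - 1)

def autoregressive_inputs(index, pred_step, auto_order):
    """Hypothetical helper: target indices used to predict the value at `index`."""
    last = index - pred_step  # most recent target value that may be used
    return list(range(last - auto_order + 1, last + 1))

# Settings from the question: step=6, auto_order=6, two exogenous inputs.
print(leading_nan_count(6, 6, [3, 3], [0, 0]))
# Settings from the reply: step=6, auto_order=2, so ypred[10] uses ytest[3], ytest[4].
print(autoregressive_inputs(10, 6, 2))
```

Under these assumptions, no index later than index - pred_step is ever used, so the value stored at ypred[i] is a genuine pred_step-ahead forecast even though it sits at the same position as ytest[i].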
For more details, see my documentation here https://firets.readthedocs.io/en/latest/models.html#models.NARX.predict
Why does the forecast method need the target value? mdl1.predict(x, y, step=3)