intive-DataScience / tbats


.fit() messing up with the forecast #28

Closed · shahzebnaveed-telus closed this issue 2 years ago

shahzebnaveed-telus commented 3 years ago

Hi,

In one of the example scripts, re_fit_model.py, it is mentioned that fitting the trained model again will not change the model parameters. However, I see it corrupting the forecasts.
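
For reference, the pattern from that example looks roughly like this (a minimal sketch, not the exact script; the toy data, variable names, and seasonal period are mine):

import numpy as np
from tbats import TBATS

# Toy series standing in for the real data.
y = 10 + np.sin(2 * np.pi * np.arange(120) / 24)
y_train, y_test = y[:100], y[100:]

estimator = TBATS(seasonal_periods=[24])   # placeholder seasonal period
model = estimator.fit(y_train)             # model parameters are estimated here
# Refitting the trained model on an extended series is supposed to reuse
# the estimated parameters and only recompute the internal state:
refit_model = model.fit(np.append(y_train, y_test[0]))
print(refit_model.forecast(steps=10))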

The image below shows the original model.

[image: forecasts of the original model]

I then take one new observation from the test set, make it part of the training set, and fit() the model again.

[image: forecasts after refitting with one new observation]

The MSE/MAE on both the training and test data points increases.

I then take another observation from the test set and re-fit() the model on it. The future forecasts become even worse.

[image: forecasts after refitting with a second new observation]

Is this the expected behavior? Why does the model rely so heavily on the newly fitted observation? I also observe something else that may be related: the forecasts almost always seem to underestimate, giving residuals that are not centered around 0 but are positive on average.

[image: residual distribution centered above 0]

cotterpl commented 3 years ago

Can you share the dataset and/or code you are doing those experiments on?

shahzebnaveed-telus commented 3 years ago

The sample data and the script:

tbats_example.zip

cotterpl commented 3 years ago

I have reviewed the code. It seems you are doing the following:
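
(The snippet itself is in the attached zip; it is reconstructed here from the indices discussed below, so treat the exact lines as illustrative.)

# Three separate experiments, each appending a single observation to the training set:
y_train_new = np.append(y_to_train_2, y_to_test_2[337])  # case A: 1 step ahead
y_train_new = np.append(y_to_train_2, y_to_test_2[338])  # case B: 2 steps ahead
y_train_new = np.append(y_to_train_2, y_to_test_2[339])  # case C: 3 steps ahead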

This means you are appending to the same set y_to_train_2 only a single observation that is 1 (A), 2 (B), or 3 (C) steps ahead, so the resulting sequence is not consecutive. For case C, the observations 1 and 2 steps ahead are missing. To fit properly, you should add all the observations in between. Something like:

y_train_new = np.append(y_to_train_2, y_to_test_2[337]) # add 1 step ahead
y_train_new = np.append(y_train_new, y_to_test_2[338]) # and also add 2 steps ahead
y_train_new = np.append(y_train_new, y_to_test_2[339]) # and also add 3 steps ahead so that all steps are consecutive and present

I have changed the code like that and I am not experiencing the weird behaviour any more.

Re: 'Why does the model rely so much on the new observation being fitted?'

The way TBATS works is that it recalculates its state from observation to observation. If one of the observations suddenly does not match at all what has been seen previously (as in cases B and C), it may 'corrupt' the state and result in such forecasts. This is the nature of such methods.
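
A rough illustration of that recursion, using plain exponential smoothing (a level-only state, with a made-up smoothing weight) rather than the full TBATS state space:

alpha = 0.3  # illustrative smoothing weight, not a fitted value

def update_state(level, y):
    # The new state blends the incoming observation with the old state.
    # TBATS keeps a richer state (trend, seasonal, ARMA terms), updated
    # in the same recursive fashion.
    return alpha * y + (1 - alpha) * level

level = 10.0
for y in (10.2, 9.9, 10.1):    # consecutive, well-behaved observations
    level = update_state(level, y)
print(level)                   # ~10.04: the state tracks the data

# Feeding an observation from 3 steps ahead as if it were the very next one
# (say a seasonal swing of 25.0) drags the state far off course:
level = update_state(level, 25.0)
print(level)                   # ~14.53: forecasts built on this state are distorted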

Re: 'The forecasts almost always seem to be underestimating giving a residual that is not centered around 0 but a positive number.'

Analysis of this is beyond the scope of package support. Have you tried fitting the model to training sets of different lengths (so that the series ends at a different time point each time)? Are the residuals positive on average each time?
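
A sketch of that check (the cutoffs, seasonal period, and toy data below are placeholders, not your dataset):

import numpy as np
from tbats import TBATS

rng = np.random.default_rng(0)
y = 10 + np.sin(2 * np.pi * np.arange(400) / 24) + rng.normal(scale=0.1, size=400)

for cutoff in (300, 330, 360):    # training sets ending at different time points
    model = TBATS(seasonal_periods=[24]).fit(y[:cutoff])
    # A mean residual that stays positive across cutoffs would indicate
    # systematic underestimation rather than an artifact of one end point.
    print(cutoff, model.resid.mean())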

shahzebnaveed-telus commented 3 years ago

Thank you for pointing out the mistake I was making!

Yes, I've tried different lengths and different kinds of ending points. But regardless of whether my training sequence ends on higher or lower values, the estimates always give me positive residuals. Can you suggest, in general, how far into the future the forecasts should be considered valid?

cotterpl commented 3 years ago

I am afraid this question has no simple answer. It depends on many factors, from both a mathematical and a business perspective.