intive-DataScience / tbats

BATS and TBATS forecasting methods
MIT License

Forecast with new data part of a seasonality unit #4

Closed: aurelien-roy closed this issue 5 years ago

aurelien-roy commented 5 years ago

Hello,

I have an hourly-sampled dataset with a daily (24-hour) seasonality.

If I train the model on N complete days of data and call ‘forecast’, the predictions start, as expected, at midnight on day N+1.

Now suppose I receive new data for the next 4 hours. I want to give these 4 additional hours to my model (without re-training it, keeping the same parameters) so that it predicts values for day N+1 starting at 4:00 AM.

To pass in these new hours, I simply call my model's ‘fit’ function with every sample from day 1 at 0:00 AM to day N+1 at 3:00 AM. However, when I then call ‘forecast’, I get an abnormally large MSE, and the model seems to assume it is predicting a new day starting at 0:00 AM.

from tbats import TBATS

# `data`: hourly pandas Series indexed by timestamp (its construction is not shown in the issue)
est = TBATS(seasonal_periods=[24], ...)  # other parameters elided in the original
fitted = est.fit(data['2016-04-01 00:00':'2016-05-01 23:00'])
fitted.forecast(steps=10)  # These forecasts are very close to the real values

# Now let's feed the same model 4 additional hours
fitted.fit(data['2016-04-01 00:00':'2016-05-02 03:00'])  # Feed the first 4 hours of day N+1
fitted.forecast(steps=10)  # These forecasts are abnormally biased

# Had I fed the model 24 additional hours (a complete seasonal period), the problem would not occur:
fitted.fit(data['2016-04-01 00:00':'2016-05-02 23:00'])  # Feed the complete day N+1 (May 2)
fitted.forecast(steps=10)  # These forecasts are accurate
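To illustrate the failure mode the reporter first suspected (not the tbats internals), here is a minimal sketch using a plain seasonal-naive forecaster on hypothetical, noise-free hourly data. It compares a forecast whose starting phase is inferred from the number of samples with one that wrongly assumes every forecast starts at midnight. The profile, function names, and data are all illustrative assumptions.

```python
PERIOD = 24  # hourly samples with a daily cycle, as in the issue

# Hypothetical daily profile: the value simply equals the hour of day.
PROFILE = [float(hour) for hour in range(PERIOD)]


def synthetic_series(n_samples: int) -> list[float]:
    """Noise-free hourly series starting at midnight (illustrative data only)."""
    return [PROFILE[t % PERIOD] for t in range(n_samples)]


def seasonal_template(history: list[float], period: int) -> list[float]:
    """Map each phase 0..period-1 to its value in the most recent full cycle."""
    n = len(history)
    template = [0.0] * period
    for t in range(n - period, n):
        template[t % period] = history[t]
    return template


def seasonal_naive_forecast(history: list[float], steps: int, period: int,
                            assume_midnight: bool = False) -> list[float]:
    """Repeat the last cycle; optionally pretend the next step is phase 0."""
    template = seasonal_template(history, period)
    # Correct phase of the first forecast step, inferred from the sample count,
    # versus the misaligned assumption that every forecast starts at midnight.
    start_phase = 0 if assume_midnight else len(history) % period
    return [template[(start_phase + k) % period] for k in range(steps)]


def mse(forecast: list[float], actual: list[float]) -> float:
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)


def compare(n_history: int, steps: int = 10) -> tuple[float, float]:
    """Return (aligned MSE, midnight-assumed MSE) for n_history hours of history."""
    full = synthetic_series(n_history + steps)
    history, actual = full[:n_history], full[n_history:]
    aligned = seasonal_naive_forecast(history, steps, PERIOD)
    reset = seasonal_naive_forecast(history, steps, PERIOD, assume_midnight=True)
    return mse(aligned, actual), mse(reset, actual)


if __name__ == "__main__":
    # 31 full days plus the first 4 hours of the next day.
    print(compare(31 * 24 + 4))  # (0.0, 16.0)
    # 32 full days: the starting phase is 0, so both forecasts coincide.
    print(compare(32 * 24))  # (0.0, 0.0)
```

With a partial day of history, the midnight assumption shifts every forecast by 4 hours and inflates the error; with whole days, the two approaches agree, which matches the pattern described above. As the reporter later found, the actual cause in their case was elsewhere.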

The original R package does not have this issue: it uses the number of samples to infer where within the seasonal cycle the first prediction falls.
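The phase inference described above reduces to simple arithmetic. Below is a minimal sketch, assuming a regularly sampled series with no gaps and inclusive slice endpoints (as pandas label slicing uses). It is not the R package's code, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def first_forecast_phase(n_samples: int, period: int) -> int:
    """Position inside the seasonal cycle of the first forecast step."""
    return n_samples % period


def first_forecast_time(start: datetime, n_samples: int, step: timedelta) -> datetime:
    """Timestamp of the first forecast for a regularly sampled series."""
    return start + n_samples * step


start = datetime(2016, 4, 1, 0, 0)
hour = timedelta(hours=1)
# 2016-04-01 00:00 through 2016-05-01 23:00 (inclusive) is 31 days of hourly data.
n_full_days = 31 * 24
# ... and through 2016-05-02 03:00 adds 4 more samples.
n_plus_four = n_full_days + 4

print(first_forecast_phase(n_full_days, 24), first_forecast_time(start, n_full_days, hour))
# 0 2016-05-02 00:00:00
print(first_forecast_phase(n_plus_four, 24), first_forecast_time(start, n_plus_four, hour))
# 4 2016-05-02 04:00:00
```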

aurelien-roy commented 5 years ago

Closed. It turns out my data were more predictable at midnight. This is not a package issue. My apologies.

cotterpl commented 5 years ago

No problem. If you have any concerns using the package feel free to contact me.