model.predict gives different forecast depending on forecast_length

winedarksea / AutoTS

Automated Time Series Forecasting

MIT License

model.predict gives different forecast depending on forecast_length #199

Open sebros-sandvik opened 1 year ago

sebros-sandvik commented 1 year ago

model.predict(forecast_length=n) and model.predict(forecast_length=m) give (for me) different forecast values depending on the forecast length. What am I missing? Is this expected behavior, a mistake on my part, or a bug?

code:

    from autots import AutoTS

    model = AutoTS(
        forecast_length=3,
        frequency='MS',
        ensemble='all',
        model_list="best",
        n_jobs="auto",
        transformer_list="fast",
        holiday_country="PL",
        max_generations=4,
        num_validations=1,
        verbose=0,
    )
    model = model.fit(
        df,  # multiple time series, end date 2023-08-01
        date_col='Date',
        value_col='Sales',
        id_col='Customer Number',
    )

Thanks in advance!

winedarksea commented 1 year ago

The expected behavior is a bit complicated. When AutoTS.fit() runs, it selects the final model but does not train it on the full data (cross validation does train models, but only on the cross validation samples, not on the full data). When AutoTS.predict() runs, it trains/fits the final model on the full dataset. Some models produce different results depending on forecast length: for example, regression models that output the full forecast length at once will likely give different outputs for different forecast lengths. Many other models should give the same results regardless of forecast length. Overall, it depends on which models you are using. If you need the same results across forecast lengths, try limiting the model list.
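
To illustrate why a model that outputs the whole horizon at once can depend on forecast length, here is a toy sketch in plain Python. It does not use AutoTS and is not how any AutoTS model is implemented; `direct_forecast` and `drift_forecast` are hypothetical stand-ins. The assumed mechanism is that a "direct" model can only train on windows with a complete target of the requested length, so the training set changes with the horizon.

```python
def direct_forecast(series, horizon):
    """Toy 'direct' model: predicts all `horizon` steps at once.

    Step k is the last value plus the average k-step change, averaged only
    over training windows that have all `horizon` future values available.
    """
    n = len(series)
    starts = range(n - horizon)  # windows with a complete target of length `horizon`
    forecast = []
    for k in range(1, horizon + 1):
        avg_change = sum(series[t + k] - series[t] for t in starts) / len(starts)
        forecast.append(series[-1] + avg_change)
    return forecast


def drift_forecast(series, horizon):
    """Toy horizon-independent model: one fixed slope, extended step by step."""
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * k for k in range(1, horizon + 1)]


history = [10, 12, 11, 15, 14, 18, 17, 21, 25, 24]

direct_3 = direct_forecast(history, 3)
direct_6 = direct_forecast(history, 6)
drift_3 = drift_forecast(history, 3)
drift_6 = drift_forecast(history, 6)

# Compare the first three steps of the short and long forecasts.
print("direct:", direct_3, "vs", direct_6[:3])
print("drift: ", drift_3, "vs", drift_6[:3])
```

In this toy setup the direct model's first three steps change when the horizon changes, while the drift model's do not, which mirrors the distinction described above.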

On a related note, there is a selection of update_fit models which allow .fit_data followed by .predict to update on new data without rerunning training, which is faster and also useful for consistency.

sebros-sandvik commented 1 year ago

Thanks for getting back so fast, and for a clear answer. What you are suggesting seems fair, but I wonder if it is right for me.

Here's the rub: I need forecasts for 18 months ahead, but they should be optimized for 3 months ahead. Could I do the following:

1.) Train with AutoTS(forecast_length = 3); model = model.fit(all_data)
2.) new_data = model_forecast(all_data, model, forecast_length = 3)
3.) all_data = concat(all_data, new_data)
4.) new_data = model_forecast(all_data, model)
5.) repeat until I have 18 months ahead forecast (6 loops)
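
The loop in the steps above can be sketched in plain Python as follows. This is only an illustration of the roll-forward idea: `roll_forward` and `mean_of_last_three` are hypothetical names, and the stand-in model replaces the `model_forecast` call from the steps, whose real signature is not shown in this thread.

```python
def roll_forward(history, forecast_fn, step=3, total=18):
    """Build a `total`-step forecast by forecasting `step` values at a time
    and appending each chunk to the data before forecasting again."""
    data = list(history)
    forecast = []
    while len(forecast) < total:
        chunk = forecast_fn(data, step)
        chunk = chunk[: total - len(forecast)]  # trim if total is not a multiple of step
        forecast.extend(chunk)
        data.extend(chunk)  # forecasts are treated as if they were observations
    return forecast


def mean_of_last_three(data, horizon):
    """Hypothetical stand-in model: repeats the mean of the last three values."""
    level = sum(data[-3:]) / 3
    return [level] * horizon


monthly_sales = [100, 104, 98, 110, 107, 112]
forecast_18 = roll_forward(monthly_sales, mean_of_last_three, step=3, total=18)
print(len(forecast_18), forecast_18[:3])
```

Note that each pass feeds earlier forecasts back in as if they were observed data, so any error in the first 3 months carries into every later chunk.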

Thanks again!

Best,

Seb

winedarksea commented 1 year ago

I would just stick with the original plan of AutoTS(forecast_length=3) then .predict(forecast_length=18). If the minor variation concerns you (and for me it really doesn't because forecasts are highly uncertain by definition, some variation is to be expected even with similar models) then choose a model_list of models that won't change based off forecast_length. I can suggest some if you want.
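
A minimal sketch of that plan, reusing the constructor and fit arguments from the original post. The model names in `CANDIDATE_MODELS` are an assumption for illustration only; no specific models were named in this thread as being independent of forecast_length. `forecast_18_months` is a hypothetical helper, and `ensemble=None` is likewise an assumption made to keep the result to a single model.

```python
# Assumption: illustrative candidates only, not a list confirmed to be
# independent of forecast_length.
CANDIDATE_MODELS = ["LastValueNaive", "SeasonalNaive", "ETS"]


def forecast_18_months(df):
    """Select models on a 3-month horizon, then predict 18 months ahead."""
    # Imported here so the sketch can be loaded without AutoTS installed.
    from autots import AutoTS

    model = AutoTS(
        forecast_length=3,  # horizon used during model selection
        frequency="MS",
        ensemble=None,  # assumption: no ensembling
        model_list=CANDIDATE_MODELS,
        transformer_list="fast",
        max_generations=4,
        num_validations=1,
        verbose=0,
    )
    model = model.fit(
        df,
        date_col="Date",
        value_col="Sales",
        id_col="Customer Number",
    )
    # The final model is fit on the full data here, at the longer horizon.
    prediction = model.predict(forecast_length=18)
    return prediction.forecast
```

Comparing the first 3 rows of this output against a `.predict(forecast_length=3)` run is one way to check whether the chosen models actually vary with horizon on your data.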

sebros-sandvik commented 1 year ago

Thank you sir, that would be very helpful!

(minor variations in above examples, but major when considering all time series in the data). I understand this can be a convergence issue also.

A job well done on the package, thank you kindly for taking the time and answering!

winedarksea commented 1 year ago

Thanks, if you continue to see "major" variations feel free to post more. It's possible there is a bug with the specific model being used, in that case (largely a large chunk of JSON for an ensemble).