winedarksea / AutoTS

Automated Time Series Forecasting
MIT License

Results are not reproducible #59

Closed avijit1996iiti closed 3 years ago

avijit1996iiti commented 3 years ago

I have tried to use this library for my work, but the results are not reproducible. Can you please tell me why this is happening?

winedarksea commented 3 years ago

@avijit1996iiti you are going to need to be a bit more specific to get the best help.

I think what you are seeing is that when you run AutoTS with many generations, it incorporates randomness into its model search process, so each run will be slightly different. What you need to do is run AutoTS, export a template, and then, if you want the same results, run AutoTS again with 0 generations, importing that model template. You can check out the extended_tutorial for more on that.

avijit1996iiti commented 3 years ago

read data

import pandas as pd
from autots import AutoTS

biocon = pd.read_csv(r'BIOCON.csv')

build model

model=AutoTS(forecast_length=15,frequency='infer',ensemble='simple', drop_data_older_than_periods=200,verbose=0)

fit the model

model=model.fit(biocon,date_col='Date',value_col='Close',id_col=None)

make the forecast

prediction=model.predict()
forecast=prediction.forecast

validate

validation=model.results('validation')

I am using the code above. It would be great if you could specify what changes I should make to get reproducible results.

avijit1996iiti commented 3 years ago

Result of first run

image

Result of second run

image

Result of third run

image

avijit1996iiti commented 3 years ago

@winedarksea correct me if I am wrong. I am giving code snippets below with the corresponding output.

building and training the model

model=AutoTS(forecast_length=15,frequency='infer',ensemble='simple', drop_data_older_than_periods=200,verbose=0)

model=model.fit(biocon,date_col='Date',value_col='Close',id_col=None)

forecasting

prediction=model.predict()
forecast=prediction.forecast
forecast.plot()

output of forecast

image

Exporting a template

example_filename = "example_export.csv"  # .csv/.json
model.export_template(example_filename, models='best', n=15, max_per_model_class=3)

on new training

model = AutoTS(forecast_length=15, frequency='infer', max_generations=0, num_validations=0, verbose=0)
model = model.import_template(example_filename, method='only')  # method='add on'

to check consistency, I run it 10 times

forecast_list=[]
for i in range(10):
    model.fit(biocon,date_col='Date',value_col='Close',id_col=None)
    prediction=model.predict()
    forecast=prediction.forecast
    forecast_list.append(forecast)

plot the results of these runs

image

Conclusion

The 10 runs above all produce the same result, but it does not match the result of the model that was exported.

Please let me know how to fix this.

winedarksea commented 3 years ago

@avijit1996iiti you should check the output of model.best_model for each graph. If that model is the same and the results are still significantly different, then it is concerning (some models will vary slightly due to randomness on new runs, but should generally look the same).
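
One way to make that check systematic is sketched below. It is only an illustration, not part of AutoTS: the helper names, the 5% tolerance, and the example numbers are all hypothetical. Each run is recorded as a pair of (model description, forecast values), where the description could be something like str(model.best_model) and the values a forecast column converted with .tolist().

```python
import math
from collections import defaultdict


def group_runs_by_model(runs):
    """Group forecasts by the model description that produced them.

    runs: iterable of (model_description, forecast_values) pairs.
    """
    groups = defaultdict(list)
    for description, values in runs:
        groups[description].append(list(values))
    return dict(groups)


def forecasts_agree(forecasts, rel_tol=0.05):
    """Return True when every forecast stays within rel_tol of the first one."""
    first = forecasts[0]
    return all(
        len(f) == len(first)
        and all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(f, first))
        for f in forecasts[1:]
    )


# Hypothetical numbers standing in for three separate runs.
runs = [
    ("ETS", [100.0, 101.0, 102.0]),
    ("ETS", [100.5, 101.2, 102.4]),
    ("LastValueNaive", [95.0, 95.0, 95.0]),
]

# For each chosen model, do the forecasts it produced roughly agree?
report = {
    name: forecasts_agree(forecasts)
    for name, forecasts in group_runs_by_model(runs).items()
}
print(report)
```

A model that appears in several runs and is flagged False would be the concerning case described above. Different models across runs are expected from the search itself.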

For consistency, set ensemble=None. When exporting the template, to get only the single best model you need .export_template(example_filename, models='best', n=1). Note the n=1.
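
Put together with the snippets earlier in this thread, those two changes might look like the sketch below. It reuses only calls already shown here (AutoTS, fit, predict, export_template, import_template). The function names are hypothetical, and keeping every other setting identical between the search run and the replay run is an assumption made for this sketch, not documented AutoTS behavior.

```python
# Settings shared by both runs, taken from the snippets in this thread,
# with ensemble=None as suggested above.
COMMON_KWARGS = dict(
    forecast_length=15,
    frequency='infer',
    ensemble=None,
    drop_data_older_than_periods=200,
    verbose=0,
)
# Search run: the default model search, randomness included.
SEARCH_KWARGS = dict(COMMON_KWARGS)
# Replay run: no new generations, only the imported template is evaluated.
REPLAY_KWARGS = dict(COMMON_KWARGS, max_generations=0, num_validations=0)


def search_and_export(df, template_path="example_export.csv"):
    """Run the model search, export only the single best model, return its forecast."""
    from autots import AutoTS  # imported lazily so this file loads without autots

    model = AutoTS(**SEARCH_KWARGS)
    model = model.fit(df, date_col='Date', value_col='Close', id_col=None)
    model.export_template(template_path, models='best', n=1)
    return model.predict().forecast


def replay(df, template_path="example_export.csv"):
    """Refit using only the exported template and return the forecast."""
    from autots import AutoTS

    model = AutoTS(**REPLAY_KWARGS)
    model = model.import_template(template_path, method='only')
    model = model.fit(df, date_col='Date', value_col='Close', id_col=None)
    return model.predict().forecast
```

Calling search_and_export(biocon) once and then replay(biocon) several times gives forecasts that can be compared directly, since both functions are meant to use the same single model.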

Remember that AutoTS is actually searching a large collection of models, and each run has found a different optimal model. By default, with only a few generations as here, you will see wide variety in the model chosen because the search has not yet come close to an optimum. Try setting max_generations much higher for your runs and you will see that, even though it will still choose different models, it will generally have settled near a small optimal selection of model types and features.
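
The effect of search length can be illustrated with a toy example. This is not the AutoTS search: it is plain random sampling over hypothetical candidates with made-up scores, meant only to show why a short randomized search picks different winners from run to run while a long one settles down.

```python
import random

# Hypothetical candidate "models" with fixed validation scores (lower is better).
SCORES = {f"model_{i}": 1.0 + 0.1 * i for i in range(20)}


def toy_search(n_draws, seed):
    """Sample n_draws candidates at random and return the best one seen."""
    rng = random.Random(seed)
    names = list(SCORES)
    tried = [rng.choice(names) for _ in range(n_draws)]
    return min(tried, key=SCORES.get)


# Repeat the search with 30 different seeds, standing in for 30 separate runs.
few = {toy_search(n_draws=3, seed=s) for s in range(30)}
many = {toy_search(n_draws=500, seed=s) for s in range(30)}

print(len(few), "distinct winners with 3 draws per run")
print(len(many), "distinct winners with 500 draws per run")
```

In this toy the long search collapses onto a single winner. A real search over model types and parameters would be expected to settle on a small group of similar models rather than exactly one, as described above.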

winedarksea commented 3 years ago

I think I will up the default of max_generations to 20 for the next package release.