winedarksea / AutoTS

Automated Time Series Forecasting
MIT License

Can't see 'Contour' metric result #192

Closed ericleonardo closed 1 year ago

ericleonardo commented 1 year ago

Hi... I'm only interested in accurately predicting Up/Down movements, regardless of magnitude. For this, I set the metric_weighting parameter with 'contour_weighting': 1 and all others to 0.

But the Contour metric is not displayed during training, or even afterwards. When I do print(model) after training, it returns only the SMAPE, MAE and SPL metrics.

I'm interested in optimizing only Contour (up/down) to predict whether a stock price will Rise or Fall. How can we display the Contour metric result? I'm a newbie with AutoTS. Thanks

winedarksea commented 1 year ago

Contour is a shape-fitting metric, not a directional metric. You want oda (origin directional accuracy, i.e. the number of timesteps that were correctly predicted as up or down relative to the most recent data; better for reporting) and dwae (directional weighted absolute error, which is better for optimization, and which I designed for exactly this case of stock forecasting).

It is there, just not in the printed results (that is just a quick summary). Here's what you want:

model_results = model.results()
validation_results = model.results("validation")

These will be pandas DataFrames that you can view as desired.
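To make the idea behind oda concrete, here is a minimal sketch in plain Python. It is not the AutoTS implementation. It assumes that "most recent data" means the last observed training value, and that a step counts as correct when the forecast and the actual value lie on the same side of that value. The function name and the sample numbers are made up for illustration.

```python
def origin_directional_accuracy(last_value, actuals, forecasts):
    """Share of steps where forecast and actual sit on the same side of last_value.

    Hypothetical helper for illustration only, not part of AutoTS.
    """
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if not actuals:
        raise ValueError("need at least one forecast step")

    def sign(x):
        # -1 below the origin, 0 unchanged, +1 above the origin
        return (x > 0) - (x < 0)

    hits = sum(
        sign(a - last_value) == sign(f - last_value)
        for a, f in zip(actuals, forecasts)
    )
    return hits / len(actuals)


# Illustrative numbers only: the last training close was 100.0
score = origin_directional_accuracy(
    last_value=100.0,
    actuals=[101.0, 99.5, 100.5, 98.0],
    forecasts=[100.4, 100.2, 101.0, 99.0],
)
print(score)  # 0.75: three of the four steps have the direction right
```

Under this definition, a single-step forecast can only score 0 or 1 on any one validation, so averages over a handful of validations take only a few distinct values.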

ericleonardo commented 1 year ago

Thank you! Now I can see the contour and oda fields. Some models got 0.66, 1.00 or 0.00. I'm training only RollingRegression, with ensemble and transformer_list as 'all'. Can I consider these results to be out-of-sample directional accuracy? Or is validation biased during training?

I'm using the Close price at 5-minute frequency. The objective is to predict 1 step ahead, Up or Down (the next 5-minute bar). My full dataset is 17000 rows of historical data. The inferred frequency is 5T. Is this right? Thank you for the help. AutoTS is new to me. I'm used to the sklearn approach.

from autots import AutoTS

metric_weighting = {
    'smape_weighting': 0,
    'mae_weighting': 0,
    'rmse_weighting': 0,
    'made_weighting': 0,
    'mage_weighting': 0,
    'mle_weighting': 0,
    'imle_weighting': 0,
    'spl_weighting': 0,
    'containment_weighting': 0,
    'contour_weighting': 1,
    'runtime_weighting': 0,
}

model_list = [
    'RollingRegression'
]

model = AutoTS(
    forecast_length=1,
    frequency='infer',
    ensemble='all',
    model_list=model_list,
    transformer_list='all',
    max_generations=5,
    num_validations=2,
    metric_weighting=metric_weighting,
    n_jobs='auto',
)

model = model.fit(full_dataset, date_col='Date', value_col='Close')

winedarksea commented 1 year ago

MultivariateRegression is mostly an improved version of RollingRegression, so it might be worth using that. Plenty of other models will likely work too; there is no reason to limit yourself to just one model type. You are going to want more than 5 max generations; I often run at least 100. You can set generation_timeout to limit the search time (in minutes) if it is too slow. I also suggest adding more metrics to your metric weighting. You can set your favorite to 10 and the others to 1, or some such; a mixture of metrics helps prevent overfitting.
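The "favorite at 10, others at 1" suggestion can be sketched as a small helper that builds the dict. The first eleven keys are copied from the metric_weighting posted earlier in this thread. The keys 'oda_weighting' and 'dwae_weighting' are an assumption about how the oda and dwae metrics are named in metric_weighting, so check them against the installed version. The helper name and the 60-minute timeout are illustrative only.

```python
# Keys taken from the metric_weighting dict posted earlier in this thread
KEYS_FROM_THREAD = [
    "smape_weighting", "mae_weighting", "rmse_weighting", "made_weighting",
    "mage_weighting", "mle_weighting", "imle_weighting", "spl_weighting",
    "containment_weighting", "contour_weighting", "runtime_weighting",
]
# Assumed key names for the directional metrics (verify for your version)
ASSUMED_DIRECTIONAL_KEYS = ["oda_weighting", "dwae_weighting"]


def mixed_metric_weighting(keys, favorite, favorite_weight=10, other_weight=1):
    """Give one metric a large weight and every other metric a small one."""
    if favorite not in keys:
        raise ValueError(f"{favorite!r} is not among the known keys")
    return {k: (favorite_weight if k == favorite else other_weight) for k in keys}


metric_weighting = mixed_metric_weighting(
    KEYS_FROM_THREAD + ASSUMED_DIRECTIONAL_KEYS,
    favorite="dwae_weighting",
)

# Search settings following the advice above; the timeout value is an example
search_settings = {
    "max_generations": 100,
    "generation_timeout": 60,  # minutes
    "metric_weighting": metric_weighting,
}
print(search_settings)
```

These settings could then be passed along with the other arguments, for example AutoTS(forecast_length=1, **search_settings).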

Those are out-of-sample accuracies, yes, from your validations (adjustable via validation_method and num_validations).

5T = 5 minutes; you can see the list of frequency aliases here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects

ericleonardo commented 1 year ago

Very interesting... I will try the multivariate model as well. My objective is to trade at 5-minute frequency, predicting 1 step ahead. Slow training between predictions is not possible. I must train the model only once each day and use it during the next day. A 12-hour working day has 144 bars of 5 minutes. How can I validate on every group of 144 samples without training again? This way I can simulate performance when working for 12 hours each day without retraining.

Or do you recommend selecting only the best model beforehand, so that training just 1 model between new predictions during the work day will be faster? Thank you!
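The evaluation scheme described above (train once, then score a whole 144-bar session without retraining) can be sketched independently of AutoTS as index arithmetic. This is only an illustration: it assumes the rows are contiguous 5-minute bars with no gaps, so that each session is exactly 144 rows, and the function name is hypothetical.

```python
from datetime import timedelta

BAR = timedelta(minutes=5)
SESSION = timedelta(hours=12)
BARS_PER_SESSION = SESSION // BAR  # 12 h / 5 min = 144 bars


def session_splits(n_rows, bars_per_session, n_sessions):
    """Return (train_slice, test_slice) pairs for the last n_sessions sessions.

    Each training slice ends where its test session starts, so a model fitted
    on the training slice is then used for the whole session without a refit.
    """
    splits = []
    for remaining in range(n_sessions, 0, -1):
        start = n_rows - remaining * bars_per_session
        if start <= 0:
            raise ValueError("not enough rows for the requested sessions")
        splits.append((slice(0, start), slice(start, start + bars_per_session)))
    return splits


# 17000 rows as in the dataset described above; evaluate the last 3 sessions
splits = session_splits(17000, BARS_PER_SESSION, n_sessions=3)
for train, test in splits:
    print(f"train rows [0, {train.stop}), test rows [{test.start}, {test.stop})")
```

Within each test slice, the one-step-ahead direction would be scored bar by bar, feeding in the newly observed bars but leaving the fitted model unchanged.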

winedarksea commented 1 year ago

This is an interesting use case. I've not had to do high-frequency forecasts before; my preferred method is like that seen in production_example.py, for daily or weekly forecasts. I will probably need to make some changes to the package to make this work as fast as possible. I've been working on something similar; it's not in the released version, but it is in the dev branch here on GitHub. It looks like this:

# initial training or training update
model = AutoTS()
model.import_template(template)
model.fit(df)

# for prediction on streaming data
model.fit_data(latest_df)
model.predict()

Does that syntax look reasonable to you?

ericleonardo commented 1 year ago

Is there a way to train RollingRegression without Keras models? Most of the time is spent training epochs with Keras models, but very few of them are selected among the final best models on my dataset. I would like to compare the results of RollingRegression with and without Keras. Maybe they can be skipped to train faster. Thanks!

winedarksea commented 1 year ago

Options:

  1. Use AutoTS in an environment where Tensorflow is not installed.
  2. Use models_mode='gradient_boosting' in the AutoTS class (or pass it as method to get_new_params if using the lower-level class) for only gradient boosting models (i.e. LightGBM, if installed). There's also a 'neuralnets' option for only using neural nets. If you are really interested in neural nets, I suggest trying the GluonTS and PytorchForecasting models instead, as they are more focused on that. These will require additional package installs.
  3. Buy a really expensive GPU and make sure Tensorflow is using it. That will make it faster.

ericleonardo commented 1 year ago

Ok... Thank you very much for the help! I'm getting an average oda of 0.68 with 30 validations (backwards). This seems very good (I think too good to be true). I will continue testing to understand whether this result is consistent enough to use in real trading. AutoTS is the best model among the many others I tried for this purpose of 1-step-ahead direction prediction.
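One rough way to judge whether a figure like this could be luck is a binomial tail probability against a coin flip. The sketch below assumes that each of the 30 validations contributes one independent up/down call (forecast_length=1) and that a random guess is right half the time. Neither assumption is guaranteed to hold, and a model selected on these same validations will tend to look better than it is. The function name is hypothetical.

```python
from math import comb


def prob_at_least(hits, trials, p=0.5):
    """P(X >= hits) for X ~ Binomial(trials, p)."""
    if not 0 <= hits <= trials:
        raise ValueError("hits must be between 0 and trials")
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


trials = 30                  # one up/down call per validation (assumed)
hits = round(0.68 * trials)  # about 20 correct calls
p_value = prob_at_least(hits, trials)
print(f"{hits}/{trials} correct; chance of at least that by coin flip: {p_value:.4f}")
```

A small tail probability only says the result would be unusual for pure guessing on this sample; more validations or a held-out period would give a firmer answer.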

Thank you so much...

winedarksea commented 1 year ago

AutoTS 0.6.0 (due out in a few days) will enable prediction without any retraining for a few select models (the model list 'update_fit' contains those currently supported).

Simplified example:

model = AutoTS(model_list='update_fit')
model.fit(df)
model.predict()
# for new data without retraining
model.fit_data(df)
model.predict()
# to force retrain of best model (but not full model search)
model.model = None
model.fit_data(df)
model.predict()