winedarksea / AutoTS

Automated Time Series Forecasting
MIT License

Allow user to specify target and features #74

Closed ghost closed 3 years ago

ghost commented 3 years ago

I cannot understand how AutoTS knows which column is my target. For example, when passing a wide dataframe, how does it know that y is my target and x holds my features for predicting the target?

winedarksea commented 3 years ago

This being time series forecasting, your time series are really the X, and the future of the same series is the Y. You can pass additional X features with future_regressor.
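As a rough illustration of that framing (not AutoTS's internal code), the sketch below pairs each observation with the one that follows it, and appends a regressor value that is known in advance for the target date. The data, the `is_holiday` regressor, and the helper name `make_pairs` are all made up for this example.

```python
# Hypothetical data: one series plus one regressor known ahead of time.
sales = [10.0, 12.0, 11.0, 15.0, 14.0]
is_holiday = [0, 0, 1, 0, 1]  # aligned with the same dates as `sales`


def make_pairs(series, regressor, horizon=1):
    """Pair the value at t (plus the regressor at t + horizon) with the value at t + horizon."""
    X, y = [], []
    for t in range(len(series) - horizon):
        # Feature row: the past value and the regressor for the date being predicted.
        X.append([series[t], regressor[t + horizon]])
        # Target: the same series, `horizon` steps later.
        y.append(series[t + horizon])
    return X, y


X, y = make_pairs(sales, is_holiday)
```

Here the "features" are just the past of the series itself plus whatever is known about the future date, which is why no separate target column needs to be named.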

If you want to manually control the X and the Y, you are probably looking for more general automated machine learning, some examples: https://automl.info/tpot/ https://automl.github.io/auto-sklearn/master/

ghost commented 3 years ago

Thank you. I am aware that it is time series forecasting, but the examples are confusing. In the examples you have here, there is a long df with a couple of series. Are you saying that AutoTS in this example would try to predict the values for each series by treating them as separate ARMA processes?

winedarksea commented 3 years ago

Well, it really depends on which of the models it goes with. If the ARIMA model is used, then yep, it can be ARMA, one per series. VAR and VARMAX are basically the multivariate versions of ARIMA (all of these are from Statsmodels, check that package out if you aren't familiar with it). You can use model_list to force it to a certain model or models.
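To make the "one per series" idea concrete, here is a minimal plain-Python sketch of reshaping long data into one column per series. It is only an illustration under assumed data: the rows and the helper name `long_to_wide` are hypothetical, and this is not how AutoTS does the conversion internally.

```python
from collections import defaultdict

# Hypothetical long-format rows: (datetime, series_id, value)
long_rows = [
    ("2021-01-01", "a", 1.0),
    ("2021-01-01", "b", 10.0),
    ("2021-01-02", "a", 2.0),
    ("2021-01-02", "b", 20.0),
    ("2021-01-03", "a", 3.0),
    ("2021-01-03", "b", 30.0),
]


def long_to_wide(rows):
    """Return (sorted dates, {series_id: [values ordered by date]})."""
    by_series = defaultdict(dict)
    for date, series_id, value in rows:
        by_series[series_id][date] = value
    dates = sorted({date for date, _, _ in rows})
    # Missing observations become None rather than being silently dropped.
    wide = {sid: [vals.get(d) for d in dates] for sid, vals in by_series.items()}
    return dates, wide


dates, wide = long_to_wide(long_rows)
```

A univariate model such as ARIMA would then be fit once per column of `wide`, whereas a multivariate model such as VAR would see all of the columns together. With pandas, `DataFrame.pivot` performs the same reshaping.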

There's a list of the models at the end of the extended_tutorial. Feel free to ask questions about specific ones. Any with 'Regression' in the name are turning it into a variation of X and Y and feeding into traditional models like XGBoost and so on.

ghost commented 3 years ago

Thank you for the response. That's very clear.

One final question:

"....Any with 'Regression' in the name are turning it into a variation of X and Y and feeding into traditional models like XGBoost and so on...."

Here is the part that I am still a little confused about. In your example:

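from autots import AutoTS, load_daily

# the sample dataset can be loaded in either the long or the wide shape
long = False
df = load_daily(long=long)

model = AutoTS(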
    forecast_length=3,
    frequency='infer',
    prediction_interval=0.9,
    ensemble=None,
    model_list="superfast",
    transformer_list="fast",
    max_generations=5,
    num_validations=2,
    validation_method="backwards"
)

model = model.fit(
    df,
    date_col='datetime' if long else None,
    value_col='value' if long else None,
    id_col='series_id' if long else None,
)

prediction = model.predict()
print(model)

For the "Regression" type of models, I am still not sure what is considered X and what is considered Y.

winedarksea commented 3 years ago

Even with the Regression type models, the X and Y are still not directly specified. All are designed internally to accept the same df_wide_numeric style data.

You can see where it makes an X here: https://github.com/winedarksea/AutoTS/blob/master/autots/models/sklearn.py#L642

For an example of essentially the same approach to building and using the X and Y, see: https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
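The general sliding-window idea can be sketched in a few lines of plain Python. This is a simplified illustration only: the function name `window_xy` and the data are hypothetical, and the linked AutoTS code may construct its features differently.

```python
def window_xy(wide_rows, window=2):
    """Turn a wide table (one row per date, one column per series) into X and Y.

    Each X row is the previous `window` rows flattened into one list;
    each Y row is the row that immediately follows that window.
    """
    X, Y = [], []
    for t in range(window, len(wide_rows)):
        lags = [v for row in wide_rows[t - window:t] for v in row]
        X.append(lags)
        Y.append(list(wide_rows[t]))
    return X, Y


# Hypothetical wide data: columns are two series, rows are consecutive dates.
wide_rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
]

X, Y = window_xy(wide_rows, window=2)
```

The resulting `X` and `Y` could be passed to any ordinary regressor that supports multiple outputs, which is why the user never has to name a target column: every series serves as both the source of features and the target.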