ibm-granite / granite-tsfm

Foundation Models for Time Series
Apache License 2.0

Including future events information for prediction #71

Open: nikhilnkhetan opened this issue 2 weeks ago

nikhilnkhetan commented 2 weeks ago

Is there a way to feed information about future events into the model?

For example, if I am using the model to predict a stock price one month ahead, can I give it known future data, such as a company's earnings announcement dates and dividend announcement dates?
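One common way to represent such known future events, offered here as an assumption rather than anything granite-tsfm prescribes, is to turn each event date into a 0/1 indicator column on a calendar that extends past the last observed price through the forecast horizon. Those columns could then be declared as exogenous inputs (the reply below declares its exogenous variables as observable_columns in TimeSeriesPreprocessor). The helper add_event_flags, the column names and the dates are all hypothetical.

```python
import pandas as pd


def add_event_flags(df, timestamp_column, events):
    """Return a copy of df with one 0/1 indicator column per event type.

    events maps a (hypothetical) column name such as "earnings_day" to the
    dates on which that event occurs. Dates may lie after the last observed
    price, which is what makes them usable as known future inputs.
    """
    out = df.copy()
    # Compare on calendar days so intraday timestamps still match an event date.
    days = pd.to_datetime(out[timestamp_column]).dt.normalize()
    for column, dates in events.items():
        event_days = pd.to_datetime(pd.Series(dates)).dt.normalize()
        out[column] = days.isin(event_days).astype(int)
    return out


# Daily calendar covering the price history plus a one-month horizon.
calendar = pd.DataFrame({"date": pd.date_range("2024-01-01", "2024-02-29", freq="D")})
features = add_event_flags(
    calendar,
    "date",
    {"earnings_day": ["2024-01-25"], "dividend_day": ["2024-02-15"]},  # made-up dates
)
```

Because the flags are computed from the calendar rather than from observed prices, they are defined for every row in the horizon, which is the property a known future input needs.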

HarrisonOates commented 2 weeks ago

Using TimeSeriesForecastingPipeline (as in #53), there is a field for future_time_series (see here). However, I'm not sure I'm using it correctly, as I'm getting duplicate dates in the output when I specify future_time_series.

For my use case, all exogenous variables are known in the past and in the future, and I'm predicting a single univariate time series. Exogenous variables are already declared as observable_columns in TimeSeriesPreprocessor. Does specifying future_time_series do anything in this instance?

Code:

```python
from tsfm_public.toolkit.time_series_forecasting_pipeline import TimeSeriesForecastingPipeline

# zeroshot_model, tsp (the TimeSeriesPreprocessor), t (the input DataFrame) and the
# column-name variables are defined earlier in the notebook (not shown).
forecast_pipeline = TimeSeriesForecastingPipeline(
    model=zeroshot_model,
    timestamp_column=timestamp_column,
    id_columns=id_columns,
    target_columns=target_columns,
    observable_columns=control_columns,
    freq="30m",
    feature_extractor=tsp,
    #explode_forecasts=False,
    preprocessor=tsp,
    prediction_length=48,
    inverse_scale_outputs=True,
    future_time_series=t
)

forecasts = forecast_pipeline(tsp.preprocess(t))
forecasts.head()
```

wgifford commented 2 weeks ago

@HarrisonOates The future_time_series argument is used to specify the observed values of exogenous features in the future, i.e., values continuing past the end of the time series you provide when you call forecast_pipeline. The underlying Hugging Face base class, transformers.pipelines.base.Pipeline, supports passing this argument both at construction time and when calling the pipeline.
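A minimal pandas sketch of data in the shape described above: split a frame at a cutoff into the input for the pipeline and a future frame that holds only timestamps after the cutoff, together with the exogenous columns. The helper split_for_forecast, its cutoff argument and the choice to drop the target from the future frame are assumptions for illustration, not part of the library's API.

```python
import pandas as pd


def split_for_forecast(df, timestamp_column, cutoff, exogenous_columns, id_columns=()):
    """Split df at cutoff into the pipeline input and known future exogenous values.

    history: every row at or before cutoff (targets and exogenous columns).
    future:  rows strictly after cutoff, restricted to the timestamp, id and
             exogenous columns; future target values are what gets forecast.
    """
    ts = pd.to_datetime(df[timestamp_column])
    cutoff = pd.Timestamp(cutoff)
    history = df[ts <= cutoff]
    future = df.loc[ts > cutoff, [timestamp_column, *id_columns, *exogenous_columns]]
    return history, future


# Toy half-hourly data: "load" is the target, "temperature" a known exogenous input.
idx = pd.date_range("2024-01-01", periods=10, freq="30min")
df = pd.DataFrame({"date": idx, "load": range(10), "temperature": [20.0 + i for i in range(10)]})
history, future = split_for_forecast(df, "date", "2024-01-01 03:30", ["temperature"])
```

Here history keeps the first 8 rows and future holds the 2 later rows with only date and temperature. Following the comment above, the pair could then be passed at call time, for example forecast_pipeline(history, future_time_series=future), with whatever preprocessing of history the rest of the setup uses.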

The future values of the exogenous features will only be useful toward the end of the input time series (t in your example), since that is where windows will have values extending into the future.
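To see why only the end of the input benefits, consider a simplified model of stride-1 sliding windows (an assumption about how the pipeline windows the data, not taken from its source): a window starting at row s uses rows s through s + context_length - 1 as context and forecasts the next prediction_length rows. The hypothetical helper below lists the windows whose forecast horizon runs past the last row of the input; under this model, those are the only ones for which values from future_time_series could be used.

```python
def windows_needing_future(series_length, context_length, prediction_length):
    """Return start indices of windows whose forecast horizon runs past the data.

    A window starting at s uses rows [s, s + context_length) as context and
    forecasts rows [s + context_length, s + context_length + prediction_length).
    """
    last_start = series_length - context_length  # last window with a full context
    return [
        s
        for s in range(last_start + 1)
        if s + context_length + prediction_length > series_length
    ]


print(windows_needing_future(10, 4, 3))  # [4, 5, 6]
```

With 10 rows, a context of 4 and a horizon of 3, only the last three windows reach past the data; in general it is prediction_length windows, provided the series is long enough.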

I am puzzled by the duplication; it would be helpful if you could share an example that reproduces it.
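The thread leaves the cause of the duplication open. One thing worth ruling out, and this is only an assumption, is that the frame passed as future_time_series does not continue past the end of the input: in the code above, the same frame t is used both as the input (via tsp.preprocess(t)) and as future_time_series. The hypothetical helper below lists the future timestamps that are not strictly later than the last input timestamp; an empty result matches the argument's description.

```python
import pandas as pd


def non_future_timestamps(history, future, timestamp_column):
    """Return timestamps in future that are not strictly after the last timestamp in history."""
    last_seen = pd.to_datetime(history[timestamp_column]).max()
    future_ts = pd.to_datetime(future[timestamp_column])
    return future_ts[future_ts <= last_seen].tolist()


# Toy stand-in for t: six half-hourly rows.
idx = pd.date_range("2024-01-01", periods=6, freq="30min")
t = pd.DataFrame({"date": idx, "load": range(6)})

# Reusing the input frame as the future frame: every timestamp is already in the input.
print(len(non_future_timestamps(t, t, "date")))  # 6

# A genuine future frame holds only rows after the input ends.
history, future = t.iloc[:4], t.iloc[4:]
print(non_future_timestamps(history, future, "date"))  # []
```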