Demand Forecasting: Zero-Shot Time-Series Inference

adampingel commented 2 months ago

Zero-shot inference based on @wgifford's TTM Energy Demand Forecasting notebook.

NOTE: this recipe should go into the Granite Timeseries Cookbook.

wgifford commented 2 months ago

I would suggest zero-shot inference, combined with plotting the results and evaluating the performance.
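For the "evaluating the performance" part, a minimal sketch of two common point-forecast error metrics is shown below. The function name `forecast_errors` and the sample values are hypothetical and are not taken from the notebook; in the recipe, the actual and predicted values would come from the forecast output.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE and RMSE for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not part of tsfm_public.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    # Per-step forecast errors (prediction minus ground truth).
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Made-up demand values, purely for illustration.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 126.0]
metrics = forecast_errors(actual, predicted)
print(metrics)
```

MAE is in the same units as the target and is easy to read; RMSE penalizes large misses more heavily, so reporting both gives a slightly fuller picture.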

fayvor commented 2 months ago

Picking this up. I'll start with a recipe version of this Getting Started notebook.

wgifford commented 2 months ago

@fayvor Can we start with: https://github.com/ibm-granite/granite-tsfm/blob/cookbook-dev/notebooks/recipes/energy_demand_forecasting/demand_forecast_zeroshot_recipe.ipynb

This uses the preprocessor and forecasting pipeline, which I believe are a bit easier to consume than the outputs of trainer.predict().

wgifford commented 2 months ago

@fayvor Barebones, minimal notebook is here: https://github.com/ibm-granite/granite-tsfm/blob/cookbook-dev/notebooks/recipes/energy_demand_forecasting/demand_forecast_zeroshot_recipe_minimal.ipynb

fayvor commented 2 months ago

That looks good, @wgifford. Integrating into my version now.

fayvor commented 2 months ago

Hi @wgifford, the forecasting pipeline section is failing with this error. Any ideas?

pipeline = TimeSeriesForecastingPipeline(
    zeroshot_model, timestamp_column=timestamp_column, target_columns=target_columns, explode_forecasts=True, freq="h"
)
zeroshot_forecast = pipeline(data)
zeroshot_forecast.head()
---
{
    "name": "TypeError",
    "message": "You have to supply one of 'by' and 'level'",
    "stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/nc/jrql4k0n2j73h7xktzxdr4pr0000gn/T/ipykernel_6860/1549610639.py in ?()
      1 pipeline = TimeSeriesForecastingPipeline(
      2     zeroshot_model, timestamp_column=timestamp_column, target_columns=target_columns, explode_forecasts=True, freq=\"h\"
      3 )
----> 4 zeroshot_forecast = pipeline(data)
      5 zeroshot_forecast.head()

~/Dev/granite-tsfm/tsfm_public/toolkit/time_series_forecasting_pipeline.py in ?(self, time_series, **kwargs)
    320             all the values over the prediction horizon.
    321 
    322         \"\"\"
    323 
--> 324         return super().__call__(time_series, **kwargs)

~/Dev/granite-tsfm/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py in ?(self, inputs, num_workers, batch_size, *args, **kwargs)
   1253                     )
   1254                 )
   1255             )
   1256         else:
-> 1257             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)

~/Dev/granite-tsfm/tsfm_public/toolkit/time_series_forecasting_pipeline.py in ?(self, inputs, preprocess_params, forward_params, postprocess_params)
     50         Returns:
     51             _type_: _description_
     52         \"\"\"
     53         # our preprocess returns a dataset
---> 54         dataset = self.preprocess(inputs, **preprocess_params)
     55 
     56         batch_size = forward_params[\"batch_size\"]
     57         num_workers = forward_params[\"num_workers\"]

~/Dev/granite-tsfm/tsfm_public/toolkit/time_series_forecasting_pipeline.py in ?(self, time_series, **kwargs)
    372 
    373             time_series = pd.concat((time_series, future_time_series), axis=0)
    374         else:
    375             # no additional exogenous data provided, extend with empty periods
--> 376             time_series = extend_time_series(
    377                 time_series=time_series,
    378                 timestamp_column=timestamp_column,
    379                 grouping_columns=id_columns,

~/Dev/granite-tsfm/tsfm_public/toolkit/time_series_preprocessor.py in ?(time_series, timestamp_column, grouping_columns, freq, periods)
   1004 
   1005     if grouping_columns == []:
   1006         new_time_series = augment_one_series(time_series)
   1007     else:
-> 1008         new_time_series = time_series.groupby(grouping_columns).apply(augment_one_series, include_groups=False)
   1009         idx_names = list(new_time_series.index.names)
   1010         idx_names[-1] = \"__delete\"
   1011         new_time_series = new_time_series.reset_index(names=idx_names)

~/Dev/granite-tsfm/.venv/lib/python3.12/site-packages/pandas/core/frame.py in ?(self, by, axis, level, as_index, sort, group_keys, observed, dropna)
   9177 
   9178         from pandas.core.groupby.generic import DataFrameGroupBy
   9179 
   9180         if level is None and by is None:
-> 9181             raise TypeError(\"You have to supply one of 'by' and 'level'\")
   9182 
   9183         return DataFrameGroupBy(
   9184             obj=self,

TypeError: You have to supply one of 'by' and 'level'"
}

wgifford commented 2 months ago

Can you try explicitly passing id_columns=[] to TimeSeriesForecastingPipeline()?

fayvor commented 2 months ago

That seems to work, thx.

wgifford commented 2 months ago

In prior versions, the setting of default values in the pipeline was problematic. I will confirm that they are fixed in the latest main.
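The traceback above is consistent with id_columns reaching the `grouping_columns == []` check as None rather than as an empty list, although the thread does not confirm the actual default. The pandas-only sketch below reproduces that failure mode. `extend_sketch` and `extend_sketch_guarded` are hypothetical stand-ins, not the real `extend_time_series`, and the guard shown is one possible fix, not necessarily the one made in main.

```python
import pandas as pd


def extend_sketch(df, grouping_columns):
    """Hypothetical stand-in mirroring the branch shown in the traceback."""
    if grouping_columns == []:
        return "single-series path"
    # None is not equal to [], so it falls through to groupby(None),
    # which pandas rejects with a TypeError.
    df.groupby(grouping_columns)
    return "grouped path"


def extend_sketch_guarded(df, grouping_columns):
    """Same branch, but treating None and [] alike (one possible guard)."""
    if not grouping_columns:
        return "single-series path"
    df.groupby(grouping_columns)
    return "grouped path"


# Made-up single-series data, purely for illustration.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"]),
        "load": [1.0, 2.0, 3.0],
    }
)

try:
    extend_sketch(df, None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

result_with_empty_list = extend_sketch(df, [])
result_guarded_none = extend_sketch_guarded(df, None)
print(error_message, result_with_empty_list, result_guarded_none)
```

Passing id_columns=[] explicitly, as suggested above, corresponds to the `extend_sketch(df, [])` call here: the equality check succeeds and groupby is never reached.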

fayvor commented 2 months ago

PR up here.

@wgifford I think there are two (three?) remaining steps. If the PR looks good otherwise, we could merge and then do these:

Move the plotting function into granite_tsfm.
Push cookbook-dev to main and/or pin the version.
(Optional) Package tsfm_public and publish to PyPI.

wgifford commented 2 months ago

Can we look at whether the plotting can be integrated with the existing plotting function?

I will confirm that for this notebook we can pin to the current version of main.

We don't currently have PyPI set up for granite-tsfm. Is there some general guidance? (cc @adampingel)

fayvor commented 2 months ago

Can we look at whether the plotting can be integrated with the existing plotting function?

Yes, I'll work on a PR against plot_predictions on the cookbook-dev branch: https://github.com/ibm-granite/granite-tsfm/blob/cookbook-dev/tsfm_public/toolkit/visualization.py#L207

wgifford commented 2 months ago

Thanks! Please PR against main for this one. ~I believe there are 1 or 2 fixes that didn't make it to cookbook-dev yet.~ Actually, I am working on some minor fixes. One pertains to plot_predictions (num_plots), here: https://github.com/ibm-granite/granite-tsfm/pull/124 (the rest concern datetime handling with timezones).

fayvor commented 2 months ago

Please PR against main for this one.

Ok. Does that mean I can now point to main from the recipe as well?

wgifford commented 2 months ago

Yes, please try it.

fayvor commented 2 months ago

@wgifford can you give me access to push a branch to granite-tsfm? If not, I'll do a fork and PR.

wgifford commented 2 months ago

Invite sent

fayvor commented 2 months ago

The new PR is here.

wgifford commented 2 months ago

@adampingel Can we close?

ibm-granite-community / pm

Demand Forecasting: Zero-Shot Time-Series Inference #86