
Many Model Forecasting by Databricks

Introduction

Bootstrap your large-scale forecasting solutions on Databricks with the Many Models Forecasting (MMF) Solution Accelerator.

MMF accelerates the development of sales and demand forecasting solutions on Databricks, covering the critical phases of data preparation, training, backtesting, cross-validation, scoring, and deployment. Adopting a configuration-over-code approach, MMF minimizes the need for extensive coding. At the same time, its extensible architecture lets technically proficient users incorporate new models and algorithms. We recommend that users read through the source code and modify it to fit their specific requirements.

MMF integrates a variety of well-established and cutting-edge algorithms, including local statistical models, global deep learning models, and foundation time series models. MMF enables parallel modeling of hundreds or thousands of time series leveraging Spark's distributed compute. Users can apply multiple models at once and select the best performing one for each time series based on their custom metrics.

Get started now!

Getting started

To run this solution on a public M4 dataset, clone this MMF repo into your Databricks Repos.

Local Models

Local models are used to model individual time series. They can be advantageous over other types of models because they can be tailored to each individual series, offer greater interpretability, and have lower data requirements. We support models from statsforecast, r fable, and sktime. Covariates (i.e. exogenous regressors) are currently supported only for some models from statsforecast.

To get started, attach the examples/local_univariate_daily.py notebook to a cluster running DBR 14.3 LTS for ML or later versions. The cluster can be either a single-node or multi-node CPU cluster. Make sure to set the following Spark configurations on the cluster before you start using MMF: spark.sql.execution.arrow.enabled true and spark.sql.adaptive.enabled false (more detailed explanation to follow).
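These settings are normally applied in the cluster's Spark configuration before startup; as a sketch, they can also be set from a notebook session (assuming `spark` is the active SparkSession), though cluster-level configuration is the reliable option:

```python
# Sketch: the two settings MMF expects, applied from a notebook session.
# Prefer setting these in the cluster's Spark config so they apply at startup.
spark.conf.set("spark.sql.execution.arrow.enabled", "true")  # Arrow speeds up pandas<->Spark exchange
spark.conf.set("spark.sql.adaptive.enabled", "false")        # AQE can coalesce partitions and reduce per-series parallelism (assumed rationale)
```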

In this notebook, we will apply 20+ models to 100 time series. You can specify the models to use in a list:

active_models = [
    "StatsForecastBaselineWindowAverage",
    "StatsForecastBaselineSeasonalWindowAverage",
    "StatsForecastBaselineNaive",
    "StatsForecastBaselineSeasonalNaive",
    "StatsForecastAutoArima",
    "StatsForecastAutoETS",
    "StatsForecastAutoCES",
    "StatsForecastAutoTheta",
    "StatsForecastTSB",
    "StatsForecastADIDA",
    "StatsForecastIMAPA",
    "StatsForecastCrostonClassic",
    "StatsForecastCrostonOptimized",
    "StatsForecastCrostonSBA",
    "RFableArima",
    "RFableETS",
    "RFableNNETAR",
    "RFableEnsemble",
    "RDynamicHarmonicRegression",
    "SKTimeTBats",
    "SKTimeLgbmDsDt",
]

A comprehensive list of local models currently supported by MMF is available in mmf_sa/models/models_conf.yaml.

Now, run the forecasting using the run_forecast function with the active_models list specified above:


catalog = "your_catalog_name"
db = "your_db_name"

run_forecast(
    spark=spark,
    train_data=f"{catalog}.{db}.m4_daily_train",
    scoring_data=f"{catalog}.{db}.m4_daily_train",
    scoring_output=f"{catalog}.{db}.daily_scoring_output",
    evaluation_output=f"{catalog}.{db}.daily_evaluation_output",
    group_id="unique_id",
    date_col="ds",
    target="y",
    freq="D",
    prediction_length=10,
    backtest_months=1,
    stride=10,
    metric="smape",
    train_predict_ratio=2,
    data_quality_check=True,
    resample=False,
    active_models=active_models,
    experiment_path="/Shared/mmf_experiment",
    use_case_name="m4_daily",
)
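As a rough illustration of how backtest_months, stride, and prediction_length interact (this is a sketch of the idea, not MMF's exact internal logic): with backtest_months=1 and stride=10, backtesting starts roughly 30 days before the end of the training data and slides forward in 10-day steps, each trial forecasting prediction_length days:

```python
# Hypothetical sketch of the backtest windows implied by the parameters above.
# The exact window logic lives in MMF's source; this only illustrates the idea.
backtest_days = 1 * 30        # backtest_months, approximating a month as 30 days
stride = 10
prediction_length = 10

# Offsets (days before the end of the series) at which each backtest trial starts.
trial_starts = list(range(backtest_days, 0, -stride))  # -> [30, 20, 10], i.e. three trials
```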

Parameters description:

To modify the model hyperparameters, change the values in mmf_sa/models/models_conf.yaml or overwrite these values in mmf_sa/forecasting_conf.yaml.

MMF is fully integrated with MLflow, so once training kicks off, the experiments are visible in the MLflow Tracking UI with their corresponding metrics and parameters (note that not all local models are logged to MLflow; their binaries are stored in the evaluation_output and scoring_output tables). The metric shown in the MLflow Tracking UI is a simple mean over all backtesting trials across all time series.
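For reference, sMAPE (the metric chosen above) and the "simple mean over trials and series" aggregation can be sketched in plain Python. MMF's exact implementation may differ, e.g. in how all-zero windows are handled:

```python
def smape(actual, forecast):
    """Symmetric MAPE: mean of 2|y - yhat| / (|y| + |yhat|) over a window."""
    return sum(2 * abs(y - f) / (abs(y) + abs(f))
               for y, f in zip(actual, forecast)) / len(actual)

# One sMAPE per (time series, backtest trial); the UI metric is their plain mean.
trial_scores = [
    smape([10, 12, 14], [11, 12, 13]),   # series A, trial 1
    smape([5, 5, 5], [5, 6, 4]),         # series A, trial 2
]
overall = sum(trial_scores) / len(trial_scores)
```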

We encourage you to read through examples/local_univariate_daily.py notebook to better understand how local models can be applied to your time series using MMF. Other example notebooks for monthly forecasting and forecasting with exogenous regressors can be found in examples/local_univariate_monthly.py and examples/local_univariate_external_regressors_daily.py.

Global Models

Global models leverage patterns across multiple time series, enabling shared learning and improved predictions for each series. You would typically train one big model for many or all time series. They can often deliver better performance and robustness for forecasting large and similar datasets. We support deep learning based models from neuralforecast. Covariates (i.e. exogenous regressors) and hyperparameter tuning are both supported for some models.

To get started, attach the examples/global_daily.py notebook to a cluster running DBR 14.3 LTS for ML or later versions. We recommend using a single-node cluster with a multi-GPU instance type such as g4dn.12xlarge [T4] on AWS or Standard_NC64as_T4_v3 on Azure. Multi-node setups are currently not supported.

You can choose the models to train and put them in a list:

active_models = [
    "NeuralForecastRNN",
    "NeuralForecastLSTM",
    "NeuralForecastNBEATSx",
    "NeuralForecastNHITS",
    "NeuralForecastAutoRNN",
    "NeuralForecastAutoLSTM",
    "NeuralForecastAutoNBEATSx",
    "NeuralForecastAutoNHITS",
    "NeuralForecastAutoTiDE",
    "NeuralForecastAutoPatchTST",
]

The models prefixed with "Auto" perform hyperparameter optimization within a specified range (see below for more detail). A comprehensive list of models currently supported by MMF is available in the models_conf.yaml.
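Conceptually, the "Auto" variants search a hyperparameter space and keep the configuration with the best validation loss (neuralforecast drives this with a tuning library under the hood; the sketch below only illustrates the random-search idea, with a made-up search space and objective):

```python
import random

# Hypothetical search space; real spaces are defined per model in neuralforecast.
search_space = {"learning_rate": [1e-4, 1e-3, 1e-2], "hidden_size": [32, 64, 128]}

def validation_loss(cfg):
    # Stand-in objective; in reality this trains the model and scores a holdout set.
    return cfg["learning_rate"] * 100 + 1.0 / cfg["hidden_size"]

random.seed(0)
trials = [{k: random.choice(v) for k, v in search_space.items()} for _ in range(8)]
best = min(trials, key=validation_loss)  # keep the best-scoring configuration
```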

Now, with the following command, we run the examples/run_daily.py notebook, which in turn calls the run_forecast function, looping through the active_models list:

for model in active_models:
  dbutils.notebook.run(
    "run_daily",
    timeout_seconds=0, 
    arguments={"catalog": catalog, "db": db, "model": model, "run_id": run_id})
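The loop assumes catalog, db, and run_id are already defined in the driver notebook. run_id is just a unique tag shared by all the per-model child runs; generating it with uuid, for example, works (the uuid choice here is illustrative, not mandated by MMF):

```python
import uuid

# A unique tag shared by all child notebook runs in this training sweep.
run_id = str(uuid.uuid4())
```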

Inside the examples/run_daily.py, we have the run_forecast function specified as:

run_forecast(
    spark=spark,
    train_data=f"{catalog}.{db}.m4_daily_train",
    scoring_data=f"{catalog}.{db}.m4_daily_train",
    scoring_output=f"{catalog}.{db}.daily_scoring_output",
    evaluation_output=f"{catalog}.{db}.daily_evaluation_output",
    model_output=f"{catalog}.{db}",
    group_id="unique_id",
    date_col="ds",
    target="y",
    freq="D",
    prediction_length=10,
    backtest_months=1,
    stride=10,
    metric="smape",
    train_predict_ratio=2,
    data_quality_check=True,
    resample=False,
    active_models=[model],
    experiment_path="/Shared/mmf_experiment",
    use_case_name="m4_daily",
    run_id=run_id,
    accelerator="gpu",
)

Parameters description:

The parameters are the same as in the local model example, except that active_models is passed a single model per run and the additional parameters model_output, run_id, and accelerator are set.

To modify the model hyperparameters or reset the range of the hyperparameter search, change the values in mmf_sa/models/models_conf.yaml or overwrite these values in mmf_sa/forecasting_conf.yaml.

MMF is fully integrated with MLflow, so once training kicks off, the experiments are visible in the MLflow Tracking UI with their corresponding metrics and parameters. Once training is complete, the models are logged to MLflow and registered to Unity Catalog.
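Once registered, a model can later be loaded back from Unity Catalog with MLflow's standard APIs. The helper below is a sketch: the three-level name (catalog.schema.model) is how Unity Catalog addresses models, but the exact registered model name depends on MMF's configuration:

```python
def load_registered_model(catalog, db, model_name, version="latest"):
    """Sketch: load an MMF-registered model back from Unity Catalog.

    model_name is hypothetical here; check Unity Catalog for the actual
    name under which MMF registered your model.
    """
    import mlflow

    mlflow.set_registry_uri("databricks-uc")  # point the registry at Unity Catalog
    uri = f"models:/{catalog}.{db}.{model_name}/{version}"
    return mlflow.pyfunc.load_model(uri)
```

On Databricks, this would be called as e.g. `load_registered_model("your_catalog_name", "your_db_name", "<registered_model_name>")`.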

We encourage you to read through examples/global_daily.py notebook to better understand how global models can be applied to your time series using MMF. Other example notebooks for monthly forecasting and forecasting with exogenous regressors can be found in examples/global_monthly.py and examples/global_external_regressors_daily.py respectively.

Foundation Models

Foundation time series models are transformer-based models pretrained on millions or billions of time points. These models can perform analysis (i.e. forecasting, anomaly detection, classification) on a previously unseen time series without training or tuning. We support open source models from multiple sources: chronos, moirai, and moment. Covariates (i.e. exogenous regressors) and fine-tuning are not yet supported. This is a rapidly changing field, and we are working on adding supported models and new features as the field evolves.

To get started, attach the examples/foundation_daily.py notebook to a cluster running DBR 14.3 LTS for ML or later versions. We recommend using a single-node cluster with a multi-GPU instance type such as g4dn.12xlarge [T4] on AWS or Standard_NC64as_T4_v3 on Azure. Multi-node setups are currently not supported.

You can choose the models you want to evaluate and forecast by specifying them in a list:

active_models = [
    "ChronosT5Tiny",
    "ChronosT5Mini",
    "ChronosT5Small",
    "ChronosT5Base",
    "ChronosT5Large",
    "MoiraiSmall",
    "MoiraiBase",
    "MoiraiLarge",
    "Moment1Large",
]

A comprehensive list of models currently supported by MMF is available in the models_conf.yaml.

Now, with the following command, we run the examples/run_daily.py notebook, which in turn calls the run_forecast function. We loop through the active_models list for the same reason mentioned in the global model section.

for model in active_models:
  dbutils.notebook.run(
    "run_daily",
    timeout_seconds=0, 
    arguments={"catalog": catalog, "db": db, "model": model, "run_id": run_id})

Inside the examples/run_daily.py, we have the same run_forecast function as above.

To modify the model hyperparameters, change the values in mmf_sa/models/models_conf.yaml or overwrite these values in mmf_sa/forecasting_conf.yaml.

MMF is fully integrated with MLflow, so once the run kicks off, the experiments are visible in the MLflow Tracking UI with their corresponding metrics and parameters. During evaluation, the models are logged and registered to Unity Catalog.

We encourage you to read through examples/foundation_daily.py notebook to better understand how foundation models can be applied to your time series using MMF. An example notebook for monthly forecasting can be found in examples/foundation_monthly.py.

Using Time Series Foundation Models on Databricks

If you want to try out time series foundation models on Databricks without MMF, you can find example notebooks in examples/foundation-model-examples. These notebooks show how to load a model, distribute inference, fine-tune, register, and deploy it, and generate online forecasts from it. We have notebooks for TimeGPT, Chronos, Moirai, Moment, and TimesFM.

Vector Lab - Many Model Forecasting


Project support

Please note that the code in this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket for issues arising from the use of this project. The source in this project is provided subject to the Databricks License. All included or referenced third-party libraries are subject to the licenses set forth below.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.