prio-data / views_pipeline

VIEWS forecasting pipeline for monthly prediction runs. Includes MLops and QA for all models/ensembles.

Orchestration #46

Closed xiaolong0728 closed 2 months ago

xiaolong0728 commented 2 months ago

This is the first version of orchestration, including the management of ensemble models. A few things worth mentioning:

For ensemble model

  1. The code only supports the old stepshifter. Double-check the targets of the constituent models: they must all be on the same scale (all log or all non-log) before they are aggregated.
  2. Forecasting directly aggregates the output of each model (if it exists). Evaluation instead uses each model's artifact and then aggregates the predictions, because df_output_dict changes the structure of the original outputs and we don't store the original outputs for calibration and testing. In the future, we will have a script that produces the evaluation matrix with the same structure.
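
To make the aggregation step concrete, here is a minimal sketch, not the code in this branch. It assumes each model's predictions are a pandas DataFrame with identical index levels and columns; the function name `aggregate_predictions`, the `targets` mapping, and the example column and index names are all hypothetical.

```python
import pandas as pd


def aggregate_predictions(predictions, targets, how="mean"):
    """Aggregate per-model prediction frames into one ensemble frame.

    predictions: dict of model name -> DataFrame (same index levels and columns).
    targets: dict of model name -> target name (hypothetical bookkeeping).
    """
    # All constituent models must share one target, so that log and
    # non-log predictions are never averaged together.
    unique_targets = {targets[name] for name in predictions}
    if len(unique_targets) != 1:
        raise ValueError(f"Models use different targets: {sorted(unique_targets)}")

    # Stack the frames row-wise, then aggregate rows sharing the same index.
    stacked = pd.concat(list(predictions.values()))
    grouped = stacked.groupby(level=list(range(stacked.index.nlevels)))

    if how == "mean":
        return grouped.mean()
    if how == "median":
        return grouped.median()
    raise ValueError(f"Unsupported aggregation: {how!r}")
```

Raising on mismatched targets is a design choice in this sketch; a warning plus an explicit transform would be another option.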

For orchestration

  1. The script executes the main.py file in every model and ensemble folder. Currently, a single run handles either single models or ensemble models, not both; which one is decided by whether the '--aggregation' argument is provided.
  2. Refer to orchestration/README.md for instructions on running the code.
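
As a rough illustration of the either/or behaviour described above (orchestration/README.md remains the reference for the real commands), the sketch below assumes that '--aggregation' is a boolean flag and that main.py files live one level below `models/` and `ensembles/` folders. The folder layout and the function names are assumptions, not taken from the branch.

```python
import argparse
import subprocess
import sys
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(description="Run every main.py (sketch).")
    # Assumption: the flag takes no value; its presence selects ensembles.
    parser.add_argument("--aggregation", action="store_true",
                        help="run ensemble models instead of single models")
    return parser


def select_main_scripts(root, aggregation):
    """Return the main.py files for either ensembles or single models."""
    folder = "ensembles" if aggregation else "models"  # hypothetical layout
    return sorted(Path(root).joinpath(folder).glob("*/main.py"))


def run_all(root, aggregation):
    """Execute each selected main.py in turn, stopping on the first failure."""
    for script in select_main_scripts(root, aggregation):
        subprocess.run([sys.executable, str(script)], check=True, cwd=script.parent)


if __name__ == "__main__":
    args = build_parser().parse_args()
    run_all(Path.cwd(), args.aggregation)
```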

For dataloader: There is a format mismatch between the stepshifter models and hydranet. For stepshifter, the month and the level are index levels, but for hydranet they are columns. For now I have just commented out the affected code, so hydranet models will raise errors if this branch is merged.
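
One possible way to reconcile the two layouts is to normalize every frame before it is used. This is only a sketch and is not implemented in this branch: it assumes the indexed layout is treated as canonical, and `ensure_index`, `month_id`, and `priogrid_gid` are illustrative names.

```python
import pandas as pd


def ensure_index(df, index_cols=("month_id", "priogrid_gid")):
    """Return df with the identifier columns as index levels.

    Accepts either layout: identifiers already in the index, or
    identifiers stored as ordinary columns.
    """
    index_cols = list(index_cols)
    if list(df.index.names) == index_cols:
        return df  # already in the indexed layout

    # Move any identifier that is currently an index level back to columns.
    if any(name in index_cols for name in df.index.names):
        df = df.reset_index()

    missing = [col for col in index_cols if col not in df.columns]
    if missing:
        raise KeyError(f"Identifier columns not found: {missing}")
    return df.set_index(index_cols)
```

A helper like this could be called at the dataloader boundary so that both model families receive the same structure; the reverse direction would be a plain `reset_index()`.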