Fully automate training and forecasting pipeline

microsoft / solution-accelerator-many-models

MIT License


Fully automate training and forecasting pipeline #114

Closed jingwora closed 3 years ago

jingwora commented 3 years ago

Referring to 03_AutoML_Forecasting_Pipeline.ipynb: we plan to automate the modeling and forecasting pipeline, but we currently have to enter training_pipeline_run_id manually, as in the script below. Could code be added to get the training_pipeline_run_id of the latest run?

    from scripts.helper import get_automl_environment
    training_pipeline_run_id = ""
    training_experiment_name = ""
    forecast_env = get_automl_environment(workspace=ws, training_pipeline_run_id=training_pipeline_run_id, training_experiment_name=training_experiment_name)

sagarsumant commented 3 years ago

@jingwora - You only need to get forecast_env this way if you run training and forecasting separately (possibly forecasting multiple times in the future). If you plan to run everything in the same pipeline, you can reuse the training environment: forecast_env = train_env.

I would like to understand your scenario in more detail so I can recommend the most suitable solution for you.

jingwora commented 3 years ago

@sagarsumant
Many thanks for your explanation.

My scenario is to run end to end quarterly and to forecast monthly.

Quarterly tasks: prepare data -> fit models -> forecasting -> insert to DB -> report

Monthly tasks: prepare data -> forecasting -> insert to DB -> report
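The two schedules above can be sketched as a small dispatcher that decides which task list applies on a given run date. This is only an illustration: the function name tasks_for and the assumption that the quarterly run falls in the first month of each quarter (January, April, July, October) are hypothetical and not stated in the thread.

```python
from datetime import date
from typing import List

# Task lists taken from the scenario described above.
QUARTERLY_TASKS = ["prepare data", "fit models", "forecasting", "insert to DB", "report"]
MONTHLY_TASKS = ["prepare data", "forecasting", "insert to DB", "report"]


def tasks_for(run_date: date) -> List[str]:
    """Return the task list for a run on run_date.

    Assumption (hypothetical): models are refit in the first month of
    each quarter, i.e. months 1, 4, 7 and 10. Every other month only
    forecasts with the previously trained models.
    """
    is_quarter_start = run_date.month % 3 == 1
    return QUARTERLY_TASKS if is_quarter_start else MONTHLY_TASKS
```

Only the quarterly branch produces a new training run, so that is the only point at which training_pipeline_run_id would need to be refreshed.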

I plan to integrate these notebook processes into Data Factory, since the data pipeline is already there. From my understanding, every quarter, when the model is updated, training_pipeline_run_id will change, so I would need to update it manually in 03_AutoML_Forecasting_Pipeline.ipynb (training_pipeline_run_id = "").

By the way, I found a workaround: reading training_pipeline_run_id from the parallel_run_step.txt file. This works well for now. Many thanks!
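For reference, an alternative to parsing parallel_run_step.txt might be to ask the workspace for the most recent completed run of the training experiment. The following is a minimal sketch, not the repository's method. It assumes the azureml-core SDK, that Experiment.get_runs() yields runs newest first, and that the training pipeline was submitted under training_experiment_name. The helper names latest_completed_run_id and latest_training_run_id are hypothetical.

```python
from typing import Iterable, Optional


def latest_completed_run_id(runs: Iterable) -> Optional[str]:
    """Return the id of the first run whose status is "Completed".

    `runs` is expected to be ordered newest first, so the first match
    is the latest completed run. Returns None if nothing has completed.
    """
    for run in runs:
        if run.get_status() == "Completed":
            return run.id
    return None


def latest_training_run_id(ws, training_experiment_name: str) -> Optional[str]:
    """Look up the latest completed run of the training experiment.

    Assumption: `ws` is an azureml.core.Workspace and the training
    pipeline runs live in the experiment named training_experiment_name.
    """
    # Imported lazily so the pure helper above works without the SDK installed.
    from azureml.core import Experiment

    experiment = Experiment(workspace=ws, name=training_experiment_name)
    return latest_completed_run_id(experiment.get_runs())
```

The result could then be passed as training_pipeline_run_id to get_automl_environment. Skipping runs that are not "Completed" avoids picking up a failed or still-running quarterly retrain.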

sagarsumant commented 3 years ago

I am glad you found a solution that works for your needs. Thanks.

pbartos commented 3 years ago

I tried this approach, but I am receiving the following error:

Step [many-models-train]: script not found at: c:\**\automl_project_dir\many_models_train_driver.py. Make sure to specify an appropriate source_directory on the Step or default_source_directory on the Pipeline.

Pipeline run:

    pipeline = Pipeline(workspace=ws, steps=[train_steps, inference_steps])
    run = experiment.submit(pipeline)

The problem is that AutoMLPipelineBuilder uses the same project folder for both training and inference.

Is that an issue or just a misunderstanding?