pyronear / pyro-risks

Data science for wildfire risk forecasting and monitoring
https://pyronear.github.io/pyro-risks
Apache License 2.0
25 stars 8 forks

[🚀 MLOPS 🚀 ] Training pipeline life-cycle management #51

Closed jsakv closed 3 years ago

jsakv commented 3 years ago

Motivation

Currently, our models are trained offline, and we deploy them directly as a prediction service. This manual process, as described below, comes with many challenges and doesn't help us with:

There are other challenges, such as model testing and monitoring (but let's deal with them in another issue 😊)

Why is it important for us:

🚀 Feature

Orchestrating and automating our training pipeline following the workflow described below

Design Proposal

Let's adopt a lightweight version of the pipeline described above:

To keep things tidy, I think we should adopt the following layers and package structure:

Setting up this workflow would require us to:

Alternatives

There are many tools available for orchestrating and managing machine learning pipelines (Airflow, Dagster, Prefect, MLflow), and we discussed some of them with @GHCamille. I just discovered DVC, and it is really lightweight compared to the others. Moreover, it's free, and it doesn't require any additional infrastructure!
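
To give a feel for why a DVC-style pipeline needs no extra infrastructure, here is a minimal sketch of the underlying idea: a stage is re-run only when the checksum of one of its dependencies changes or an output is missing. This is not DVC's implementation or API. It only uses the Python standard library, and every name in it (`run_stage`, `pipeline.lock.json`, the toy `train` step, the CSV columns) is hypothetical and chosen for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_stage(
    name: str,
    deps: List[Path],
    outs: List[Path],
    cmd: Callable[[], None],
    lock_path: Path,
) -> bool:
    """Run `cmd` only if a dependency changed or an output is missing.

    Returns True if the stage was executed, False if it was skipped.
    """
    # Load the checksums recorded by previous runs (empty on the first run).
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    current = {str(p): file_sha256(p) for p in deps}

    up_to_date = lock.get(name) == current and all(p.exists() for p in outs)
    if up_to_date:
        return False

    cmd()
    # Record the dependency checksums so the next call can skip the stage.
    lock[name] = current
    lock_path.write_text(json.dumps(lock, indent=2))
    return True


def demo() -> List[bool]:
    """Run a toy 'train' stage three times and report which runs executed."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        raw = root / "raw.csv"            # hypothetical input data
        model = root / "model.txt"        # hypothetical model artifact
        lock = root / "pipeline.lock.json"
        raw.write_text("day,fires\n1,3\n")

        def train() -> None:
            # Placeholder for real training: store the number of data rows.
            n_rows = len(raw.read_text().splitlines()) - 1
            model.write_text(f"rows={n_rows}")

        results = [run_stage("train", [raw], [model], train, lock)]
        # Nothing changed, so the stage should be skipped.
        results.append(run_stage("train", [raw], [model], train, lock))
        # New data arrives, so the stage should run again.
        raw.write_text("day,fires\n1,3\n2,5\n")
        results.append(run_stage("train", [raw], [model], train, lock))
        return results


if __name__ == "__main__":
    print(demo())
```

In this sketch, the whole pipeline state lives in a small lock file next to the code, so it can be versioned with Git and replayed in CI without a scheduler or a tracking server. That is the property that makes this kind of tool attractive for us compared to heavier orchestrators.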

Additional Context

The diagrams are taken from this Google article: MLOps: Continuous delivery and automation pipelines in machine learning

DVC and CML use cases

-> Sharing Data and Model File -> CML with DVC

As always, let me know what you think and see you in 🚀 production my friends 🚀