merelcht closed this 6 months ago.
One way of doing this could be to build the stack on top of the MLOps stack.
I think we need to discuss the scope of this first; when people talk about end-to-end ML, it's often vague.
The MLOps stack covers these components:
- Experiment Tracking (development)
- Data Versioning (development & deployment)
- Code Versioning
- Pipeline orchestration
- Runtime engine
- Artifact tracking
- Model Serving
- Model Monitoring
- Data monitoring/validation (Great Expectations or something else) - this isn't covered by the MLOps stack
When I think about the stack, I'm thinking of something with minimal scope. Obviously you still need Git and some monitoring service, but those aren't included in the MEAN/LAMP stack either. The same goes for the ML stack: what's the real minimal stack we need?
I think the more important missing parts might be serving and an artifact store. Something like Great Expectations for data validation would be a plus, but I don't think it's strictly necessary.
Some insights about Airflow's dominance: https://www.linkedin.com/posts/hugo-lu-confirmed_dataorchestration-dataengineering-dataengineers-activity-7094595004576227328-slSU
Pandera + Airflow + Kedro = PAK? 😄
Kedro needs to be in the middle 😀
Another idea for a Kedro stack: https://linen-slack.kedro.org/t/16014653/hello-very-much-new-to-the-ml-world-i-m-trying-to-setup-a-fr#6546163c-e141-4c07-ae28-71bf31dd25b7
- Kedro for creating training pipelines and the overall project structure
- MLflow for experiment tracking and model registry
- DVC for dataset versioning
- TensorFlow as the machine learning framework
- Ray Tune for hyperparameter tuning
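To compare this proposal against the MLOps component list above, here's a rough stdlib-only Python sketch that reports which components no tool in the stack covers. The tool-to-component mapping is my own assumption, not something stated in the linked thread; for example, counting Kedro under pipeline orchestration is a loose fit, since Kedro pipelines are often scheduled by a dedicated orchestrator such as Airflow.

```python
# Components from the MLOps stack list, plus data validation (not in the MLOps stack).
COMPONENTS = [
    "Experiment Tracking",
    "Data Versioning",
    "Code Versioning",
    "Pipeline orchestration",
    "Runtime engine",
    "Artifact tracking",
    "Model Serving",
    "Model Monitoring",
    "Data monitoring/validation",
]

# Hypothetical mapping of the proposed tools to components (an assumption, not official).
PROPOSED_STACK = {
    "Kedro": {"Pipeline orchestration"},  # loose fit, see note above
    "MLflow": {"Experiment Tracking", "Artifact tracking"},
    "DVC": {"Data Versioning"},
    "TensorFlow": set(),  # ML framework, not one of the listed components
    "Ray Tune": set(),  # hyperparameter tuning, not one of the listed components
}


def uncovered(stack: dict[str, set[str]], components: list[str]) -> list[str]:
    """Return the components, in list order, that no tool in the stack covers."""
    covered: set[str] = set().union(*stack.values())
    return [c for c in components if c not in covered]


if __name__ == "__main__":
    for component in uncovered(PROPOSED_STACK, COMPONENTS):
        print(component)
```

Under this mapping the gaps are code versioning, runtime engine, model serving, model monitoring and data validation, which lines up with serving being one of the more important missing parts.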
For reference, here's a Spark-centric, fully open-source, Kedro-based stack built using mymlops.com.
Closing as this isn't a priority for now.
Description
A go-to Kedro Stack