zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[FEATURE] Is it possible to have a generic, flexible solution for MLOps pipelines, independent of the use case, type of model, etc.? #68

Closed aimlnerd closed 3 years ago

aimlnerd commented 3 years ago

Is your feature request related to a problem? Please describe.

IMHO, having a generic, flexible framework for ML pipelines, independent of the use case, type of model, etc., would be the best case. It would then be up to the creativity and needs of the data scientist to implement the pipeline. The input and output of each step in the pipeline could also be generic and user-definable. Is this possible in ZenML?

Describe the solution you'd like

  1. How flexible is ZenML? The example in the blog uses CSVDataSource(), but in many production scenarios the training data is pulled from a database, a feature store, MongoDB, a folder, or whatever else is available. How can training data be pulled from generic sources?
  2. Can we have custom pipeline components instead of fixed ones, depending on the use case (image processing, NLP, regression on structured data, ARIMA time series forecasting)?
  3. Can the pipeline components be visualized in ZenML?

htahir1 commented 3 years ago

Hi @deepakiim, thanks for the request. In general, ZenML is still early in development, so we're working towards supporting many of the needs you expressed in your question. But I'll give it a go:

  1. ZenML is intended to be fully agnostic of the datasource -> We currently support Postgres, BigQuery, CSV, and image datasources, but you can also create a custom datasource -> I understand the documentation for that is lacking, and we are working on fixing it as we speak.
  2. On visualization (your third question): yes, we're building a dashboard for the cloud offering -> this might take longer to launch, but it will have visualization of the pipelines. Currently, there is no graphical representation for the open-source version.
  3. On custom components (your second question), which I left for last as it is a more involved answer: we intend for our higher-level pipelines to account for most of the use cases you mentioned -> So e.g. you can take a TrainingPipeline and use it for image processing and even ARIMA. But we understand that for some users creating completely arbitrary connections is important. Currently, we use TFX under the hood to create the actual pipelines and their connections: you can easily create a new pipeline based on the BasePipeline class and inject your own TFX pipeline. However, this will also change in the future, with a more native way of creating components.
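
To illustrate what "agnostic of the datasource" and "custom components" can mean in practice, here is a minimal sketch in plain Python. It does not use ZenML's API or TFX: `DataSource`, `CsvTextSource`, `SqlSource`, `run_pipeline`, and `keep_adults` are hypothetical names invented for this example, and the sample data is made up. The only assumption is that every source yields records as dictionaries, so the pipeline steps never need to know where the data came from.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


class DataSource(ABC):
    """Hypothetical interface: anything that can yield records is a datasource."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class CsvTextSource(DataSource):
    """Reads records from CSV text (an open file would work the same way)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def read(self) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(self.text))


class SqlSource(DataSource):
    """Reads records from a sqlite3 connection via a query."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self.conn = conn
        self.query = query

    def read(self) -> Iterator[Record]:
        cursor = self.conn.execute(self.query)
        columns = [col[0] for col in cursor.description]
        for row in cursor:
            # Stringify values so both sources produce the same record shape.
            yield {name: str(value) for name, value in zip(columns, row)}


def run_pipeline(source: DataSource, steps: List[Step]) -> List[Record]:
    """Feed the records through each step in order; steps never see the source type."""
    records = list(source.read())
    for step in steps:
        records = step(records)
    return records


def keep_adults(records: List[Record]) -> List[Record]:
    """A user-defined step: keep rows whose 'age' field is at least 18."""
    return [r for r in records if int(r["age"]) >= 18]


# The same step runs unchanged against two different sources.
csv_source = CsvTextSource("name,age\nana,34\nbo,12\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("ana", 34), ("bo", 12)])
sql_source = SqlSource(conn, "SELECT name, age FROM people")

csv_result = run_pipeline(csv_source, [keep_adults])
sql_result = run_pipeline(sql_source, [keep_adults])
```

Under these assumptions, supporting a new source (a feature store, MongoDB, a folder of files) would mean writing one more `DataSource` subclass, while the steps stay untouched.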

We're a small team for now, so we're working as hard as possible to make the framework's APIs more mature. Expect to see a whole bunch of progress on these questions in the upcoming months - we're fully committed! Also, if you see an angle to make your own contribution, that would be of immense help -> We can help with that directly if you DM me on our Slack. Looking forward to hearing from you!