kubeflow / community

Information about the Kubeflow community including proposals and governance information.
Apache License 2.0

Kubeflow component integration with ML Metadata #783

Open richardsliu opened 2 years ago

richardsliu commented 2 years ago

/kind feature

Why you need this feature: Kubeflow currently doesn't have a unified metadata/artifact management story beyond what's supported in KFP. For example, the concept of an "ML experiment" exists in both training and hyperparameter tuning, but there is no way to track it across separate Kubeflow components. Having unified metadata tracking allows users to aggregate things like:

Originally, Kubeflow covered this through the Metadata project, but it has since been archived. Additional discussion on this can be found in issue kubeflow/kubeflow#4955.

It would be great to revisit this problem and see if we can propose a unified interface for metadata and artifact storage, possibly by using ML metadata.

Describe the solution you'd like:

One problem with the original Kubeflow metadata project is that it comes with its own storage backend using MySQL, which makes it heavy-weight. We do not need to re-implement the storage backend since MLMD already solves that problem. Instead, we can make MLMD an optional installation and write to it directly. This is what KFP is currently doing; see this link for the code.

If we can define a unified data model and interface, it should be possible to build a light-weight library on top of ML metadata. It can be an optional import for training jobs and hyperparameter tuning jobs.
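A minimal sketch of the idea above, assuming nothing about the eventual interface: every name here (`Record`, `ExperimentTracker`, the component labels) is hypothetical, and an in-memory dictionary stands in for the metadata store. It loosely mirrors the way MLMD groups artifacts and executions under a context, so that an "experiment" can span training and tuning jobs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """A tracked object: an artifact (model, dataset) or an execution (job run)."""
    kind: str        # "artifact" or "execution"
    component: str   # hypothetical label, e.g. "training" or "tuning"
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


class ExperimentTracker:
    """Hypothetical in-memory stand-in for a metadata store such as MLMD."""

    def __init__(self) -> None:
        self._by_experiment: Dict[str, List[Record]] = {}

    def log(self, experiment: str, record: Record) -> None:
        # Attach the record to the experiment, creating the group on first use.
        self._by_experiment.setdefault(experiment, []).append(record)

    def records(self, experiment: str, kind: str = "") -> List[Record]:
        # An empty kind means "no filter"; unknown experiments yield [].
        found = self._by_experiment.get(experiment, [])
        return [r for r in found if not kind or r.kind == kind]

    def components(self, experiment: str) -> List[str]:
        # Which components contributed to this experiment, deduplicated.
        return sorted({r.component for r in self.records(experiment)})


tracker = ExperimentTracker()
tracker.log("exp-1", Record("execution", "training", "tfjob-run"))
tracker.log("exp-1", Record("execution", "tuning", "trial-7", {"lr": "0.01"}))
tracker.log("exp-1", Record("artifact", "training", "model-v1"))
print(tracker.components("exp-1"))  # ['training', 'tuning']
```

A real library would replace the dictionary with calls to the metadata store; the point of the sketch is only that a single experiment key lets records from separate components be queried together.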


jbottum commented 2 years ago

/priority p1 /kind feature

zijianjoy commented 2 years ago

Thank you Richard for your proposal.

I think it will be beneficial if more Kubeflow components want to adopt MLMD. The questions I have are:

  1. Are we only looking for a mechanism to group objects across different Kubeflow components, or do we also provide a mechanism for Kubeflow components to consume MLMD?
  2. How do we guarantee MLMD version consistency across Kubeflow components?

Note: MLMD has become a hard dependency of KFP in KFPv2. We are not only writing to MLMD; we are also reading from MLMD for status updates.

Note: Can external add-ons also use MLMD? For example, can KServe use MLMD? If so, how do we design a client that can be adopted by both Kubeflow components and add-ons?

Note: Once we have a proper proposal, you can make use of https://github.com/kubeflow/community/tree/master/proposals by creating a PR to this folder.

johnugeorge commented 2 years ago

This will be a great value add

juliusvonkohout commented 2 years ago

I think it is very dangerous because MLMD is not yet separated per namespace; see https://github.com/kubeflow/pipelines/issues/4790. It will lower the security standards even further if more components break namespace isolation.

ca-scribner commented 2 years ago

In general, I think this is a great proposal. To me, this has been one of the bigger gaps in Kubeflow ever since the previous attempt was archived. There are details to be worked out, as @zijianjoy and @juliusvonkohout mention, but they're not impossible.

What other requirements do people envision for this? I agree with @juliusvonkohout that whatever we do should at least have an option for user isolation. Whether it is completely isolated or we maintain two stores (one shared and one namespaced) is debatable. I believe @zijianjoy had some good comments about that and about maintaining backward compatibility.
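To make the "one shared, one namespaced" option concrete, here is a rough sketch. It is not how MLMD behaves today; `NamespacedStore` and its methods are hypothetical, and real isolation would need the caller's namespace to come from an authenticated identity enforced on the server side rather than from a plain argument.

```python
from typing import Dict, List, Optional


class NamespacedStore:
    """Hypothetical wrapper: one shared store plus one store per namespace."""

    def __init__(self) -> None:
        self._shared: List[str] = []
        self._private: Dict[str, List[str]] = {}

    def put(self, item: str, namespace: Optional[str] = None) -> None:
        # namespace=None means the item is deliberately published to everyone.
        if namespace is None:
            self._shared.append(item)
        else:
            self._private.setdefault(namespace, []).append(item)

    def visible_to(self, caller_namespace: str) -> List[str]:
        # A caller sees shared items plus its own namespace, never another's.
        return self._shared + self._private.get(caller_namespace, [])


store = NamespacedStore()
store.put("public-dataset")
store.put("team-a-model", namespace="team-a")
store.put("team-b-model", namespace="team-b")
print(store.visible_to("team-a"))  # ['public-dataset', 'team-a-model']
```

The fully isolated variant is the same sketch with the shared list removed, which is the trade-off under debate: stronger isolation, but no cross-namespace view of experiments.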

rustam-ashurov-mcx commented 2 years ago

The ability to track experiments' metadata in a centralized place dedicated to that purpose would be great 👍 At the moment, I'm not sure what to use for such audit/governance/tracking activities without the help of external tools. At the same time, I don't even want to try a mix of KFL and MLflow, since it looks like over-engineering to me when there could be built-in functionality for it.

frittentheke commented 5 months ago

One reads about support for MLflow here and there across the Kubeflow components and SDKs. MLflow is also mentioned in the GSoC 2024 list of ideas: https://www.kubeflow.org/events/gsoc-2024/#project-10-enhancing-kf-model-registry-python-client-for-seamless-ml-imports-from-alternative-registries

Is the current state of the integration of MLflow for metadata summarized somewhere? In short I'd like to understand if and how Kubeflow can leverage the capabilities and data of an existing MLflow installation.

juliusvonkohout commented 5 months ago

@frittentheke so far, most users just manage MLflow themselves next to Kubeflow. Integration is possible, but it is manual and you have to get the hard multi-tenancy right.

frittentheke commented 5 months ago

> @frittentheke so far most users just manage MLflow themselves next to Kubeflow. Integration is possible, but manual and you have to get the hard multi-tenancy right.

Thanks for your response, @juliusvonkohout!

Full integration and automation are nice, but they also make things less lightweight or even clunky. I envision (read: have) an environment with an existing MLflow installation that already contains experiment tracking data. So I am asking about the integrations and wondering whether this can work "nicely" together with Kubeflow. Or will it just duplicate features Kubeflow also covers itself and then feel alien?

andreyvelich commented 1 month ago

Let's continue this discussion in community repo. cc @kubeflow/wg-data-leads /transfer community

tarilabs commented 1 month ago

Thanks @andreyvelich for bringing this back to attention.

> One problem with the original Kubeflow metadata project is that it comes with its own storage backend using MySQL, which makes it heavy-weight

I'm not sure I understood this from the original posting 🤔 Isn't MLMD backed by MySQL (/MariaDB/PostgreSQL) too?