jlewi closed this issue 3 years ago
Quick question: does "metadata deployment" here mean google/ml-metadata or KF metadata?
If google/ml-metadata, then it's already in KFP (MLMD provides a gRPC server). I think we don't have a plan or the resources to continue KF metadata, right? I may be missing some context.
I think https://github.com/kubeflow/metadata/issues/217#issuecomment-595625018 is the only planned future feature. Other than that, keeping the current status and asking for community help is the best I can foresee.
If we don't plan to extend it, and it becomes redundant with google/ml-metadata, and that's where KFP is focused, it's best to bring it up in the community meeting to decide on its future.
If #217 is the only planned feature then what does this mean for creating a generic metadata story?
When people deploy Kubeflow pipelines do they get:
/cc @paveldournov
With current status, both 1. and 2. mentioned above are true.
If we had the bandwidth, it would be better to keep maintaining the metadata UI separately, but I don't think that's the case now.
I believe when you install the full KFP there are two different URLs for the UI for the metadata store
Are these both pointing at the same UI service or are they two different servers?
I suspect they are two different servers but I could be wrong.
They are two different servers. Some code is reused through kubeflow/frontend, but the codebases are built and distributed completely separately.
Yes, these are two different codebases (kubeflow/metadata and kubeflow/pipelines), both of which import MLMD Lineage from kubeflow/frontend.
Ideally, pipelines would not show the link to the artifact store when running in iframe mode (within central-dashboard). But currently both UIs display the same information.
Here's my understanding:
Google Metadata provides low level libraries for dealing with metadata
This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata
This repository is also providing a python SDK to make it easy to log data to MLMD.
This repository is also providing a front end for visualizing the data
Per @avdaredevil's comments above, there are two versions of the front end
This repository is also defining some specific schemas that are defined using the ML Metadata data model.
So to summarize, I think there are several components
I think a major feature, lineage tracking, was introduced with 1.0
I think at 1.0 the front end changes were only in the KFP UI but have since been migrated into the standalone UI.
Do we have examples illustrating lineage tracking with and without pipelines?
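For readers unfamiliar with what lineage tracking means here, the following is a minimal sketch of the idea, independent of pipelines. It is not the MLMD or kubeflow/metadata API: `LineageStore`, `record`, and `upstream` are hypothetical names, and the store is just an in-memory list of input/output events linking executions to artifacts, loosely following the artifact/execution/event vocabulary of the ML Metadata data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field

INPUT, OUTPUT = "INPUT", "OUTPUT"


@dataclass
class LineageStore:
    """Toy in-memory stand-in for a metadata store (hypothetical, not the MLMD API)."""

    # Each event is a tuple: (execution name, artifact name, INPUT or OUTPUT).
    events: list = field(default_factory=list)

    def record(self, execution, inputs=(), outputs=()):
        """Log which artifacts an execution read and which it produced."""
        for artifact in inputs:
            self.events.append((execution, artifact, INPUT))
        for artifact in outputs:
            self.events.append((execution, artifact, OUTPUT))

    def upstream(self, artifact):
        """Return every artifact the given artifact was (transitively) derived from."""
        producers = defaultdict(set)  # artifact -> executions that output it
        consumed = defaultdict(set)   # execution -> artifacts it read
        for execution, name, kind in self.events:
            if kind == OUTPUT:
                producers[name].add(execution)
            else:
                consumed[execution].add(name)

        seen, stack = set(), [artifact]
        while stack:
            current = stack.pop()
            for execution in producers[current]:
                for parent in consumed[execution]:
                    if parent not in seen:
                        seen.add(parent)
                        stack.append(parent)
        return seen


# Works the same whether the steps ran in a pipeline, a notebook, or a training job.
store = LineageStore()
store.record("preprocess", inputs=["raw.csv"], outputs=["train.csv"])
store.record("train", inputs=["train.csv"], outputs=["model"])
print(sorted(store.upstream("model")))  # ['raw.csv', 'train.csv']
```

The point of the sketch is that lineage only needs the input/output events to be logged; who logs them (a pipeline step or a user calling an SDK by hand) is a separate question.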
Some clarifications on current status
This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata
I believe KFP might have originally been using the client libraries to talk directly to the DB, but now it is using this gRPC? REST? server
The Google Metadata repo also provides a gRPC server for interacting with metadata: https://github.com/kubeflow/manifests/blob/master/metadata/base/metadata-deployment.yaml#L73 KFP standalone also uses this gRPC server.
This repository built a REST server on top of it (I'm not sure about the technical details; it could be a wrapper around the metadata client or the gRPC server).
KFP UI and this repository already reuse shared components in kubeflow/frontend for the lineage view, but not yet for lists. Both repos agree on the same schema.
We do use metadata for some metrics tracking in non-KFP projects. The reason this looks more like a project for KFP is that we don't have an experiment concept for other workloads. For example, users have to call the SDK manually in their distributed training operator or notebook to log params or metrics. Visualization is limited as well. Even though adoption of this project is not high at the moment, I hope to keep it separate and well designed.
It will become more important once we have generic experiment concepts across Kubeflow projects. I would say it's a key project for MLOps. See related issue https://github.com/kubeflow/kubeflow/issues/4955
This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata
- I believe KFP might have originally been using the client libraries to talk directly to the DB, but now it is using this gRPC? REST? server
- @rmgogogo @Bobgy can one of you confirm the answer to that?
Current KFP uses the google/ml-metadata repo for MLMD: https://github.com/google/ml-metadata
It's deployed as a gRPC server which connects to the DB. Pipeline tasks/steps call the gRPC server to access data.
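As a rough illustration of what "calling the gRPC server" could look like from a task, here is a minimal sketch using the ml-metadata Python client (`MetadataStore` with a `MetadataStoreClientConfig`). The environment variable names, the default host `metadata-grpc-service`, and the default port `8080` are assumptions for illustration and may not match a given deployment; `mlmd_endpoint` and `connect` are hypothetical helper names.

```python
import os


def mlmd_endpoint(env=None):
    """Resolve the MLMD gRPC host and port.

    The variable names and defaults below are assumptions, not confirmed
    values for any particular Kubeflow deployment.
    """
    env = os.environ if env is None else env
    host = env.get("METADATA_GRPC_SERVICE_HOST", "metadata-grpc-service")
    port = int(env.get("METADATA_GRPC_SERVICE_PORT", "8080"))
    return host, port


def connect(env=None):
    """Return an ml-metadata client that talks to the gRPC server."""
    # Imported lazily so mlmd_endpoint() works without ml-metadata installed.
    from ml_metadata.metadata_store import metadata_store
    from ml_metadata.proto import metadata_store_pb2

    config = metadata_store_pb2.MetadataStoreClientConfig()
    config.host, config.port = mlmd_endpoint(env)
    return metadata_store.MetadataStore(config)
```

With a reachable server, `connect().get_artifacts()` would list the stored artifacts; the client never touches the DB directly, which is the difference from the earlier direct-to-DB setup discussed above.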
/cc @zhitaoli
/cc @aronchick
If I recall correctly this repository might have originally been providing the following functionality on top of TFX-metadata
Some of this functionality might no longer be needed; I think tfx-metadata might support gRPC.
Regarding the UI; I believe as @avdaredevil mentioned above some of the frontend code has been refactored into reusable libraries in kubeflow/frontend. I'm unclear to what extent the KFP UI and metadata UIs have been updated to use those shared libraries.
/cc @zhenghuiwang
I think we don't have a replacement for these items:
The following might not be required any more
Regarding the UI; I believe as @avdaredevil mentioned above some of the frontend code has been refactored into reusable libraries in kubeflow/frontend. I'm unclear to what extent the KFP UI and metadata UIs have been updated to use those shared libraries.
Current status: kubeflow/pipelines is using those shared libraries entirely; I'm not sure about kubeflow/metadata.
@neuromage @Bobgy I thought KFP was defining some higher level schemas?
I filed #250 to get rid of the standalone metadata UI. It's lagging behind the KFP metadata UI and no one seems to be maintaining it.
Regarding the SDK; I stumbled upon https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd/metadata_store/MetadataStore
/kind feature
What is the future of metadata deployment?
There are currently at least two variants of metadata
I think the differences might pertain mostly to the UI. I think KFP ships a UI for metadata integrated into the KFP UI but I think the backend might be the same.
I think the net effect is that a lot of development is happening in the KFP UI and the generic metadata UI is lagging behind; e.g. #217 is tracking upstreaming changes for lineage that are in KFP UI but not metadata UI.
I think metadata is largely based on mlmd which is developed in google/ml-metadata
What's the path forward for providing a metadata story?
/cc @neuromage @Bobgy @rmgogogo @avdaredevil @zhenghuiwang