kubeflow / metadata

Repository for assets related to Metadata.
Apache License 2.0

Future of metadata development and deployment - standalone? Only as part of KFP #225

Closed jlewi closed 2 years ago

jlewi commented 4 years ago

/kind feature

What is the future of metadata deployment?

There are currently at least two variants of metadata

I think the differences might pertain mostly to the UI. I think KFP ships a UI for metadata integrated into the KFP UI but I think the backend might be the same.

I think the net effect is that a lot of development is happening in the KFP UI and the generic metadata UI is lagging behind; e.g. #217 is tracking upstreaming changes for lineage that are in KFP UI but not metadata UI.

I think metadata is largely based on mlmd which is developed in google/ml-metadata

What's the path forward for providing a metadata story?

/cc @neuromage @Bobgy @rmgogogo @avdaredevil @zhenghuiwang

rmgogogo commented 4 years ago

Quick question: does "metadata deployment" here mean google/ml-metadata or KF metadata?

If it means google/ml-metadata, then it's already in KFP (google/ml-metadata provides a gRPC server). I think we don't have a plan or the resources to continue KF metadata, right? I may be missing some context.

Bobgy commented 4 years ago

I think https://github.com/kubeflow/metadata/issues/217#issuecomment-595625018 is the only planned future feature. Other than that, keeping the current status and asking for community help is the best I can foresee.

animeshsingh commented 4 years ago

If we don't plan to extend it, and it becomes redundant with google/ml-metadata, and that's where KFP is focused, it's best to bring it up in a community meeting to decide on the future.

jlewi commented 4 years ago

If #217 is the only planned feature then what does this mean for creating a generic metadata story?

When people deploy Kubeflow pipelines do they get:

  1. A metadata backend that can be used to record metadata from arbitrary services (not just KFP)
  2. A UI for displaying metadata even if it wasn't created by KFP.

/cc @paveldournov

Bobgy commented 4 years ago

With the current status, both 1. and 2. mentioned above are true.

If we had the bandwidth, it would be better to keep maintaining the metadata UI separately, but I don't think that's the case now.

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
area/front-end 0.63

jlewi commented 4 years ago

I believe when you install the full KFP there are two different URLs for the UI for the metadata store

  1. https://${KFENDPOINT}/_/metadata
  2. https://${KFENDPOINT}/_/pipeline/#/artifacts

Are these both pointing at the same UI service or are they two different servers?

I suspect they are two different servers but I could be wrong.
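
For reference, here is a minimal sketch of how the two URLs above are formed from a single endpoint. The function name, the dictionary keys, and the example host are hypothetical; the mapping of each path to a particular UI is an assumption based on the path names, not something confirmed at this point in the thread.

```python
def metadata_ui_urls(kf_endpoint: str) -> dict:
    """Build both metadata UI URLs for a Kubeflow endpoint (host name only)."""
    base = f"https://{kf_endpoint.strip('/')}"
    return {
        # Assumed to be the standalone metadata UI.
        "standalone_metadata_ui": f"{base}/_/metadata",
        # Assumed to be the artifact view inside the KFP UI.
        "kfp_artifacts_ui": f"{base}/_/pipeline/#/artifacts",
    }


if __name__ == "__main__":
    # "kubeflow.example.com" is a placeholder host, not a real deployment.
    for name, url in metadata_ui_urls("kubeflow.example.com").items():
        print(f"{name}: {url}")
```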

Bobgy commented 4 years ago

They are two different servers. Some code is reused via kubeflow/frontend, but the codebases are built and distributed completely separately.

avdaredevil commented 4 years ago

Yes, these are two different codebases (kubeflow/metadata and kubeflow/pipelines), both of which import MLMD Lineage from kubeflow/frontend.

Ideally, pipelines would not show the link to the artifact store when running in iframe mode (within central-dashboard). But currently both UIs display the same information.

jlewi commented 4 years ago

Here's my understanding:

So to summarize, I think there are several components

jlewi commented 4 years ago

I think a major feature, lineage tracking, was introduced with 1.0.

Bobgy commented 4 years ago

Some clarifications on current status

This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata

I believe KFP might have originally been using the client libraries to talk directly to the DB but now it is using this gRPC? REST? server

The google/ml-metadata repo also provides a gRPC server for interacting with metadata: https://github.com/kubeflow/manifests/blob/master/metadata/base/metadata-deployment.yaml#L73 KFP standalone also uses this gRPC server.

This repository built a REST server on top of it (I'm not sure about the technical details; it could be a wrapper around the metadata client or the gRPC server).

The KFP UI and this repository already reuse shared components in kubeflow/frontend for the lineage view, but not yet for lists. Both repos agree on the same schema.

Jeffwan commented 4 years ago

We do use metadata for some metrics tracking in non-KFP projects. The reason this feels more like a KFP project is that we don't have an experiment concept for other workloads. For example, users have to call the SDK manually in their distributed training operator or notebook to log params or metrics. Visualization is limited as well. Even though adoption of this project is not high at the moment, I hope to keep it separate and well designed.

It will become more important once we have a generic experiment concept across Kubeflow projects. I would say it's a key project for MLOps. See related issue https://github.com/kubeflow/kubeflow/issues/4955

rmgogogo commented 4 years ago

This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata

  • I believe KFP might have originally been using the client libraries to talk directly to the DB but now it is using this gRPC? REST? server
  • @rmgogogo @Bobgy can one of you confirm the answer to that?

KFP currently uses the google/ml-metadata repo for MLMD: https://github.com/google/ml-metadata

It's deployed as a gRPC server which connects to the DB. Pipeline tasks/steps call the gRPC server to access data.
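
As a rough illustration of what "calling the gRPC server" looks like from a client, here is a minimal sketch using the `ml-metadata` Python package. The `MlmdEndpoint` class and both helper functions are hypothetical names for this example, and the default host and port are assumptions about a typical in-cluster deployment; check the manifests for the actual values. The client calls themselves (`MetadataStoreClientConfig`, `MetadataStore`, `get_artifacts`) come from the MLMD Python API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlmdEndpoint:
    """Address of an MLMD gRPC server (the defaults are assumptions)."""
    host: str = "metadata-grpc-service.kubeflow"  # assumed in-cluster service name
    port: int = 8080  # assumed port

    def target(self) -> str:
        """Return the address in host:port form."""
        return f"{self.host}:{self.port}"


def connect(endpoint: MlmdEndpoint):
    """Open an MLMD client against the gRPC server (needs `pip install ml-metadata`)."""
    # Imported lazily so this module still loads where ml-metadata is not installed.
    from ml_metadata.metadata_store import metadata_store
    from ml_metadata.proto import metadata_store_pb2

    config = metadata_store_pb2.MetadataStoreClientConfig()
    config.host = endpoint.host
    config.port = endpoint.port
    return metadata_store.MetadataStore(config)


def list_artifact_uris(store) -> list:
    """Return the URI of every artifact recorded in the store."""
    return [artifact.uri for artifact in store.get_artifacts()]
```

With a reachable server, `list_artifact_uris(connect(MlmdEndpoint()))` would return the artifact URIs regardless of which service wrote them, which is the "generic backend" property discussed earlier in this thread.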

rmgogogo commented 4 years ago

https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/base/metadata

FYI. It uses this binary to access MLMD. https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/base/metadata/metadata-grpc-deployment.yaml#L19

jlewi commented 4 years ago

/cc @zhitaoli

jlewi commented 4 years ago

/cc @aronchick

jlewi commented 4 years ago

If I recall correctly this repository might have originally been providing the following functionality on top of TFX-metadata

Some of this functionality might no longer be needed; I think tfx-metadata might support gRPC.

Regarding the UI; I believe as @avdaredevil mentioned above some of the frontend code has been refactored into reusable libraries in kubeflow/frontend. I'm unclear to what extent the KFP UI and metadata UIs have been updated to use those shared libraries.

/cc @zhenghuiwang

Bobgy commented 4 years ago

I think we don't have a replacement for these items:

The following might not be required any more

Regarding the UI; I believe as @avdaredevil mentioned above some of the frontend code has been refactored into reusable libraries in kubeflow/frontend. I'm unclear to what extent the KFP UI and metadata UIs have been updated to use those shared libraries.

Currently, kubeflow/pipelines uses those shared libraries entirely; I'm not sure about kubeflow/metadata.

jlewi commented 4 years ago

@neuromage @Bobgy I thought KFP was defining some higher level schemas?

jlewi commented 3 years ago

I filed #250 to get rid of the standalone metadata UI. It's lagging behind the KFP metadata UI and no one seems to be maintaining it.

Regarding the SDK; I stumbled upon https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd/metadata_store/MetadataStore