kubeflow / metadata

Repository for assets related to Metadata.
Apache License 2.0

K8s Watcher Should Not Create New Artifacts If Artifact Already Exists #248

Closed jlewi closed 2 years ago

jlewi commented 3 years ago

/kind feature

Follow-on to #241 and #246. There, the watcher looks at annotations on Kubernetes resources and creates artifacts from them.

For example, suppose we have the following Job:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: watcher-job-
  annotations:
    "metadata.kubeflow.org/input": |
      {"apiVersion":"metadata.kubeflow.org/v1alpha1","kind":"Dataset","metadata":{"name":"fakeInput"},"spec":{"prop1":"fakevalue"}}
    "metadata.kubeflow.org/output": |
      {"apiVersion":"metadata.kubeflow.org/v1alpha1","kind":"Dataset","metadata":{"name":"fakeOutput"},"spec":{"prop1":"fakeoutput"}}
spec:
   ...

The watcher will create MLMD artifacts corresponding to the input and output annotations.
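
To make the mapping concrete, here is a rough sketch of how the two annotations above could be decoded into artifact descriptions. This is illustration only, not the watcher's actual code: the function name `artifacts_from_annotations` and the tuple layout are made up, and it assumes each annotation value is a single JSON document shaped like the ones in the Job above.

```python
import json

# Annotation keys taken from the example Job above.
ANNOTATION_PREFIX = "metadata.kubeflow.org/"


def artifacts_from_annotations(annotations):
    """Hypothetical helper: return (role, kind, name, spec) tuples.

    `role` is "input" or "output"; the other fields come from the JSON
    stored in the annotation value.
    """
    found = []
    for role in ("input", "output"):
        raw = annotations.get(ANNOTATION_PREFIX + role)
        if raw is None:
            continue  # this resource does not declare that role
        doc = json.loads(raw)
        found.append((role, doc["kind"], doc["metadata"]["name"], doc.get("spec", {})))
    return found


# Annotation values copied from the example Job.
job_annotations = {
    "metadata.kubeflow.org/input": '{"apiVersion":"metadata.kubeflow.org/v1alpha1","kind":"Dataset","metadata":{"name":"fakeInput"},"spec":{"prop1":"fakevalue"}}',
    "metadata.kubeflow.org/output": '{"apiVersion":"metadata.kubeflow.org/v1alpha1","kind":"Dataset","metadata":{"name":"fakeOutput"},"spec":{"prop1":"fakeoutput"}}',
}

parsed = artifacts_from_annotations(job_annotations)
```

Here `parsed` holds one entry for the `fakeInput` Dataset and one for the `fakeOutput` Dataset, which is the information the watcher needs to create the corresponding MLMD artifacts.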

Right now the watcher doesn't check whether artifacts corresponding to a given name already exist. If they do, the watcher should reuse those artifacts rather than creating new ones. That way the lineage graphs will make it easy to see that a given artifact is being reused across executions.
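
A minimal sketch of the "look up first, create only if missing" behavior I have in mind. `FakeStore` is an in-memory stand-in, not the MLMD API, and keying artifacts by (kind, name) is an assumption on my part; the real lookup might need to include more than that.

```python
from itertools import count


class FakeStore:
    """Hypothetical in-memory stand-in for the metadata store."""

    def __init__(self):
        self.artifacts = {}    # (kind, name) -> artifact id
        self._ids = count(1)   # simple id generator for the sketch

    def get_or_create_artifact(self, kind, name):
        """Return (artifact_id, created); reuse the artifact if it exists."""
        key = (kind, name)
        if key in self.artifacts:
            # Already known: reuse it so lineage links point at one artifact.
            return self.artifacts[key], False
        artifact_id = next(self._ids)
        self.artifacts[key] = artifact_id
        return artifact_id, True


store = FakeStore()
# Two executions that both declare the same input Dataset.
first_id, created_first = store.get_or_create_artifact("Dataset", "fakeInput")
second_id, created_second = store.get_or_create_artifact("Dataset", "fakeInput")
```

With this behavior both executions end up attached to the same artifact id, so the reuse shows up in the lineage graph instead of appearing as two unrelated artifacts with the same name.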

/cc @Swikar @karlschriek