votti opened this issue 1 year ago
I think I may give this a go - I would try to build this in Python analogous to the tfevent-metricscollector. Does this sound like a reasonable approach? I am also happy for any other suggestion.
Small update: I now have a metrics collector for Kubeflow v1 pipelines that I think should work; according to the logs it already manages to capture the pipeline metrics artifacts (modeled after the tfevent-metricscollector).
What I am failing at is passing the current trial name to the custom collector in the metricsCollectorSpec.
Essentially I am using a configuration very similar to the custom metrics collector example here:
https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml#L13-L35
My CLI metricscollector takes an argument "-t" or "--trial_name" with the trial name to use for reporting (exactly like the tfevent-metricscollector). Does anyone have a hint on how to configure this so that the current trial name is passed as an argument?
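For context, here is a minimal sketch of the parsing side of such a collector: it reads the KFP v1 metrics artifact (a JSON file with a `metrics` list) and takes the trial name via `-t`/`--trial_name` as described above. The helper name is illustrative, and the actual gRPC reporting to the Katib DB manager is omitted:

```python
import argparse
import json


def parse_kfp_metrics(path):
    """Parse a KFP v1 mlpipeline-metrics artifact into {name: value}."""
    with open(path) as f:
        doc = json.load(f)
    # KFP v1 metrics artifacts look like:
    # {"metrics": [{"name": "accuracy", "numberValue": 0.95, "format": "PERCENTAGE"}]}
    return {m["name"]: m["numberValue"] for m in doc.get("metrics", [])}


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--trial_name", required=True)
    parser.add_argument("-path", "--metrics_file_path",
                        default="/tmp/outputs/mlpipeline_metrics/data")
    args = parser.parse_args()
    metrics = parse_kfp_metrics(args.metrics_file_path)
    # In a real collector these values would be reported to the
    # Katib DB manager via gRPC, keyed by the trial name.
    for name, value in metrics.items():
        print(f"{args.trial_name}: {name}={value}")
```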
I am now really a bit confused:
Reading the source code of the metrics collector sidecar injection (inject_webhook), it looks to me as if the trial name should actually be added to the args:
https://github.com/kubeflow/katib/blob/22b740802a06d8926255b204076837d6e344ebb9/pkg/webhook/v1beta1/pod/inject_webhook.go#L302
Yet looking at the pods Katib creates, all these arguments seem to be missing. Is there anything I do not see?
My current section to specify the metrics collector:
```yaml
metricsCollectorSpec:
  source:
    fileSystemPath:
      path: "/tmp/outputs/mlpipeline_metrics/data"
      kind: File
  collector:
    customCollector:
      image: votti/kfpv1-metricscollector:v0.0.7
      imagePullPolicy: Always
      name: custom-metrics-logger-and-collector
    kind: Custom
```
Which creates a specification like this:
```yaml
- image: votti/kfpv1-metricscollector:v0.0.7
  imagePullPolicy: Always
  name: custom-metrics-logger-and-collector
  resources: {}
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /tmp/outputs/mlpipeline_metrics
    name: metrics-volume
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: kube-api-access-rnmkw
    readOnly: true
```
Thank you for working on this @votti! Would it be easier to use push-based metrics collector for such use-cases (ref: https://github.com/kubeflow/katib/issues/577)? Then we don't even need a sidecar to collect metrics.
cc @johnugeorge @gaocegege @tenzen-y
I now managed to implement a working metrics collector for Kubeflow Pipeline V1 Metrics artifacts: https://github.com/d-one/katib/tree/feature/kfpv1-metricscollector/cmd/metricscollector/v1beta1/kfpv1-metricscollector
For a full example of how this is used, see: https://github.com/votti/katib-exploration/blob/main/notebooks/mnist_pipeline_v1.ipynb
@Push: I think it is an interesting idea to build a dedicated Kubeflow Pipelines component that can push metrics to Katib.
The challenge I see here is how to pass the current trial_name. Otherwise the component could be built quite similarly to the kfpv1-metricscollector.
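One possible way to get the trial name into such a push component (a sketch, assuming the installed Katib version supports trial-metadata substitution in trial templates) would be to declare a trial parameter that references `${trialSpec.Name}` and hand it to the component as an argument:

```yaml
trialTemplate:
  trialParameters:
    - name: trialName
      description: Name of the current trial
      reference: ${trialSpec.Name}
  trialSpec:
    # ... inside the pipeline/Job manifest, the push component
    # would then receive the trial name, e.g.:
    #   args: ["--trial_name", "${trialParameters.trialName}"]
```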
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello, any update for KFP v2?
Cheers!
@AlexandreBrown We've worked on Katib + KFP example in this PR: https://github.com/kubeflow/katib/pull/2118 Any help and review for this PR are appreciated!
Great to see progress, was this PR made for kfp v2 or only v1?
That PR is only for v1.
@AlexandreBrown
This is based on V1, as I only managed to compile the pipeline in KFP V1 as an Argo Workflow manifest.
If there is a way to export KFP V2 as an Argo Workflow, it should be straightforward to use V2 as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
/kind feature
Describe the solution you'd like
Currently the aim is to do parameter tuning over pipelines in Katib (#1914, #1993).
Kubeflow Pipelines allow for dedicated metrics artifacts:
https://kubeflow-pipelines.readthedocs.io/en/master/source/dsl.html?h=metrics#kfp.dsl.Metrics
https://www.kubeflow.org/docs/components/pipelines/v1/sdk/pipelines-metrics/
Having a dedicated Katib sidecar metrics collector that collects the metrics from these artifacts would make pipelines and Katib work together quite nicely.
The current workaround is to use the stdout collector, but this causes issues with the complex commands in pipeline components (#1914, will add dedicated issue soon).
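For reference, the artifact the proposed collector would consume is straightforward to produce. A minimal sketch (the helper name is illustrative) of a KFP v1 component writing the mlpipeline-metrics JSON:

```python
import json


def write_kfp_metrics(metrics, output_path):
    """Write a {name: value} dict in the KFP v1 mlpipeline-metrics format."""
    doc = {
        "metrics": [
            {"name": name, "numberValue": value, "format": "RAW"}
            for name, value in metrics.items()
        ]
    }
    with open(output_path, "w") as f:
        json.dump(doc, f)


# In a pipeline component this path would be the mounted metrics output, e.g.:
# write_kfp_metrics({"accuracy": 0.95}, "/tmp/outputs/mlpipeline_metrics/data")
```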
Anything else you would like to add:
Love this feature? Give it a 👍 We prioritize the features with the most 👍