kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0
1.51k stars 442 forks source link

[SDK] Support More MCKind in `tune` #2402

Open Electronic-Waste opened 3 months ago

Electronic-Waste commented 3 months ago

What you would like to be added?

Support specifying more types of metrics collector in the tune function.

https://github.com/kubeflow/katib/blob/51b246fa1c23db130c373fa9749897a0d0b334ab/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L189

Why is this needed?

Currently, we only support kind param in metrics_collector_config, which means only StdOut and Push collector can be specified in tune function.

https://github.com/kubeflow/katib/blob/51b246fa1c23db130c373fa9749897a0d0b334ab/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L387-L391

However, there are many other collector types, such as File, Prometheus, Custom, TensorFlowEvent collector. They are important too. The tune function would be more flexible and powerful if users can specify all these kind of collectors.

Currently we support:

Love this feature?

Give it a 👍 We prioritize the features with most 👍

Electronic-Waste commented 3 months ago

/remove-label lifecycle/needs-triage

/area sdk

google-oss-prow[bot] commented 3 months ago

@Electronic-Waste: The label(s) `/remove-label lifecycle/needs-triage

cannot be applied. These labels are supported:tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash. Is this label configured underlabels -> additional_labelsorlabels -> restricted_labelsinplugin.yaml`?

In response to [this](https://github.com/kubeflow/katib/issues/2402#issuecomment-2266666950): >/remove-label lifecycle/needs-triage > >/area sdk Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
Electronic-Waste commented 3 months ago

/good-first-issue

google-oss-prow[bot] commented 3 months ago

@Electronic-Waste: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to [this](https://github.com/kubeflow/katib/issues/2402): >/good-first-issue Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
prakhar479 commented 3 months ago

hey @Electronic-Waste I wanted to work on this issue and have some doubts regarding the same. Are the only modifications required in tune function to allow other function signature or does V1beta1CollectorSpec method from models class would also need some changes to handle these different collector types. Also, what is the specific representation (string key) for these other collector types (any example would be helpful). Thanks!

Electronic-Waste commented 3 months ago

Hi @prakhar479 , thank you for your passion for this issue!

I think V1beta1CollectorSpec is already up-to-date to handle different types of MC. So probably you only need to add some fields to metrics_collector_config and modify these lines of code in tune function.

https://github.com/kubeflow/katib/blob/b6f7cfd9a727b39f1ce56f0c7a605f4962b51de5/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py#L387-L391

Also, you may find these examples helpful to you :)

https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/file-metrics-collector-with-json-format.yaml https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/file-metrics-collector.yaml

cc👀 @kubeflow/wg-automl-leads

Electronic-Waste commented 3 months ago

/assign @prakhar479

prakhar479 commented 3 months ago

thanks for the helpful insight @Electronic-Waste. I wanted to confirm one doubt I encountered while going through the code. V1beta1CollectorSpec method has a custom_collector: Any | None = None field which is not being used currently in the initialization. can we set up extra fields in the metrics_collector_config for other types of collectors and then pass them in custom_collector field in line 390

prakhar479 commented 3 months ago

Also, can someone lead me to a guide to set up and run tests for the repo. I searched around and was able to get few things going but couldn't run e2e tests due to incomplete set up on my end. Thanks!

Electronic-Waste commented 3 months ago

can we set up extra fields in the metrics_collector_config for other types of collectors and then pass them in custom_collector field in line 390

@prakhar479 Yes, of course. You can add any fields you want in metrics_collector_config.

Also, can someone lead me to a guide to set up and run tests for the repo. I searched around and was able to get few things going but couldn't run e2e tests due to incomplete set up on my end. Thanks!

In fact, I run the e2e tests by submitting PRs to this repo. E2e tests will be triggered automatically in CI/CD. You can also consult @andreyvelich @tenzen-y @johnugeorge for more suggestions.

Electronic-Waste commented 1 month ago

/remove-label lifecycle/needs-triage