kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0

* refactor(sdk): added option for custom metric collector for tune in… #2406

Open prakhar479 opened 3 months ago

prakhar479 commented 3 months ago

… katib_client.py

Signed-off-by: prakhar479 <153047595+prakhar479@users.noreply.github.com>

Added a `custom_collector` field to `metrics_collector_config` in `tune` so that users can specify a custom metrics collector, for example Prometheus.

Fixes #2402
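To make the proposed option concrete, here is a minimal sketch of how a `tune`-style config dict carrying a `custom_collector` entry could be translated into a collector spec. It assumes the spec is modeled on Katib's v1beta1 collector spec (`kind: Custom` plus a `customCollector` container); `build_metrics_collector_spec` and the image name are hypothetical and are not the code in this PR.

```python
from typing import Any, Dict


# Hypothetical helper, not the PR's implementation: it turns a tune()-style
# metrics_collector_config dict into a collector spec modeled on Katib's
# v1beta1 CollectorSpec (kind + optional customCollector container).
def build_metrics_collector_spec(config: Dict[str, Any]) -> Dict[str, Any]:
    kind = config.get("kind", "StdOut")
    collector: Dict[str, Any] = {"kind": kind}
    if kind == "Custom":
        # A Custom collector needs a container definition to run alongside the trial.
        container = config.get("custom_collector")
        if not container:
            raise ValueError("kind 'Custom' requires a 'custom_collector' container")
        collector["customCollector"] = container
    elif "custom_collector" in config:
        # Reject the field for built-in kinds so misconfiguration fails early.
        raise ValueError("'custom_collector' is only valid with kind 'Custom'")
    return {"collector": collector}


# Fields mirror a Kubernetes V1Container; the image reference is a placeholder.
dummy_container = {
    "name": "dummy-collector",
    "image": "example.registry/dummy-collector:latest",
}

spec = build_metrics_collector_spec({"kind": "Custom", "custom_collector": dummy_container})
print(spec["collector"]["kind"])  # Custom
```

Validating the pairing of `kind` and `custom_collector` up front is one design choice; an alternative is to infer `kind: Custom` whenever a custom container is supplied.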

google-cla[bot] commented 3 months ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-oss-prow[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information, see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
- **[OWNERS](https://github.com/kubeflow/katib/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

prakhar479 commented 3 months ago

@Electronic-Waste, could you review this and let me know if any changes are required? Thanks a lot!

Electronic-Waste commented 3 months ago

@prakhar479 Yes, of course. Thanks for your great contribution to Katib!

I'll look into this PR in the next few days.

andreyvelich commented 3 months ago

/rerun-all

prakhar479 commented 3 months ago

I have corrected some oversights on my side; the tests now need approval to run. Thanks!

Electronic-Waste commented 3 months ago

/rerun-all

Electronic-Waste commented 3 months ago

PTAL👀 @andreyvelich @tenzen-y @johnugeorge when you have time.

prakhar479 commented 3 months ago

I have updated the comment describing how to use the custom metrics collector parameter, as well as the e2e test for the tune API. For the e2e test, as suggested by @Electronic-Waste, I have modified `build-load.sh` to build a dummy custom metrics collector image from the `dummy-collector.py` script and the `Dockerfile.dummy-collector` file. Finally, I have modified `run-e2e-tune-api.py` to pass the custom collector image to the tune API as a `V1Container` parameter.

I was unsure where these new files belong, so I have placed all of them in the gh-action directory. Let me know if any further modifications or fixes are needed.
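The thread does not say what `dummy-collector.py` actually does. Purely as an illustration, and assuming a dummy collector only needs to pull `name=value` pairs (for example `Validation-accuracy=0.87`) out of a trial's log, its parsing step could look like the sketch below. `METRIC_PATTERN` and `collect_metrics` are hypothetical names, and reporting the values back to Katib is deliberately left out.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical pattern: matches "name=value" pairs such as "loss=0.41" or "lr=1e-3".
METRIC_PATTERN = re.compile(
    r"([\w\-./]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)


def collect_metrics(lines: Iterable[str], metric_names: List[str]) -> Dict[str, List[float]]:
    """Return every observed value for each requested metric, in log order."""
    wanted = set(metric_names)
    observed: Dict[str, List[float]] = {name: [] for name in metric_names}
    for line in lines:
        # A single log line may report several metrics at once.
        for name, value in METRIC_PATTERN.findall(line):
            if name in wanted:
                observed[name].append(float(value))
    return observed


if __name__ == "__main__":
    print(collect_metrics(["loss=0.5"], ["loss"]))  # {'loss': [0.5]}
```

Keeping every observed value, rather than only the last one, leaves the choice of latest, minimum, or maximum to whatever consumes the result.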

Electronic-Waste commented 2 months ago

/rerun-all

prakhar479 commented 2 months ago

I have made the necessary changes, which should also fix the failing tests. Let me know about any further changes or suggestions @Electronic-Waste @andreyvelich @tenzen-y @johnugeorge.

prakhar479 commented 2 months ago

@Electronic-Waste, I have fixed these issues; let me know if anything else is needed. Thanks!

Electronic-Waste commented 2 months ago

/rerun-all

Electronic-Waste commented 2 months ago

@prakhar479 Can you please fix the lint error and the error in the tune API?

Electronic-Waste commented 2 months ago

/rerun-all