kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0
1.49k stars 440 forks source link

Replace gRPC code generation tool from Znly/protoc to Buf #2344

Closed forsaken628 closed 3 months ago

forsaken628 commented 3 months ago

What this PR does / why we need it: Replace gRPC code generation tool from Znly/protoc to Buf , and some fix

Which issue(s) this PR fixes _(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged)_: Fixes #2141

Checklist:

forsaken628 commented 3 months ago

medianstop doesn't have any unit tests, and it doesn't look like there's any reason it has to be implemented in python. Can I re-implement it in go later and add unit tests? @andreyvelich

andreyvelich commented 3 months ago

medianstop doesn't have any unit tests, and it doesn't look like there's any reason it has to be implemented in python. Can I re-implement it in go later and add unit tests?

Actually, we have unit tests here: https://github.com/kubeflow/katib/blob/master/test/unit/v1beta1/earlystopping/test_medianstop_service.py

Any specific reason do you want to implement in Go, ? We are trying to keep our optimization algorithms in Python since Data Science community are more familiar with it.

tenzen-y commented 3 months ago

Thank you for this great contribution @forsaken628! I left a few comments. @tenzen-y @johnugeorge Should we try to merge it before the release to unblock: #2346 ?

SGTM

tenzen-y commented 3 months ago

@forsaken628 As I mentioned here, could you open a PR to implement the custom readiness check to resolve flake E2E tests?

google-oss-prow[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubeflow/katib/blob/master/OWNERS)~~ [tenzen-y] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
forsaken628 commented 3 months ago

@forsaken628 As I mentioned here, could you open a PR to implement the custom readiness check to resolve flake E2E tests?

https://github.com/kubeflow/katib/blob/e6bd3e7b5bd95e0e77ce2f869e6b1eeb72d24180/cmd/katib-controller/v1beta1/main.go#L151

There's already a webhook ready check added before manger start, and I'm wondering why it's still getting the connection refused

tenzen-y commented 3 months ago

@forsaken628 As I mentioned here, could you open a PR to implement the custom readiness check to resolve flake E2E tests?

https://github.com/kubeflow/katib/blob/e6bd3e7b5bd95e0e77ce2f869e6b1eeb72d24180/cmd/katib-controller/v1beta1/main.go#L151

There's already a webhook ready check added before manger start, and I'm wondering why it's still getting the connection refused

As I described in the comment, the default checker can not consider if the certs are ready.

https://github.com/kubeflow/training-operator/blob/9a9edecf146bb0c2cccaeb3bec912ab2dfa5d8c5/cmd/training-operator.v1/main.go#L230-L236

andreyvelich commented 3 months ago

Thank you for doing this @forsaken628! Please can you cherry-pick this PR to the release branch: release-0.17 ?

andreyvelich commented 3 months ago

/lgtm