kubeflow / fairing

Python SDK for building, training, and deploying ML models
Apache License 2.0
337 stars 144 forks

update fairing/lightgbm retries to scale with num_machines #532

Closed aakarshg closed 4 years ago

aakarshg commented 4 years ago

What this PR does / why we need it:

This allows the fairing lightgbm wrapper to scale with the number of machines, while retaining the existing behavior for smaller values of num_machines (fewer than 500).

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #529

Release note:

Updated the lightgbm wrapper's nslookup wait to scale with the number of machines. When num_machines is smaller than 500, it retains the previous behavior of waiting for 600 seconds; for larger values, the maximum wait is (num_machines*1.2) seconds.
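
The rule in the release note can be sketched roughly as follows. This is only an illustration of the arithmetic, not the code in this PR: the function and constant names are hypothetical, and only the values 500, 600, and 1.2 come from the note above.

```python
# Illustrative sketch only; names are hypothetical, values are from the release note.
SMALL_CLUSTER_THRESHOLD = 500    # below this, keep the previous fixed wait
DEFAULT_MAX_WAIT_SECONDS = 600   # previous behavior
SECONDS_PER_MACHINE = 1.2        # scaling factor for larger clusters


def max_nslookup_wait_seconds(num_machines: int) -> float:
    """Return the maximum time to wait for nslookup, in seconds."""
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    if num_machines < SMALL_CLUSTER_THRESHOLD:
        # Small clusters keep the fixed 600-second wait.
        return float(DEFAULT_MAX_WAIT_SECONDS)
    # Larger clusters wait proportionally to their size.
    return num_machines * SECONDS_PER_MACHINE
```

Since 500 * 1.2 is 600, the two branches meet at the threshold, so the maximum wait never drops when num_machines crosses 500.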
kubeflow-bot commented 4 years ago

This change is Reviewable

k8s-ci-robot commented 4 years ago

Hi @aakarshg. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

xauthulei commented 4 years ago

/ok-to-test

xauthulei commented 4 years ago

/lgtm

Meanwhile, @jinchihe, please review this again. Thanks.

jinchihe commented 4 years ago

Looks great! Thanks @aakarshg

/approve

k8s-ci-robot commented 4 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jinchihe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/fairing/blob/master/OWNERS)~~ [jinchihe]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment