kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0
1.48k stars 439 forks source link

Move all suggestion algorithms to python #407

Closed jdplatt closed 5 years ago

jdplatt commented 5 years ago

What do you think of moving all of the code in pkg/suggestion used to generate suggestions to python? Right now three of the algorithms (grid, random, hyperband) are in go while two (rf + gp) are in python. I think the advantages of a single language would be:

  1. It would open up the option of using combinations of the algorithms. For example, with the the gp algorithm it is common to use something like random search for the burn-in phase at the start of an experiment when there is not enough data to train an accurate model. GPs also don't scale well to a high number of observations, which is why the vizier paper transitioned to a local search algorithm for high N. Finally, while the original hyperband paper uses random search to generate suggestions for each bracket, they highlight the algorithm can be combined with other suggestion algorithms such as a latin hypercube or ones such as GP that adapt as they see more examples.

  2. We only need to manage a single service rather than one for each of the go algorithms and a final one for all of the python algorithms. Having one service per algorithm will make it hard to scale to a wide range of algorithms. It will also probably be a little easier to develop + test if it's all the in the same language.

I'm happy to take the lead on this if you guys think it makes sense. Grid search and random search shouldn't take too long though hyperband may be a bigger lift.

@YujiOshima @richardsliu

YujiOshima commented 5 years ago

Thanks @jdplatt.

I don’t want to merge suggestion services because of robustoness for whole system. In currently archtecture, a failure of a service won't affect other services. And user can add thier custom algorithm service without restarting any services.

But we have big motivation to move python. We can reuse code in several algorithm and enrich algorithm with many liblaries in python.

jdplatt commented 5 years ago

That makes sense. I'll aim to migrate the suggestion code to python but make sure it's possible to run each of the suggestion algorithms in a separate service.

richardsliu commented 5 years ago

Makes sense to me. @gaocegege @johnugeorge

johnugeorge commented 5 years ago

@YujiOshima What failures do you anticipate?

With the current design, I see the advantage of adding custom algorithm service without restarting the service. However, I agree with @jdplatt Point 2. I feel that It is not a scalable solution to add wide range of algorithms. Eg: In https://github.com/tobegit3hub/advisor, 17 algorithms are listed which would need 17 pods in our case. What do you think?

jdplatt commented 5 years ago

The long list of algorithms in #15 and https://github.com/tobegit3hub/advisor was what led me to suggest merging all of the suggestion services into a single service. I definitely see the benefit of users being able to add a custom algorithm without needing to recreate the deployment. However, if every algorithm runs in it's own pod it will require the user to do a lot of extra work to test a new algorithm. They will need to i) create a new dockerfile, ii) create new deployment + service manifests, iii) update the build + deploy + test scripts. If we run all the algorithms inside a single deployment then the user will just have to create the new algorithm, and can lean on the existing docker + k8s machinery.

jdplatt commented 5 years ago

To be clear I don't think we need to decide on this now so I'll keep everything separate.

richardsliu commented 5 years ago

@jdplatt Can you do a quick write up of the steps needed to implement a new algorithm after your changes? It would help others understand the issue and the PR. Thanks!

richardsliu commented 5 years ago

This is done now. /close

k8s-ci-robot commented 5 years ago

@richardsliu: Closing this issue.

In response to [this](https://github.com/kubeflow/katib/issues/407#issuecomment-501527818): >This is done now. >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.