[Agents RL] Demonstrate Kubeflow with an E2E RL example

cwbeitel commented 6 years ago

The purpose of this example is to showcase the benefits of the Kubeflow infrastructure in training a reinforcement learning agent.

Core tasks:

[ ] Describe a case study that communicates the business value of the example: who will care about it, and why?
[x] Illustrate the config, submit, monitor, render workflow for single-node training
[ ] Prow test verifies that the model trains in the notebook container
[ ] Illustrate a practice for building and pushing containers efficiently
[ ] Distributed training with TFJob operator (e.g. using @danijar's idea)
[ ] Illustration of simple hyperparameter tuning
[ ] Uses accelerators

Optionally:

[ ] Build a custom gym environment that captures a business problem of interest, e.g. reinforcement learning in the context of datacenter cooling, scheduling, hyperparameter tuning, etc.
[ ] Deploy the agent and custom environment, e.g. if the environment concerns Kubernetes scheduling, use the agent to schedule resources on a cluster and measure whether there is a benefit
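
To make the custom-environment idea concrete, here is a rough sketch of what a toy datacenter-cooling environment could look like. Everything in it is an assumption for illustration: the class name `ToyCoolingEnv`, the constants, the dynamics, and the reward are invented and do not model a real datacenter. It mirrors the classic Gym `reset()` / `step()` convention (with `step` returning observation, reward, done, info) but deliberately does not depend on the `gym` package; a real version would presumably subclass `gym.Env` and declare its action and observation spaces.

```python
import random


class ToyCoolingEnv:
    """Toy datacenter-cooling environment with a classic Gym-style interface.

    Hypothetical example: the dynamics, reward, and constants are invented
    for illustration and do not model a real datacenter.
    """

    TARGET_TEMP = 22.0         # desired temperature in degrees Celsius (assumed)
    MAX_STEPS = 50             # episode length (assumed)
    ACTIONS = (0.0, 0.5, 1.0)  # cooling power levels: off, half, full

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.temp = None
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.temp = self._rng.uniform(24.0, 30.0)
        self.steps = 0
        return self.temp

    def step(self, action):
        """Apply a cooling level; return (observation, reward, done, info)."""
        cooling = self.ACTIONS[action]
        heat_load = self._rng.uniform(0.2, 0.6)  # heat added by servers this step
        self.temp += heat_load - cooling         # full cooling removes 1 degree per step
        self.steps += 1
        # Penalize distance from the target plus a small energy cost for cooling.
        reward = -abs(self.temp - self.TARGET_TEMP) - 0.1 * cooling
        done = self.steps >= self.MAX_STEPS
        return self.temp, reward, done, {"cooling": cooling}


def run_episode(env, policy):
    """Roll out one episode with the given policy and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


def thermostat(temp):
    """Baseline policy: full cooling above the target, otherwise off."""
    return 2 if temp > ToyCoolingEnv.TARGET_TEMP else 0


if __name__ == "__main__":
    print("thermostat return:", run_episode(ToyCoolingEnv(seed=0), thermostat))
```

A hand-written baseline like `thermostat` gives a reference return to compare a trained agent against, which is one way to "measure whether there was a benefit".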

/cc @nkashy1 @danijar @aronchick @jlewi

cwbeitel commented 6 years ago

Thinking about this more, I think the steps under testing should be completed before reviewing https://github.com/kubeflow/examples/pull/1
