Closed — lluunn closed this issue 5 years ago
Both examples focus on different techniques: distributed training and GPU serving. In the case of the example in object_detection, object detection appears to be an arbitrary choice. Using a different dataset or approach would not affect the ability to highlight distributed training. For #154, is that different? Would a different dataset or approach make sense for GPU serving?
I don't see strong justification for an additional E2E example to highlight distributed training, since it could have been added to an existing E2E example with less effort. It doesn't make sense at this point to have two separate object detection examples, so I'm very much in favor of combining. We can highlight both distributed training and GPU serving, but they obviously need to use the same dataset.
As-is, there are a lot of manual steps in object_detection. My preference is to optimize for a single example with clean, straightforward instructions. #154 is closer to that than object_detection. What would it look like if we added distributed training to the model used in #154? Is that a published example in the same repo as the model? If not, let's smooth out the process in object_detection and add GPU serving to that approach.
If we absolutely want to keep them separate, #154 could be filed under Component-focused. If we do that, we should beef up the serving instructions in object_detection since they're pretty sparse.
In #154, the model is also an arbitrary choice. It just demos well (you can see the bounding boxes drawn on the image as the result) and highlights GPU serving. So on second thought, why make it an E2E example if what we want to highlight is GPU serving?
And what do we gain from the object detection example, given that the GitHub issue example already has distributed training?
@jlewi @texasmichelle WDYT?
To illustrate serving with GPUs, we need a model for which using GPUs makes sense, so an image model is an obvious choice.
The GitHub issue summarization example isn't a good choice: we are currently serving it with Keras, and it's an RNN over text data, so probably not a good fit for illustrating GPUs with TF Serving.
Per #145, this is based on this blog post: https://cloud.google.com/blog/big-data/2017/09/performing-prediction-with-tensorflow-object-detection-models-on-google-cloud-machine-learning-engine, which also does training. So I think we can get this working with training pretty easily.
Actually it looks like both this example and the example in object_detection are using the same Oxford-IIIT Pets dataset.
So I think we can put these two pieces together to have a complete E2E example of training.
@texasmichelle How about this:

- GitHub Issue Summarization
- Object Detection: use this to
/cc @royxue @ldcastell
I agree with the idea of combining these two parts; it would make this example look like a complete workflow.
Object detection provides detailed steps all the way from creating the PVC to training, but there are maybe just too many YAMLs. It would be better to reduce the number of YAMLs, or to use ksonnet as mentioned in #178.
I am also working on an example using object detection for batch prediction. I don't have a strong opinion on whether we should have training, TF Serving, and batch prediction in the same workflow for this particular example. In some cases, users might just use existing models, such as those from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md.
Also note that batch prediction doesn't need the hack to solve the "version" problem mentioned in https://github.com/kubeflow/examples/blob/master/object_detection/export_tf_graph.md.
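For context on the hack being referenced: TF Serving only discovers models that sit under a numbered version subdirectory (`model_base_path/model_name/<version>/`), while the exporter writes the SavedModel directly into its output directory. A minimal sketch of the directory shuffle, with hypothetical paths and a hypothetical model name:

```shell
# Sketch only: EXPORT_DIR, SERVING_BASE, and "pets_model" are stand-ins,
# not the example's actual paths.

EXPORT_DIR="$(mktemp -d)/pets_model"     # pretend this is the exporter's output
mkdir -p "${EXPORT_DIR}"
touch "${EXPORT_DIR}/saved_model.pb"     # stand-in for the exported graph

# The hack: move the export under a numbered version directory ("1" here)
# so tensorflow_model_server can discover and load it.
SERVING_BASE="$(mktemp -d)"
mkdir -p "${SERVING_BASE}/pets_model/1"
mv "${EXPORT_DIR}/saved_model.pb" "${SERVING_BASE}/pets_model/1/"
```

Batch prediction reads the SavedModel path directly, which is why it doesn't need this shuffle.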
Agreed on having consolidated and cleaner examples. We can easily incorporate GPUs (for training and serving) into the existing object_detection example once all the YAMLs are moved to a ksonnet app/prototype (#178). Same with batch prediction.
P.S. Sorry about all the YAMLs, but since I'm just learning ksonnet, it was faster/easier for me to use them.
That all sounds good to me. I updated #175 to reflect the removal of t2t training.
Since it's valuable to have a t2t example, we can replace it with the code we've been using for onstage demos. #191 created for this.
Can this be closed?
I'm going to close this issue. I reread the thread and looked at the current code and I don't see any immediate work.
IIUC, #154 added a TF Serving example and may not initially have been using the model produced by the training code. But it looks like the instructions now tell users they can use the model they trained: https://github.com/kubeflow/examples/blob/master/object_detection/tf_serving_gpu.md
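For readers skimming this thread, the core of GPU serving is just a TF Serving deployment that runs the GPU image and requests a GPU from the scheduler. A minimal sketch, with hypothetical names and paths (not the example's actual manifests):

```yaml
# Sketch only: model name, labels, and model_base_path are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pets-model-serving-gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pets-model-serving
  template:
    metadata:
      labels:
        app: pets-model-serving
    spec:
      containers:
      - name: tf-serving
        image: tensorflow/serving:latest-gpu    # GPU build of TF Serving
        args:
        - --port=9000
        - --model_name=pets_model
        - --model_base_path=/mnt/models/pets_model
        resources:
          limits:
            nvidia.com/gpu: 1                   # schedule onto a GPU node
```

The `model_base_path` must contain the numbered version subdirectory TF Serving expects, and the cluster needs the NVIDIA device plugin installed for the `nvidia.com/gpu` resource to be schedulable.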
I think the next step is to open up separate issues like #231 to add E2E tests to verify we can train a model and then serve it.
We have an object detection example for distributed training (https://github.com/kubeflow/examples/tree/master/object_detection) and one for GPU serving (#154).
They currently use different models, but we should combine them into one example.