yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

Can CaffeOnSpark used for deployment? #248

Closed baoruxiao closed 7 years ago

baoruxiao commented 7 years ago

Hi,

I have limited knowledge of Spark and have recently been exploring the CaffeOnSpark framework.

I notice that CaffeOnSpark reads a DataFrame as its data source, partitions it, and distributes the partitions to executors. What if the data source contains a single image? Can CaffeOnSpark read one image at a time and predict its label in a distributed manner? In other words, can I speed up predicting a single image's label using a CaffeOnSpark cluster?

I intend to use the framework for DL model deployment, but this question has been lingering, so I'm reaching out for insights.

Many thanks!!

junshi15 commented 7 years ago

If a dataset contains only one image, then only one partition will hold an image; the rest will be empty.
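To illustrate the point above, here is a hedged sketch (plain Python, not CaffeOnSpark's actual code) of how a round-robin split distributes records across partitions; with a single image, only one partition ends up non-empty, so additional executors sit idle:

```python
def partition(items, n_partitions):
    """Round-robin split of a dataset into n partitions, loosely
    mimicking how Spark distributes records across executors."""
    parts = [[] for _ in range(n_partitions)]
    for i, item in enumerate(items):
        parts[i % n_partitions].append(item)
    return parts

# Ten images spread across 4 "executors": every partition gets work.
parts = partition([f"img_{i}" for i in range(10)], 4)

# A single image across 4 "executors": only one partition is non-empty.
single = partition(["only_image"], 4)
```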

You can set the batch size to 1 during the inference (prediction) stage; this is actually recommended. If the batch size does not divide the total dataset size, the remainder is discarded in the current implementation. That is acceptable in training but undesirable in inference. So at inference time, either set the batch size to 1, or make sure the batch size divides the total number of images.
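The remainder-discarding behavior described above amounts to simple arithmetic; this illustrative sketch (not CaffeOnSpark's code) shows how many images actually get scored when only full batches are executed:

```python
def images_processed(total_images: int, batch_size: int) -> int:
    """Number of images actually run through the network when only
    full batches are executed and the remainder is dropped."""
    return (total_images // batch_size) * batch_size

# With 1000 images and batch size 64: 1000 // 64 = 15 full batches,
# so 960 images are processed and 40 are silently discarded.
# With batch size 1, every image is processed.
```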

baoruxiao commented 7 years ago

@junshi15 Thanks, this really answers my question. Based on your answer, if there is only one partition, inference performance will not scale horizontally as executors are added? My use case is taking video from a webcam and feeding frames one after another to the network for inference, so the batch size is always 1. I want to scale up performance using a Spark cluster; can I use CaffeOnSpark in this case?

Thanks!

junshi15 commented 7 years ago

You want to do on-line (maybe even real-time) inference. CaffeOnSpark does not quite meet your requirement.

A typical use case for CaffeOnSpark is as follows: you have a large number of images sitting on HDFS, and you want to predict labels for all of them. You launch CaffeOnSpark with, say, N executors, and the dataset gets cut into N partitions. Each executor works on one partition. As you can see, this is a batch operation; the number of executors does not change during prediction. It is fixed for the entire process.
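The batch workflow described above can be sketched as follows, simulated with a thread pool standing in for Spark executors (the `score_partition` function is a hypothetical placeholder, not CaffeOnSpark's API):

```python
from concurrent.futures import ThreadPoolExecutor

def score_partition(part):
    # Stand-in for running Caffe inference on one partition of images.
    return [f"label_for_{img}" for img in part]

def batch_predict(images, n_executors):
    # Cut the dataset into n_executors partitions, one per "executor".
    parts = [images[i::n_executors] for i in range(n_executors)]
    # The pool size is fixed for the whole job, as with Spark executors.
    with ThreadPoolExecutor(max_workers=n_executors) as pool:
        results = pool.map(score_partition, parts)
    # Flatten per-partition results into one list of labels.
    return [label for part_labels in results for label in part_labels]

labels = batch_predict([f"img_{i}" for i in range(8)], n_executors=4)
```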

baoruxiao commented 7 years ago

@junshi15 This confirms my speculation. It seems there is currently no clustering solution available for on-line inference. Maybe virtualization with something like Mesos is the way to go. Thanks! I will close this issue.

junshi15 commented 7 years ago

You may want to look into TensorFlow Serving, if serving is your goal.

baoruxiao commented 7 years ago

@junshi15 Yes, serving/production is my goal. That looks like exactly what I want, many thanks!!