jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

How to deal with parallel processing in production #392

Open dovanchan opened 7 years ago

dovanchan commented 7 years ago

Hi, I've run into a problem. As we know, the best neural-style algorithm can transfer a photo in about 30 s on the best GPU, but it takes up more than 500 MB of GPU memory (with the NIN model). Right now the GPU with the most memory is the K80, with 24 GB. So parallel processing is a real problem. If we use AWS, it isn't enough to handle this much parallel work. I know that Google Cloud Platform offers the Cloud Machine Learning Engine service (which I think is really cheap for deep learning), but I don't know whether it can be used for neural-style. Do you have any good suggestions for this parallel-processing problem, or can you share some solutions that we could explore together?

ajhool commented 7 years ago

For those unfamiliar, @dovanchan is referring to these products on AWS and GCP, which are managed machine learning environments: https://aws.amazon.com/machine-learning/details/ https://cloud.google.com/ml-engine/

However, I believe the AWS product is restricted to specific model types, and I don't think neural-style can be shoehorned into it: http://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html

The GCP version runs TensorFlow, and there are a few TensorFlow implementations of neural style on GitHub, although I'm not sure where exactly training and prediction can be delineated in this algorithm.

I'm curious, @dovanchan: where did you find these benchmarks? Can anybody else confirm that GPU memory usage is approximately 500 MB? If so, could a K80 with 24 GB run approximately 45 style transfers in parallel and complete each in about 30 seconds? Are those reasonable approximations?
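The back-of-the-envelope estimate above can be written out explicitly. This is a sketch under stated assumptions: the per-job figure is the 500 MB quoted in this thread, no memory is reserved for the CUDA context, and the 30 s per transfer (quoted for "the best GPU") is assumed to carry over to a K80, which is optimistic. A K80 board is two GPUs with 12 GB each, so a single job cannot span the full 24 GB and the count is computed per GPU. The sketch also separates the memory bound from the compute bound: if one job already keeps a GPU busy, running many jobs at once shares the same compute, so each job slows down instead of still finishing in 30 s. `concurrent_jobs` and `throughput_per_hour` are hypothetical helper names.

```python
import math


def concurrent_jobs(gpu_mem_mb: float, job_mem_mb: float, reserve_mb: float = 0.0) -> int:
    """How many jobs of job_mem_mb fit into one GPU's memory, keeping reserve_mb free."""
    usable = gpu_mem_mb - reserve_mb
    if usable <= 0 or job_mem_mb <= 0:
        return 0
    return math.floor(usable / job_mem_mb)


def throughput_per_hour(jobs_in_parallel: int, seconds_per_job: float) -> float:
    """Completed transfers per hour, assuming each job still takes seconds_per_job."""
    return jobs_in_parallel * 3600.0 / seconds_per_job


# Assumption: a K80 board exposes two GPUs with 12 GB each.
K80_GPUS = 2
K80_MB_PER_GPU = 12 * 1024

per_gpu = concurrent_jobs(K80_MB_PER_GPU, 500)
total = K80_GPUS * per_gpu
print(per_gpu, total)                      # 24 48
# Memory-bound upper limit: only valid if parallel jobs do not slow each other down.
print(throughput_per_hour(total, 30))      # 5760.0
# Compute-bound estimate: one job saturates each GPU, so extra jobs just queue up.
print(throughput_per_hour(K80_GPUS, 30))   # 240.0
```

The real throughput will land somewhere between those two numbers, so measuring how per-job time grows as jobs are added to one GPU matters more than the memory arithmetic.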

dovanchan commented 7 years ago

It's approximately 1 GB with the VGG model, but if you use the NIN model with a lightweight framework implementation such as Chainer (see https://github.com/pfnet-research/chainer-gogh), GPU usage can be around 500 MB. I also don't know whether GCP is useful for this algorithm, so I hope someone can share their experience with this kind of parallel processing in production.
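One common way to run this kind of workload is a job queue that keeps a fixed number of style-transfer processes running on each GPU. The sketch below assumes each job is launched as a separate `th neural_style.lua` process using that script's `-content_image`, `-style_image`, `-output_image` and `-gpu` options. `StyleJob`, `run_jobs`, and `slots_per_gpu` are hypothetical names. The slot count should come from measuring `neural_style.lua`'s own per-job memory, since the 500 MB figure above was measured with chainer-gogh and the VGG figure is roughly double that.

```python
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StyleJob:
    content: str  # path to the content image
    style: str    # path to the style image
    output: str   # where neural_style.lua should write the result


def build_command(job: StyleJob, gpu_id: int, extra_args: Sequence[str] = ()) -> List[str]:
    """Command line for one neural_style.lua run pinned to a single GPU."""
    return ["th", "neural_style.lua",
            "-content_image", job.content,
            "-style_image", job.style,
            "-output_image", job.output,
            "-gpu", str(gpu_id), *extra_args]


def run_subprocess(cmd: List[str]) -> None:
    # Raises CalledProcessError if the transfer exits with a non-zero status.
    subprocess.run(cmd, check=True)


def run_jobs(jobs: Sequence[StyleJob], gpu_ids: Sequence[int], slots_per_gpu: int,
             runner: Callable[[List[str]], None] = run_subprocess,
             extra_args: Sequence[str] = ()) -> List[int]:
    """Run jobs with at most slots_per_gpu concurrent processes per GPU.

    Returns the GPU id each job ran on, in job order.
    """
    if slots_per_gpu < 1 or not gpu_ids:
        raise ValueError("need at least one GPU and one slot per GPU")

    # One token per slot; a GPU id appears slots_per_gpu times.
    slots: "queue.Queue[int]" = queue.Queue()
    for gpu in gpu_ids:
        for _ in range(slots_per_gpu):
            slots.put(gpu)

    def work(job: StyleJob) -> int:
        gpu = slots.get()  # take a free slot on some GPU
        try:
            runner(build_command(job, gpu, extra_args))
            return gpu
        finally:
            slots.put(gpu)  # return the slot even if the run failed

    with ThreadPoolExecutor(max_workers=len(gpu_ids) * slots_per_gpu) as pool:
        return list(pool.map(work, jobs))
```

For example, `run_jobs(jobs, gpu_ids=[0, 1], slots_per_gpu=4)` would keep up to four transfers on each GPU of a two-GPU board. Running each job as its own process keeps a crash in one transfer from affecting the others, at the cost of reloading the model for every job. A long-lived worker that keeps the network resident in GPU memory would avoid that reload, but `neural_style.lua`, being a one-shot command-line script, does not do that by itself.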