tobegit3hub / simple_tensorflow_serving

Generic and easy-to-use serving service for machine learning models
https://stfs.readthedocs.io
Apache License 2.0

About STFS-gpu Performance #30

Open Johnson-yue opened 6 years ago

Johnson-yue commented 6 years ago

Hi, my model works and everything is OK, thank you. But in my test cases I found some issues.

1) Usage of GPU memory:

My model is ResNet-50 and I set the session_config flags "log_device_placement": true, "allow_soft_placement": true and "allow_growth": true. I do not use "per_process_gpu_memory_fraction": 0.5 because it reserves 50% of GPU memory whether the model is small or big. When I start the serving for the first time, GPU memory usage is 340+ MB, which I think is reasonable. But after I run the client code once or more, GPU memory usage grows until it reaches 7.4 GB. I do not know why; have you checked this?
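For reference, those flags map onto TensorFlow's tf.ConfigProto roughly as below; this is a minimal TF 1.x sketch, not taken from the STFS code, and the session body is a placeholder:

```python
import tensorflow as tf

# Rough TF 1.x equivalent of the session_config options discussed above.
config = tf.ConfigProto(
    log_device_placement=True,  # log which device each op is placed on
    allow_soft_placement=True,  # fall back to CPU if a GPU kernel is missing
)
# allow_growth starts with a small allocation (e.g. the ~340 MB seen at
# startup) and keeps growing as kernels request more memory; the allocator
# never shrinks, which is why usage can climb to several GB after the
# first requests are served.
config.gpu_options.allow_growth = True
# Alternatively, cap the allocator at a fixed fraction of GPU memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

with tf.Session(config=config) as sess:
    pass  # load the graph and run inference here
```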

2) Inference latency:

I tested my model with the frozen .pb in plain Session mode, and sess.run() takes about 6-7 ms. But when I deploy the same model on STFS with GPU, a single request costs 40 ms! I know you have benchmarked STFS performance against other deployment frameworks, but have you compared the cost of Session.run() with a TF-Serving run?
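A rough way to measure the gap from the client side; in this sketch the endpoint, model name, input key, and shape are assumptions to be adjusted to the deployed model:

```python
import time
import requests

ENDPOINT = "http://127.0.0.1:8500"  # assumed default STFS address
# Hypothetical payload; adjust the "data" keys to match the model's inputs.
payload = {"model_name": "default", "model_version": 1,
           "data": {"images": [[0.0] * 224 * 224 * 3]}}

# Warm up once so CUDA/graph initialization is not counted.
requests.post(ENDPOINT, json=payload)

start = time.time()
for _ in range(100):
    requests.post(ENDPOINT, json=payload)
elapsed_ms = (time.time() - start) / 100 * 1000
print("mean round-trip per request: %.1f ms" % elapsed_ms)
# Comparing this against the bare sess.run() time measured the same way
# isolates the HTTP + JSON (de)serialization overhead.
```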

tobegit3hub commented 6 years ago

For the first question, GPU memory usage depends on your model and the batch size. The model itself may be only 340 MB, but an operation such as a matrix multiplication needs more GPU memory to hold its intermediate results.
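As a back-of-envelope illustration (the layer shape and batch size below are made-up, ResNet-style numbers, not measured from this model):

```python
# Activation memory for a single float32 conv feature map.
batch, height, width, channels = 32, 56, 56, 256
bytes_per_float = 4
activation_mb = batch * height * width * channels * bytes_per_float / 1024 ** 2
print("one feature map: %.0f MB" % activation_mb)  # ~98 MB for this shape
# A forward pass keeps many such maps live at once, so activations can
# dwarf the size of the weights themselves.
```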

For the second question, you can run the server with --log_level=debug and it will print the time of sess.run(). Simple TensorFlow Serving takes extra time to process and respond to HTTP requests, but it should not be much slower. We have performance tests against TensorFlow Serving, and the time is close to that of sess.run() with the TensorFlow Python APIs.