Open · Johnson-yue opened this issue 6 years ago
For the first question, GPU memory usage depends on your model and the batch size. The model itself may be only 340MB, but some of its operations, like matrix multiplication, need more GPU memory to hold the intermediate variables.
For the second question, you can run the server with `--log_level=debug` and it will print the time of `sess.run()`. SimpleTensorFlowServing costs extra time to process and respond to HTTP requests, but it should not be much slower. We have run performance tests against TensorFlow Serving, and the time is close to that of `sess.run()` with the TensorFlow Python APIs.
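For example, you can time the end-to-end HTTP latency on the client side and compare it with the `sess.run()` time printed in the debug log. A minimal sketch, assuming the default port 8500, a model loaded as "default", and a placeholder input field:

```python
import time

import requests

# Assumptions: Simple TensorFlow Serving runs locally on its default port
# 8500 and the model is loaded as "default". The "data" payload below is a
# placeholder; use your model's real input signature and shape so the
# comparison with sess.run() is fair.
ENDPOINT = "http://127.0.0.1:8500"
payload = {"model_name": "default", "data": {"image": [[0.0, 0.0, 0.0]]}}

requests.post(ENDPOINT, json=payload)  # warm-up request

runs = 100
start = time.time()
for _ in range(runs):
    requests.post(ENDPOINT, json=payload)
print("mean end-to-end latency: %.2f ms" % ((time.time() - start) * 1000 / runs))
```

The gap between this number and the logged `sess.run()` time is the HTTP and serialization overhead.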
Hi, my model works and everything is OK, thank you. But in my test case, I found some bugs.
My model is ResNet-50 and I set the `session_config` flags such as `"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true`. I do not use `"per_process_gpu_memory_fraction": 0.5` because it limits GPU memory usage to 50% whether your model is small or big.
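For reference, these JSON flags map onto the TF 1.x `tf.ConfigProto` fields roughly like this (just a sketch of the equivalent Python session config):

```python
import tensorflow as tf

# Rough TF 1.x equivalent of the session_config JSON flags above.
config = tf.ConfigProto(
    log_device_placement=True,  # log which device each op is placed on
    allow_soft_placement=True,  # fall back to CPU when a GPU kernel is missing
)
config.gpu_options.allow_growth = True  # allocate GPU memory on demand

# The option I skipped; it caps this process at 50% of GPU memory up front,
# regardless of how small the model is:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

sess = tf.Session(config=config)
```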
So when I start the serving for the first time, GPU memory usage is 340+MB, which I think is reasonable. But when I run the client code once or more, GPU memory usage grows until it reaches 7.4GB. I do not know why; have you checked this?

I also tested my model with frozen.pb in Session mode, and `sess.run()` costs about 6-7ms. But when I deploy this model on STFS with GPU, it costs 40ms per request! I think you have checked STFS performance against other deployment frameworks, but have you compared the cost time of `Session.run()` with TF-Serving's run?
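This is roughly how I measured the 6-7ms baseline (a sketch; the frozen.pb path and the tensor names `input:0`/`output:0` are placeholders for my real model):

```python
import time

import numpy as np
import tensorflow as tf

# Load the frozen graph (TF 1.x APIs). The path and tensor names below are
# placeholders; substitute the real ones from your exported model.
with tf.gfile.GFile("./frozen.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

input_tensor = graph.get_tensor_by_name("input:0")
output_tensor = graph.get_tensor_by_name("output:0")

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(graph=graph, config=config) as sess:
    batch = np.random.rand(1, 224, 224, 3).astype(np.float32)
    sess.run(output_tensor, feed_dict={input_tensor: batch})  # warm-up
    runs = 100
    start = time.time()
    for _ in range(runs):
        sess.run(output_tensor, feed_dict={input_tensor: batch})
    print("mean sess.run() time: %.2f ms" % ((time.time() - start) * 1000 / runs))
```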