tobegit3hub / simple_tensorflow_serving

Generic and easy-to-use serving service for machine learning models
https://stfs.readthedocs.io
Apache License 2.0

default is cpu mode? #26

Closed Johnson-yue closed 6 years ago

Johnson-yue commented 6 years ago

Hi, it's me again. I realized that TensorFlow Serving requires Python 2.7 for both tensorflow and tf serving. In your Docker image, I see that TensorFlow was installed from PyPI, which is the CPU build. When I run a CNN model such as resnet_50, prediction is very slow! Then I uninstalled tensorflow and installed tensorflow-gpu==1.10.1 with pip, but nothing changed; it is still slow!

My question is: does the server default to CPU mode? How can I run my model in GPU mode with your Docker image?
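A quick way to check whether the TensorFlow build inside the container can see a GPU at all (a TF 1.x sketch, not specific to this project):

import tensorflow as tf

# True only if TensorFlow was built with CUDA support and a GPU is visible
print(tf.test.is_gpu_available())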

Johnson-yue commented 6 years ago

I'm using your new Docker image. I'm very happy to see the update.

I ran the image tobegit3hub/simple_tensorflow_serving:latest-gpu and loaded my model (resnet_50).

I think my model is OK, and your serving reports no errors.

Then, using your demo, I generated the client file: curl http://localhost:8500/v1/models/resnet_50/gen/client?language=python > client.py

client.py was generated successfully.

But when I run python client.py, it takes a long time and outputs nothing. So I checked my GPU memory usage, and it does not increase! That means the model is not using the GPU. Would you please check it and provide a GPU model for testing? Thank you.
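For reference, this is how I watch GPU memory while client.py runs:

# refresh GPU utilization and memory usage every second
watch -n 1 nvidia-smi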

tobegit3hub commented 6 years ago

The default image is CPU-only; you can use the GPU with tobegit3hub/simple_tensorflow_serving:latest-gpu.

If you want to use GPUs in the Docker container, the problem is likely your docker command. You can run simple_tensorflow_serving without Docker, or run with the following parameters to mount the GPU devices.

export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | xargs -I{} echo '-v {}:{}')
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run -it -p 8500:8500 $CUDA_SO $DEVICES tobegit3hub/simple_tensorflow_serving:latest-gpu
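Alternatively, if the nvidia-docker wrapper is installed on the host (an assumption; it is not required by this project), it handles the device and library mounts for you:

nvidia-docker run -it -p 8500:8500 tobegit3hub/simple_tensorflow_serving:latest-gpu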
Johnson-yue commented 6 years ago

@tobegit3hub Maybe I forgot to mount the GPU devices. I will try this. Thank you.

tobegit3hub commented 6 years ago

I have updated the README and tested it successfully in our environment.

You can put all the CUDA files in /usr/cuda_files/ and test with these commands. Otherwise, set your own environment variables and make sure the Docker image can access the GPU and CUDA.

export CUDA_SO="-v /usr/cuda_files/:/usr/cuda_files/"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
export LIBRARY_ENV="-e LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/cuda_files"

docker run -it -p 8500:8500 $CUDA_SO $DEVICES $LIBRARY_ENV tobegit3hub/simple_tensorflow_serving:latest-gpu
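To verify that TensorFlow inside the running container actually sees the GPU, one option is to list the visible devices (CONTAINER_ID is a placeholder for your container's id):

docker exec -it CONTAINER_ID python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"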
Johnson-yue commented 6 years ago

@tobegit3hub I updated to your Docker image tobegit3hub/simple_tensorflow_serving:latest-gpu. It works, but there are some bugs:

  1. tf-serving-gpu allocates ALL of the GPU memory by default, which is terrible! Look here; luckily someone has already fixed it (fixed code). Can you add this config and recompile TF Serving? By the way, that fix uses per_process_gpu_memory_fraction, but I think allow_growth=True would be better; I just don't know how to set it (see the sketch at the end of this comment).

  2. I tested the half_plus_two model that ships as a demo with the tf-serving Docker image. It works fine in the tf-serving:latest-gpu Docker image, but it does not work in your image (latest-gpu). My steps:

     1) Copy saved_model_half_plus_two_gpu into /simple_tensorflow_serving/models/

     2) The directory layout is:

        /simple_tensorflow_serving/models/saved_model_half_plus_two_gpu/
        |-- 00000123
        |   |-- saved_model.pb
        |   `-- variables
        |       |-- variables.data-00000-of-00001
        |       `-- variables.index

     3) Run simple_tensorflow_serving --model_base_path="./models/saved_model_half_plus_two_gpu/"

     This fails with: IOError: SavedModel file does not exist at: ./models/saved_model_half_plus_two_gpu/123/{saved_model.pbtxt|saved_model.pb}

     Why?

Do you restrict the model directory format?
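For item 1, here is a minimal TF 1.x sketch of the allow_growth setting I mean (illustrative only; this assumes the serving process creates its own tf.Session and is not simple_tensorflow_serving's actual code):

import tensorflow as tf

# Start with a small allocation and grow GPU memory on demand,
# instead of grabbing all of it up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)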

tobegit3hub commented 6 years ago

Thanks for reporting. The actual path should be ./saved_model_half_plus_two_gpu/00000123, but it seems to look up ./saved_model_half_plus_two_gpu/123.
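My guess (illustrative only, not the confirmed cause) is that the version directory name is converted to an integer, which drops the leading zeros:

# hypothetical illustration of the suspected bug, not the actual serving code:
# casting the version directory name to int drops the leading zeros
version = int("00000123")  # -> 123
print("./models/saved_model_half_plus_two_gpu/%d" % version)
# prints .../123 instead of the existing .../00000123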

I will open another issue to track this.

To support limiting GPU usage, we don't need to rely on or recompile TensorFlow Serving. It is possible to add per_process_gpu_memory_fraction as a parameter of the TensorFlow Python server.
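A minimal sketch of what that could look like (the --gpu_memory_fraction flag name here is hypothetical, not an existing simple_tensorflow_serving option):

import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
# hypothetical flag name, only for illustration
parser.add_argument("--gpu_memory_fraction", type=float, default=0.5)
args = parser.parse_args()

# Cap the fraction of total GPU memory this process may allocate.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = args.gpu_memory_fraction
sess = tf.Session(config=config)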