kmkolasinski opened this issue 6 months ago (status: Open)
@kmkolasinski, the `--xla_cpu_compilation_enabled=true` parameter should be passed as an additional argument to enable the XLA:CPU JIT (it is disabled by default).
Can you try creating a TF Serving Docker container with the additional parameter, as shown in the example, and see if model inference works? Thank you!
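For reference, here is a minimal sketch of passing that flag to the stock Docker image; the model name, mount paths, and image tag are illustrative, not taken from the issue:

```bash
# Flags appended after the image name are forwarded to tensorflow_model_server
# by the image entrypoint; model name and mount paths are hypothetical.
docker run -p 8501:8501 \
  -v "$(pwd)/models/resnet50:/models/resnet50" \
  -e MODEL_NAME=resnet50 \
  tensorflow/serving:2.13.1-gpu \
  --xla_cpu_compilation_enabled=true
```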
Hi @singhniraj08, yes, I tried this approach; please check the Dockerfile (see the CMD command) which I used for testing: https://github.com/kmkolasinski/triton-saved-model/blob/main/tf_serving/Dockerfile
Here is my docker compose which I used to run serving: https://github.com/kmkolasinski/triton-saved-model/blob/main/docker-compose.yml
# First, use https://github.com/kmkolasinski/triton-saved-model/blob/main/notebooks/export-classifier.ipynb to export various classifiers
docker compose up tf_serving_server
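In case it helps, here is a sketch of what such a compose service might look like; the service layout, image tag, and paths below are illustrative, not copied from the linked file:

```yaml
# Hypothetical compose service; extra command-line arguments are forwarded
# to tensorflow_model_server by the image entrypoint.
services:
  tf_serving_server:
    image: tensorflow/serving:2.13.1-gpu
    ports:
      - "8500:8500"   # gRPC
      - "8501:8501"   # REST
    volumes:
      - ./models/resnet50:/models/resnet50
    environment:
      - MODEL_NAME=resnet50
    command: ["--xla_cpu_compilation_enabled=true"]
```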
I prepared this repository, which reproduces the issue: https://github.com/kmkolasinski/triton-saved-model/tree/main
Hi @YanghuaHuang, did you have time to take a look at this issue?
Sorry for the late reply. Assigning to @guanxinq to triage, who has better knowledge of this.
Hi @YanghuaHuang, thanks. I just wonder whether we can use XLA-compiled models in TF Serving or not. If so, how can we achieve it? I couldn't find any information about this.
I think TF Serving does support XLA on CPU, but not on GPU. But I could be wrong.
@gharibian Hey Dero, can you help on this? Thanks!
Thanks for the answer. If this is true, the message
`Could not find compiler for platform CUDA: NOT_FOUND`
makes perfect sense to me now, and that's a pity. I assumed that TF Serving uses the same C++ backend to run the SavedModel graph as the TF libraries, so any SavedModel I can run via Python code I could also run via TF Serving. Let's wait for confirmation from @gharibian.
Hey @gharibian, did you have time to take a look at this thread?
Bug Report
Does TensorFlow Serving support XLA-compiled SavedModels, or am I doing something wrong?
System information
TensorFlow Serving version (Docker image tag): 2.13.1-gpu
Describe the problem
Hi, I'm trying to run XLA-compiled models via TensorFlow Serving; however, it does not seem to work for me.
Here is the notebook I used to create XLA/AMP-compiled SavedModels of very simple classifiers like ResNet50: https://github.com/kmkolasinski/triton-saved-model/blob/main/notebooks/export-classifier.ipynb
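For context, the export generally follows a pattern like the sketch below; the shapes, model, and output path are illustrative, and the linked notebook remains the authoritative version:

```python
import tensorflow as tf

# Illustrative export of an XLA-compiled classifier; AMP (mixed precision)
# is enabled globally before the model is built.
tf.keras.mixed_precision.set_global_policy("mixed_float16")
model = tf.keras.applications.ResNet50(weights=None)

@tf.function(jit_compile=True)  # request XLA compilation of the serving path
def serve(images):
    return {"predictions": model(images, training=False)}

tf.saved_model.save(
    model,
    "models/resnet50/1",  # TF Serving expects a numeric version subdirectory
    signatures={
        "serving_default": serve.get_concrete_function(
            tf.TensorSpec([None, 224, 224, 3], tf.float32, name="images")
        )
    },
)
```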
When running the TF Serving server, I can see the following warning in the console:
`Could not find compiler for platform CUDA: NOT_FOUND`
I get a similar message on the client side.
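To illustrate the client side, here is a minimal REST call sketch; the host, port, model name, and input shape are assumptions rather than values from the repo:

```python
import json

import numpy as np
import requests

# TF Serving's REST predict API: POST /v1/models/<name>:predict with an
# "instances" payload; the server-side compiler error is returned in the body.
payload = {"instances": np.zeros((1, 224, 224, 3), dtype=np.float32).tolist()}
resp = requests.post(
    "http://localhost:8501/v1/models/resnet50:predict",
    data=json.dumps(payload),
)
print(resp.status_code, resp.text)
```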
Exact Steps to Reproduce
You can find my repo, where I compare Triton Server (Python backend) with TF Serving, here: https://github.com/kmkolasinski/triton-saved-model. In the notebooks directory you will find the notebooks used to export the models and run the comparison.
Is this expected behavior? I am aware of the `--xla_cpu_compilation_enabled` flag; however, I was not able to find any useful resources on how to use it to test my case.
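For completeness, here is a sketch of how the flag can be passed when launching the server binary directly; the model name and base path are illustrative:

```bash
# --xla_cpu_compilation_enabled turns on the XLA:CPU JIT for compiled model
# functions; it is off by default. Name and path below are hypothetical.
tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=resnet50 \
  --model_base_path=/models/resnet50 \
  --xla_cpu_compilation_enabled=true
```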