Closed: maksna closed this issue 3 years ago.
@maksna it looks like the limitation is the lack of support for a specific op/format in TensorFlow itself. Can you try running the same model in CPU-only TensorFlow? You should see the same error.
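A minimal sketch of that CPU-only check, assuming the SavedModel path, the "serving_default" signature, and the input name/shape/dtype from the config.pbtxt below (all of these are assumptions; adjust them to your export):

import numpy as np
import tensorflow as tf

# Hide GPUs so TensorFlow falls back to its CPU kernels,
# where the FusedBatchNorm NHWC limitation applies.
tf.config.set_visible_devices([], "GPU")

# Load the same SavedModel that Triton serves (path is an assumption).
model = tf.saved_model.load("/path/to/models/albert_fine_tune/1/model.savedmodel")
infer = model.signatures["serving_default"]

# Dummy input matching the reported dims [2, 256] plus a batch dimension;
# the keyword "inputs" is taken from the config and may differ in your signature.
dummy = np.zeros((1, 2, 256), dtype=np.int8)
print(infer(inputs=tf.constant(dummy)))

If this raises the same "FusedBatchNorm only supports NHWC" error, the problem is in the TensorFlow CPU kernels rather than in Triton.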
Closed due to inactivity. Will reopen if you continue to experience the error.
Description: I use Triton Server to serve a TensorFlow 2 ALBERT model for prediction. The server starts successfully, but when I post a gRPC request, it reports the following error:
[StatusCode.INTERNAL] [Derived]{{function_node __inference_albert_fine_tune_layer_call_and_return_conditional_losses_9401}} {{function_node __inference_albert_fine_tune_layer_call_and_return_conditional_losses_9401}} The CPU implementation of FusedBatchNorm only supports NHWC tensor format for now. [[{{node layer_normalization/FusedBatchNormV3}}]] [[StatefulPartitionedCall/StatefulPartitionedCall/albert_fine_tune/StatefulPartitionedCall/StatefulPartitionedCall]]
Are you using the Triton container or did you build it yourself? Container: nvcr.io/nvidia/tritonserver:21.07-py3
To Reproduce
Steps to reproduce the behavior:
1. docker pull nvcr.io/nvidia/tritonserver:21.07-py3
2. Produce the config.pbtxt and the TensorFlow SavedModel files
3. docker run -it -v /path/to/models:/models -p 8000:8000 -p 8001:8001 -p 8002:8002 nvcr.io/nvidia/tritonserver:21.07-py3 bash
4. tritonserver --model-repository=/models
5. Post a gRPC request (see the sketch after this list)
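A minimal sketch of step 5 with the Python gRPC client (tritonclient); the model name "albert_fine_tune" and the batch size of 1 are assumptions, while the tensor names, dtypes, and dims follow the config.pbtxt below:

import numpy as np
import tritonclient.grpc as grpcclient

# Connect to the gRPC endpoint published with -p 8001:8001.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Build the request from the config: input "inputs" is INT8 with dims [2, 256].
inputs = [grpcclient.InferInput("inputs", [1, 2, 256], "INT8")]
inputs[0].set_data_from_numpy(np.zeros((1, 2, 256), dtype=np.int8))
outputs = [grpcclient.InferRequestedOutput("outputs")]

# On a CPU-only server this call is what triggers the FusedBatchNorm error.
result = client.infer(model_name="albert_fine_tune", inputs=inputs, outputs=outputs)
print(result.as_numpy("outputs").shape)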
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well). Models: TensorFlow 2.3 albert_tiny.
My config.pbtxt:

platform: "tensorflow"
max_batch_size: 8
input [
  {
    name: "inputs"
    data_type: DT_INT8
    dims: [ 2, 256 ]
  }
]
output [
  {
    name: "outputs"
    data_type: DT_FLOAT
    dims: [ 256, 205 ]
  }
]
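For reference, the directory layout Triton expects for a TensorFlow SavedModel repository looks roughly like this (the model and directory names are assumptions; note that Triton's model configuration reference spells the platform as "tensorflow_savedmodel" and the datatypes as TYPE_INT8 / TYPE_FP32, so the snippet above may be a paraphrase of the actual config):

/path/to/models/
└── albert_fine_tune/
    ├── config.pbtxt
    └── 1/
        └── model.savedmodel/
            ├── saved_model.pb
            └── variables/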
Expected behavior: I want to know how to fix this bug. The model works fine when Triton Server runs on GPUs, but when I use CPU only, the gRPC request returns the error above.