triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

fp16 issue in 20.03 #1794

Closed: loveppdog closed this issue 4 years ago

loveppdog commented 4 years ago

I use Triton 20.03 and our model is tensorflow_graphdef. Enabling FP16 results in an error at inference time. Some relevant config info:

platform: "tensorflow_graphdef"
max_batch_size: 128

optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
      }
    ]
  }
}
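For context, this optimization block sits inside the model's config.pbtxt. A minimal sketch of the full file is below; the model/tensor names and dims are hypothetical, since the real model's inputs and outputs are not shown in this issue:

name: "my_graphdef_model"        # hypothetical model name
platform: "tensorflow_graphdef"
max_batch_size: 128
input [
  {
    name: "input"                # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ 24, 24, 16, 1 ]      # hypothetical dims (batch dim excluded)
  }
]
output [
  {
    name: "regression_output"    # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
      }
    ]
  }
}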

trt-gpu-serving-zxold-210-2003 | 2020-07-14 17:40:16.898946: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Setting layouts of network and plugin input/output tensors to linear, as 3D operators are found and 3D non-linear IO formats are not supported, yet.
trt-gpu-serving-zxold-210-2003 | 2020-07-14 17:40:17.027031: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger ../rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
trt-gpu-serving-zxold-210-2003 | 2020-07-14 17:40:17.028820: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger ../rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
trt-gpu-serving-zxold-210-2003 | 2020-07-14 17:40:17.028903: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:748] Engine creation for depth_2_conv/TRTEngineOp_12 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
trt-gpu-serving-zxold-210-2003 | 2020-07-14 17:40:17.039095: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for depth_2_bn/cond/batchnorm/TRTEngineOp_11 input shapes: [[1,64,1,1,1], [1,64,1,1,1], [1,64,1,1,1], [1,64,24,24,16], [1,64,1,1,1]]

trt-gpu-serving-zxold-210-2003 | 2020-07-14 17:40:17.217267: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:748] Engine creation for depth_3_relu/TRTEngineOp_16 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
trt-gpu-serving-zxold-210-2003 | 2020-07-14 17:40:17.217987: E tensorflow/stream_executor/dnn.cc:594] CUDNN_STATUS_EXECUTION_FAILED
trt-gpu-serving-zxold-210-2003 | in tensorflow/stream_executor/cuda/cuda_dnn.cc(4318): 'cudnnPoolingForward( cudnn.handle(), pooling_desc.handle(), &alpha, src_desc.handle(), input_data.opaque(), &beta, dest_desc.handle(), output_data->opaque())'
trt-gpu-serving-zxold-210-2003 | I0714 09:40:17.218254 18 trtserver.cc:1677] Infer failed: 2 root error(s) found.
trt-gpu-serving-zxold-210-2003 | (0) Internal: dnn PoolForward launch failed
trt-gpu-serving-zxold-210-2003 | [[{{node max_pooling3d_5/MaxPool3D}}]]
trt-gpu-serving-zxold-210-2003 | [[regression_output/Relu/_7]]
trt-gpu-serving-zxold-210-2003 | (1) Internal: dnn PoolForward launch failed
trt-gpu-serving-zxold-210-2003 | [[{{node max_pooling3d_5/MaxPool3D}}]]
trt-gpu-serving-zxold-210-2003 | 0 successful operations.
trt-gpu-serving-zxold-210-2003 | 0 derived errors ignored.

Zees1023 commented 4 years ago

I have a similar problem. My model is also tensorflow_graphdef on 20.03, and I added FP16 to the config as follows:

optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
      }
    ]
  }
}

and the errors:

gpu-serving-4.31-x | 2020-07-14 18:05:15.325414: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Setting layouts of network and plugin input/output tensors to linear, as 3D operators are found and 3D non-linear IO formats are not supported, yet.
gpu-serving-4.31-x | 2020-07-14 18:05:15.487927: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger Internal error: could not find any implementation for node conv3d_4/convolution, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
gpu-serving-4.31-x | 2020-07-14 18:05:15.490510: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger ../builder/tacticOptimizer.cpp (1523) - OutOfMemory Error in computeCosts: 0
gpu-serving-4.31-x | 2020-07-14 18:05:15.490639: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:748] Engine creation for TRTEngineOp_58 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine

tanmayv25 commented 4 years ago

Can you check your CUDA toolkit/driver installation? Make sure it satisfies the version requirement as specified here.

loveppdog commented 4 years ago

I use the following driver, and it seems to match the requirement:

NVIDIA-SMI 440.82    Driver Version: 440.82    CUDA Version: 10.2

tanmayv25 commented 4 years ago

I have verified the TF-TRT optimization following the steps described here.

Inferences/Second vs. Client p95 Batch Latency
Concurrency: 1, throughput: 96.4 infer/sec, latency 11403 usec
Concurrency: 2, throughput: 136.4 infer/sec, latency 16279 usec
Concurrency: 3, throughput: 136.8 infer/sec, latency 24083 usec
Concurrency: 4, throughput: 134.4 infer/sec, latency 32003 usec

Could not initialize cudnn, please check cudnn installation.

The error message emitted by the trt_logger means that TensorRT failed to access the cuDNN library. Can you try a clean install of the driver and toolkit? Make sure that all previous versions are uninstalled. You can also try a later release like 20.06 or 20.07.

@Zees1023 Your case is a little different. The error in your case means that TensorRT ran out of memory while trying to build an optimized TensorRT plan. The link here mentions that the default value of max_workspace_size_bytes is 1GB. You can try specifying a larger value for this option, as sketched below, and see if that works.
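For example, a sketch of the same accelerator block with a larger workspace, assuming the max_workspace_size_bytes parameter is forwarded to TF-TRT as documented for the TensorFlow backend:

optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        # 2GB instead of the default 1GB; raise further if the GPU has headroom
        parameters { key: "max_workspace_size_bytes" value: "2147483648" }
      }
    ]
  }
}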

tanmayv25 commented 4 years ago

Closing the bug as I was unable to reproduce it. Please try out the suggestions and re-open if the issue persists.