Closed: ZJU-lishuang closed this issue 1 year ago
Triton server log:
==================================
== Triton Inference Server Base ==
==================================
NVIDIA Release 22.03 (build 33743047)
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I1003 14:02:03.388966 1 onnxruntime.cc:2319] TRITONBACKEND_Initialize: onnxruntime
I1003 14:02:03.389124 1 onnxruntime.cc:2329] Triton TRITONBACKEND API version: 1.8
I1003 14:02:03.389140 1 onnxruntime.cc:2335] 'onnxruntime' TRITONBACKEND API version: 1.8
I1003 14:02:03.389151 1 onnxruntime.cc:2365] backend configuration:
{}
I1003 14:02:03.537700 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f04ee000000' with size 268435456
I1003 14:02:03.538462 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1003 14:02:03.541414 1 model_repository_manager.cc:997] loading: ERNIE:1
I1003 14:02:03.642048 1 model_repository_manager.cc:997] loading: ResNet50-v1.5:1
I1003 14:02:03.781986 1 paddle.cc:1204] TRITONBACKEND_Initialize: paddle
I1003 14:02:03.782021 1 paddle.cc:1212] Triton TRITONBACKEND API version: 1.8
I1003 14:02:03.782028 1 paddle.cc:1219] 'paddle' TRITONBACKEND API version: 1.8
I1003 14:02:03.782032 1 paddle.cc:1249] backend configuration:
{}
I1003 14:02:03.782059 1 paddle.cc:1266] TRITONBACKEND_ModelInitialize: ERNIE (version 1)
I1003 14:02:03.783862 1 paddle.cc:1266] TRITONBACKEND_ModelInitialize: ResNet50-v1.5 (version 1)
I1003 14:02:03.784426 1 paddle.cc:1309] TRITONBACKEND_ModelInstanceInitialize: ERNIE_0 (GPU device 0)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1003 14:02:03.819965 88 analysis_config.cc:1164] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
I1003 14:02:03.835196 88 analysis_predictor.cc:1220] ir_optim is turned off, no IR pass will be executed
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I1003 14:02:03.975621 88 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:02:03.978397 88 naive_executor.cc:110] --- skip [feed], feed -> token_type_ids
I1003 14:02:03.978420 88 naive_executor.cc:110] --- skip [feed], feed -> input_ids
I1003 14:02:03.980311 88 naive_executor.cc:110] --- skip [linear_113.tmp_1], fetch -> fetch
I1003 14:02:12.677105 88 analysis_predictor.cc:1080] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [identity_scale_op_clean_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_fill_constant_op_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [add_support_int8_pass]
I1003 14:02:12.796942 88 fuse_pass_base.cc:59] --- detected 220 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [delete_c_identity_op_pass]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
I1003 14:02:12.937906 88 fuse_pass_base.cc:59] --- detected 6 subgraphs
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [trt_skip_layernorm_fuse_pass]
I1003 14:02:12.947113 88 fuse_pass_base.cc:59] --- detected 13 subgraphs
--- Running IR pass [preln_skip_layernorm_fuse_pass]
--- Running IR pass [preln_residual_bias_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_reshape2_matmul_fuse_pass]
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
I1003 14:02:12.951237 88 fuse_pass_base.cc:59] --- detected 20 subgraphs
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I1003 14:02:12.956024 88 fuse_pass_base.cc:59] --- detected 20 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [remove_padding_recover_padding_pass]
--- Running IR pass [delete_remove_padding_recover_padding_pass]
--- Running IR pass [dense_fc_to_sparse_pass]
--- Running IR pass [dense_multihead_matmul_to_sparse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1003 14:02:12.962479 88 tensorrt_subgraph_pass.cc:238] --- detect a sub-graph with 51 nodes
I1003 14:02:12.976985 88 tensorrt_subgraph_pass.cc:541] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I1003 14:02:13.822504 88 engine.cc:268] Run Paddle-TRT Dynamic Shape mode.
I1003 14:03:10.097728 88 engine.cc:680] ====== engine info ======
I1003 14:03:10.104418 88 engine.cc:685] Layers:
Scale: before_reshape (Output: tmp_312)
PWN(elementwise (Output: tmp_532), elementwise (Output: tmp_634))
Scale: scale (Output: tmp_312)
skip_layernorm (Output: layer_norm_26.tmp_249)
shuffle_before_multihead_mamul(Output: reshape2_3.tmp_0104)
scale (Output: tmp_312) + unsqueeze2 (Output: unsqueeze2_0.tmp_014)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_3.tmp_0104)
multihead_mamul_fc(Output: reshape2_3.tmp_0104)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_3.tmp_0104)
multihead_matmul (Output: reshape2_3.tmp_0104)
fc_op_reshape_before_fc: Shuffle (Output: linear_79.tmp_1111)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_79.tmp_1111)
fc_op_float: FullyConnected (Output: linear_79.tmp_1111)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_79.tmp_1111)
shuffle_after_fc (Output: linear_79.tmp_1111)
skip_layernorm (Output: layer_norm_27.tmp_2122)
fc_op_reshape_before_fc: Shuffle (Output: linear_80.tmp_1128)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_80.tmp_1128)
fc_op_float: FullyConnected (Output: linear_80.tmp_1128)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_80.tmp_1128)
shuffle_after_fc (Output: linear_80.tmp_1128)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 71) [Constant], (Unnamed Layer* 72) [ElementWise]), (Unnamed Layer* 73) [Unary]), PWN((Unnamed Layer* 69) [Constant], (Unnamed Layer* 74) [ElementWise])), PWN((Unnamed Layer* 70) [Constant], (Unnamed Layer* 75) [ElementWise])), gelu (Output: gelu_1.tmp_0130))
fc_op_reshape_before_fc: Shuffle (Output: linear_81.tmp_1136)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_81.tmp_1136)
fc_op_float: FullyConnected (Output: linear_81.tmp_1136)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_81.tmp_1136)
shuffle_after_fc (Output: linear_81.tmp_1136)
skip_layernorm (Output: layer_norm_28.tmp_2147)
shuffle_before_multihead_mamul(Output: reshape2_7.tmp_0199)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_7.tmp_0199)
multihead_mamul_fc(Output: reshape2_7.tmp_0199)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_7.tmp_0199)
multihead_matmul (Output: reshape2_7.tmp_0199)
fc_op_reshape_before_fc: Shuffle (Output: linear_85.tmp_1206)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_85.tmp_1206)
fc_op_float: FullyConnected (Output: linear_85.tmp_1206)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_85.tmp_1206)
shuffle_after_fc (Output: linear_85.tmp_1206)
skip_layernorm (Output: layer_norm_29.tmp_2217)
fc_op_reshape_before_fc: Shuffle (Output: linear_86.tmp_1223)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_86.tmp_1223)
fc_op_float: FullyConnected (Output: linear_86.tmp_1223)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_86.tmp_1223)
shuffle_after_fc (Output: linear_86.tmp_1223)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 157) [Constant], (Unnamed Layer* 158) [ElementWise]), (Unnamed Layer* 159) [Unary]), PWN((Unnamed Layer* 155) [Constant], (Unnamed Layer* 160) [ElementWise])), PWN((Unnamed Layer* 156) [Constant], (Unnamed Layer* 161) [ElementWise])), gelu (Output: gelu_2.tmp_0225))
fc_op_reshape_before_fc: Shuffle (Output: linear_87.tmp_1231)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_87.tmp_1231)
fc_op_float: FullyConnected (Output: linear_87.tmp_1231)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_87.tmp_1231)
shuffle_after_fc (Output: linear_87.tmp_1231)
skip_layernorm (Output: layer_norm_30.tmp_2242)
shuffle_before_multihead_mamul(Output: reshape2_11.tmp_0294)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_11.tmp_0294)
multihead_mamul_fc(Output: reshape2_11.tmp_0294)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_11.tmp_0294)
multihead_matmul (Output: reshape2_11.tmp_0294)
fc_op_reshape_before_fc: Shuffle (Output: linear_91.tmp_1301)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_91.tmp_1301)
fc_op_float: FullyConnected (Output: linear_91.tmp_1301)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_91.tmp_1301)
shuffle_after_fc (Output: linear_91.tmp_1301)
skip_layernorm (Output: layer_norm_31.tmp_2312)
fc_op_reshape_before_fc: Shuffle (Output: linear_92.tmp_1318)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_92.tmp_1318)
fc_op_float: FullyConnected (Output: linear_92.tmp_1318)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_92.tmp_1318)
shuffle_after_fc (Output: linear_92.tmp_1318)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 243) [Constant], (Unnamed Layer* 244) [ElementWise]), (Unnamed Layer* 245) [Unary]), PWN((Unnamed Layer* 241) [Constant], (Unnamed Layer* 246) [ElementWise])), PWN((Unnamed Layer* 242) [Constant], (Unnamed Layer* 247) [ElementWise])), gelu (Output: gelu_3.tmp_0320))
fc_op_reshape_before_fc: Shuffle (Output: linear_93.tmp_1326)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_93.tmp_1326)
fc_op_float: FullyConnected (Output: linear_93.tmp_1326)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_93.tmp_1326)
shuffle_after_fc (Output: linear_93.tmp_1326)
skip_layernorm (Output: layer_norm_32.tmp_2337)
shuffle_before_multihead_mamul(Output: reshape2_15.tmp_0389)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_15.tmp_0389)
multihead_mamul_fc(Output: reshape2_15.tmp_0389)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_15.tmp_0389)
multihead_matmul (Output: reshape2_15.tmp_0389)
fc_op_reshape_before_fc: Shuffle (Output: linear_97.tmp_1396)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_97.tmp_1396)
fc_op_float: FullyConnected (Output: linear_97.tmp_1396)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_97.tmp_1396)
shuffle_after_fc (Output: linear_97.tmp_1396)
skip_layernorm (Output: layer_norm_33.tmp_2407)
fc_op_reshape_before_fc: Shuffle (Output: linear_98.tmp_1413)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_98.tmp_1413)
fc_op_float: FullyConnected (Output: linear_98.tmp_1413)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_98.tmp_1413)
shuffle_after_fc (Output: linear_98.tmp_1413)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 329) [Constant], (Unnamed Layer* 330) [ElementWise]), (Unnamed Layer* 331) [Unary]), PWN((Unnamed Layer* 327) [Constant], (Unnamed Layer* 332) [ElementWise])), PWN((Unnamed Layer* 328) [Constant], (Unnamed Layer* 333) [ElementWise])), gelu (Output: gelu_4.tmp_0415))
fc_op_reshape_before_fc: Shuffle (Output: linear_99.tmp_1421)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_99.tmp_1421)
fc_op_float: FullyConnected (Output: linear_99.tmp_1421)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_99.tmp_1421)
shuffle_after_fc (Output: linear_99.tmp_1421)
skip_layernorm (Output: layer_norm_34.tmp_2432)
shuffle_before_multihead_mamul(Output: reshape2_19.tmp_0484)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_19.tmp_0484)
multihead_mamul_fc(Output: reshape2_19.tmp_0484)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_19.tmp_0484)
multihead_matmul (Output: reshape2_19.tmp_0484)
fc_op_reshape_before_fc: Shuffle (Output: linear_103.tmp_1491)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_103.tmp_1491)
fc_op_float: FullyConnected (Output: linear_103.tmp_1491)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_103.tmp_1491)
shuffle_after_fc (Output: linear_103.tmp_1491)
skip_layernorm (Output: layer_norm_35.tmp_2502)
fc_op_reshape_before_fc: Shuffle (Output: linear_104.tmp_1508)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_104.tmp_1508)
fc_op_float: FullyConnected (Output: linear_104.tmp_1508)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_104.tmp_1508)
shuffle_after_fc (Output: linear_104.tmp_1508)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 415) [Constant], (Unnamed Layer* 416) [ElementWise]), (Unnamed Layer* 417) [Unary]), PWN((Unnamed Layer* 413) [Constant], (Unnamed Layer* 418) [ElementWise])), PWN((Unnamed Layer* 414) [Constant], (Unnamed Layer* 419) [ElementWise])), gelu (Output: gelu_5.tmp_0510))
fc_op_reshape_before_fc: Shuffle (Output: linear_105.tmp_1516)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_105.tmp_1516)
fc_op_float: FullyConnected (Output: linear_105.tmp_1516)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_105.tmp_1516)
shuffle_after_fc (Output: linear_105.tmp_1516)
skip_layernorm (Output: layer_norm_36.tmp_2527)
shuffle_before_multihead_mamul(Output: reshape2_23.tmp_0579)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_23.tmp_0579)
multihead_mamul_fc(Output: reshape2_23.tmp_0579)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_23.tmp_0579)
multihead_matmul (Output: reshape2_23.tmp_0579)
fc_op_reshape_before_fc: Shuffle (Output: linear_109.tmp_1586)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_109.tmp_1586)
fc_op_float: FullyConnected (Output: linear_109.tmp_1586)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_109.tmp_1586)
shuffle_after_fc (Output: linear_109.tmp_1586)
skip_layernorm (Output: layer_norm_37.tmp_2597)
fc_op_reshape_before_fc: Shuffle (Output: linear_110.tmp_1603)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_110.tmp_1603)
fc_op_float: FullyConnected (Output: linear_110.tmp_1603)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_110.tmp_1603)
shuffle_after_fc (Output: linear_110.tmp_1603)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 501) [Constant], (Unnamed Layer* 502) [ElementWise]), (Unnamed Layer* 503) [Unary]), PWN((Unnamed Layer* 499) [Constant], (Unnamed Layer* 504) [ElementWise])), PWN((Unnamed Layer* 500) [Constant], (Unnamed Layer* 505) [ElementWise])), gelu (Output: gelu_6.tmp_0605))
fc_op_reshape_before_fc: Shuffle (Output: linear_111.tmp_1611)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_111.tmp_1611)
fc_op_float: FullyConnected (Output: linear_111.tmp_1611)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_111.tmp_1611)
shuffle_after_fc (Output: linear_111.tmp_1611)
skip_layernorm (Output: layer_norm_38.tmp_2622)
slice (Output: layer_norm_38.tmp_2_slice_0624) + fc_op_reshape_before_fc: Shuffle (Output: linear_112.tmp_1630)
fc_op_float: FullyConnected (Output: linear_112.tmp_1630)
PWN(tanh (Output: tanh_3.tmp_0632))
fc_op_float: FullyConnected (Output: linear_113.tmp_1641)
shuffle_after_fc (Output: linear_113.tmp_1641)
Bindings:
embedding_10.tmp_0
embedding_11.tmp_0
embedding_8.tmp_0
embedding_9.tmp_0
tmp_2
linear_113.tmp_1641
I1003 14:03:10.104538 88 engine.cc:687] ====== engine info end ======
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1003 14:03:10.112846 88 ir_params_sync_among_devices_pass.cc:88] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1003 14:03:10.162847 88 memory_optimize_pass.cc:218] Cluster name : full_like_0.tmp_0 size: 8
I1003 14:03:10.162859 88 memory_optimize_pass.cc:218] Cluster name : tmp_4 size: 8
I1003 14:03:10.162863 88 memory_optimize_pass.cc:218] Cluster name : cumsum_0.tmp_0 size: 8
I1003 14:03:10.162864 88 memory_optimize_pass.cc:218] Cluster name : token_type_ids size: 8
--- Running analysis [ir_graph_to_program_pass]
I1003 14:03:10.183755 88 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:03:10.188370 88 naive_executor.cc:110] --- skip [feed], feed -> token_type_ids
I1003 14:03:10.188387 88 naive_executor.cc:110] --- skip [feed], feed -> input_ids
I1003 14:03:10.188688 88 naive_executor.cc:110] --- skip [linear_113.tmp_1], fetch -> fetch
W1003 14:03:10.188714 88 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.6
W1003 14:03:10.188877 88 gpu_resources.cc:91] device: 0, cuDNN Version: 8.3.
I1003 14:03:10.188996 1 paddle.cc:1309] TRITONBACKEND_ModelInstanceInitialize: ResNet50-v1.5_0 (GPU device 0)
I1003 14:03:10.196445 1 model_repository_manager.cc:1152] successfully loaded 'ERNIE' version 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [trt_skip_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I1003 14:03:10.288832 89 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
I1003 14:03:10.289359 89 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I1003 14:03:10.335124 89 fuse_pass_base.cc:59] --- detected 16 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1003 14:03:10.338466 89 ir_params_sync_among_devices_pass.cc:88] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1003 14:03:10.395090 89 memory_optimize_pass.cc:218] Cluster name : fill_constant_1.tmp_0 size: 8
I1003 14:03:10.395102 89 memory_optimize_pass.cc:218] Cluster name : x0 size: 602112
I1003 14:03:10.395103 89 memory_optimize_pass.cc:218] Cluster name : elementwise_add_4 size: 1605632
I1003 14:03:10.395107 89 memory_optimize_pass.cc:218] Cluster name : conv2d_63.tmp_1 size: 3211264
I1003 14:03:10.395110 89 memory_optimize_pass.cc:218] Cluster name : elementwise_add_2 size: 3211264
I1003 14:03:10.395112 89 memory_optimize_pass.cc:218] Cluster name : conv2d_60.tmp_1 size: 3211264
--- Running analysis [ir_graph_to_program_pass]
I1003 14:03:10.422075 89 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:03:10.422649 89 naive_executor.cc:110] --- skip [feed], feed -> x0
I1003 14:03:10.424878 89 naive_executor.cc:110] --- skip [save_infer_model/scale_0.tmp_1], fetch -> fetch
I1003 14:03:10.425115 1 model_repository_manager.cc:1152] successfully loaded 'ResNet50-v1.5' version 1
I1003 14:03:10.425218 1 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1003 14:03:10.425272 1 server.cc:551]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| paddle | /opt/tritonserver/backends/paddle/libtriton_paddle.so | {} |
+-------------+-----------------------------------------------------------------+--------+
I1003 14:03:10.425306 1 server.cc:594]
+---------------+---------+--------+
| Model | Version | Status |
+---------------+---------+--------+
| ERNIE | 1 | READY |
| ResNet50-v1.5 | 1 | READY |
+---------------+---------+--------+
I1003 14:03:10.469245 1 metrics.cc:651] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
I1003 14:03:10.469525 1 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /workspace/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1003 14:03:10.474583 1 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001
I1003 14:03:10.475297 1 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000
I1003 14:03:10.516651 1 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W1003 14:03:11.474144 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1003 14:03:12.475025 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1003 14:03:13.477496 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
I found that the problem is in the TensorRT optimization: the engine fails the check profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles().
How can I solve it?
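For context, that assertion is TensorRT's own runtime invariant: the optimization-profile index selected at inference time must lie within the number of profiles the engine was built with, so it fires whenever the engine was serialized with fewer profiles than the runtime asks for (zero, in the common failure case). A minimal, purely illustrative Python sketch of the invariant (this is not Paddle or TensorRT API, just the logic of the check):

```python
def select_optimization_profile(num_profiles: int, profile_index: int) -> int:
    """Mirror TensorRT's runtime check:
    profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles().

    An engine built without any dynamic-shape optimization profiles has
    num_profiles == 0, so *any* requested index fails the check.
    """
    if not (0 <= profile_index < num_profiles):
        raise IndexError(
            f"profile index {profile_index} is out of range for an engine "
            f"with {num_profiles} optimization profile(s)"
        )
    return profile_index
```

In Paddle's dynamic-shape workflow the profiles come from the collected min/max/opt shape information (the CollectShapeInfo pass visible earlier in the log), so a mismatch between the shape-collection run and the engine build is one plausible way to end up on the wrong side of this check.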
Are you using the paddlepaddle/triton_paddle:21.10 image, or another image? @ZJU-lishuang
I use the paddlepaddle/triton_paddle:21.10 image and it works correctly.
Another image: I built from source, based on Triton 22.03.
How did you obtain the Paddle Inference library that triton_paddle depends on: compiled from source, or downloaded?
CUDA and TensorRT versions that are too new may be risky for running Paddle Inference; I suggest you try our verified image first.
I think the CUDA and TensorRT versions in 22.03 should be fine.
You can compile Paddle with release/2.4: https://github.com/PaddlePaddle/Paddle/tree/release/2.4.
There may be a problem with the code branch you are using.
I will try https://github.com/triton-inference-server/server/tree/r22.03 and https://github.com/PaddlePaddle/Paddle/tree/release/2.4 again and report back. I already tried this combination several days ago.
I used 21.10 + Paddle release/2.4 when compiling the paddlepaddle/triton_paddle:21.10 image, so I suspect the TRT versions may not match.
ERROR:
Scanning dependencies of target paddle_inference_c
Scanning dependencies of target paddle_inference_c_shared
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_utils.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
[100%] Linking CXX shared library libpaddle_inference.so
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_utils.cc.o
[100%] Linking CXX static library libpaddle_inference_c.a
[100%] Built target paddle_inference_c
[100%] Linking CXX shared library libpaddle_inference_c.so
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.42]':
io.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
io.cc:(.text+0x13ba): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.592]':
io.cc:(.text+0x146a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::inference::ReadBinaryFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
io.cc:(.text+0x178e): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x179f): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18aa): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<char [14]>(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18b7): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<char [14]>(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18e8): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/build.make:2244: paddle/fluid/inference/libpaddle_inference.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:163108: paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.67]':
pd_config.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
pd_config.cc:(.text+0x22a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.300]':
pd_config.cc:(.text+0x2da): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `PD_ConfigDestroy':
pd_config.cc:(.text+0x808e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libpthread.so
pd_config.cc:(.text+0x80b7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_ptr<decltype(nullptr), (__gnu_cxx::_Lock_policy)2>::_M_dispose()' defined in .text._ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv[_ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
pd_config.cc:(.text+0x80e9): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()' defined in .text._ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv[_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::what() const':
pd_config.cc:(.text._ZNK3phi7enforce13EnforceNotMet4whatEv[_ZNK3phi7enforce13EnforceNotMet4whatEv]+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `fLI::FLAGS_call_stack_level' defined in .data section in ../libpaddle_inference.a(flags.cc.o)
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::~EnforceNotMet()':
pd_config.cc:(.text._ZN3phi7enforce13EnforceNotMetD2Ev[_ZN3phi7enforce13EnforceNotMetD5Ev]+0x13): additional relocation overflows omitted from the output
libpaddle_inference_c.so: PC-relative offset overflow in PLT entry for `_ZN3phi5funcs21LaunchBroadcastKernelINS_5dtype7float16ES3_NS_3kps13DivideFunctorIS3_fEELi1ELi1ELi4EEEvRKNS_10GPUContextERKSt6vectorIPKNS_11DenseTensorESaISD_EEPSA_IPSB_SaISI_EET1_RKNS_5ArrayINS4_7details15BroadcastConfigEXT2_EEE'
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/build.make:1204: paddle/fluid/inference/capi_exp/libpaddle_inference_c.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:177249: paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
The command '/bin/sh -c python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && cd build-env && cmake .. -DWITH_PYTHON=OFF -DWITH_GPU=ON -DWITH_TESTING=OFF -DWITH_INFERENCE_API_TEST=OFF -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Auto -DON_INFER=ON -DWITH_MKL=ON -DWITH_TENSORRT=ON -DWITH_ONNXRUNTIME=ON && make -j8' returned a non-zero code: 2
paddlepaddle_backend/paddle-lib/Dockerfile
FROM nvcr.io/nvidia/tritonserver:22.03-py3
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-key del 7fa2af80 \
&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
&& dpkg -i cuda-keyring_1.0-1_all.deb
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
cmake \
patchelf \
python3-dev \
unzip \
gcc-8 \
g++-8 \
libgl1 \
libssl-dev
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100
RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4
RUN python3 -m pip install pyyaml && mkdir build-env && \
cd build-env && \
cmake .. -DWITH_PYTHON=OFF \
-DWITH_GPU=ON \
-DWITH_TESTING=OFF \
-DWITH_INFERENCE_API_TEST=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCUDA_ARCH_NAME=Auto \
-DON_INFER=ON \
-DWITH_MKL=ON \
-DWITH_TENSORRT=ON \
-DWITH_ONNXRUNTIME=ON && \
make -j`nproc`
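For the "relocation truncated to fit" failure above, the linker's own message suggests relinking with `--no-relax`. A hedged, untested sketch of passing that hint (plus `-mcmodel=medium`, a common GCC remedy for GOT overflow in very large binaries) through the same CMake invocation might look like this; neither flag is confirmed to fix this particular Paddle build:

```shell
# Hypothetical workaround for the GOTPCREL relocation overflow above.
#   -Wl,--no-relax  : taken directly from the linker's hint
#   -mcmodel=medium : relaxes the code model so large binaries can address
#                     data beyond the 2 GB small-model limit
cmake .. -DWITH_PYTHON=OFF -DWITH_GPU=ON -DWITH_TESTING=OFF \
  -DWITH_INFERENCE_API_TEST=OFF -DCMAKE_BUILD_TYPE=Release \
  -DCUDA_ARCH_NAME=Auto -DON_INFER=ON -DWITH_MKL=ON \
  -DWITH_TENSORRT=ON -DWITH_ONNXRUNTIME=ON \
  -DCMAKE_CXX_FLAGS="-mcmodel=medium" \
  -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--no-relax" && \
make -j8
```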
For PaddlePaddle compilation errors, please raise an issue in the official PaddlePaddle repository: https://github.com/PaddlePaddle/Paddle/issues
@ZJU-lishuang Since release/2.4 hasn't been officially released yet, would you try v2.4.0-rc0?
I have already tried v2.4.0-rc0 and hit the same problem.
Another Dockerfile attempt recorded this ERROR:
[ 86%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_helpers.cc.o
[ 6%] Building CUDA object paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/broadcast.cu.o
[ 86%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_map_field.cc.o
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9146): error: identifier "__builtin_ia32_rndscaless_round" is undefined
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9155): error: identifier "__builtin_ia32_rndscalesd_round" is undefined
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14797): error: identifier "__builtin_ia32_rndscaless_round" is undefined
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14806): error: identifier "__builtin_ia32_rndscalesd_round" is undefined
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1365): error: identifier "__builtin_ia32_fpclassss" is undefined
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1372): error: identifier "__builtin_ia32_fpclasssd" is undefined
[ 87%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_message.cc.o
[ 87%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_message_field.cc.o
[ 88%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_primitive_field.cc.o
[ 88%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/js/js_generator.cc.o
...
...
...
[ 94%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/objectivec/objectivec_oneof.cc.o
[ 94%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.cc.o
6 errors detected in the compilation of "/opt/tritonserver/Paddle/paddle/phi/kernels/funcs/eigen/broadcast.cu".
make[2]: *** [paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/build.make:206: paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/broadcast.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 95%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/php/php_generator.cc.o
[ 95%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/plugin.cc.o
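The `__builtin_ia32_*` "identifier is undefined" errors above typically mean nvcc is parsing GCC 9's AVX-512 intrinsic headers, which it does not understand. A hedged sketch of forcing gcc-8 as both the host and CUDA host compiler (using CMake's standard `CC`/`CXX`/`CUDAHOSTCXX` environment variables) is below; it has not been verified against this exact image:

```shell
# Hedged sketch: point both the host build and nvcc at the gcc-8 toolchain
# already installed in the image, so nvcc never sees GCC 9's avx512*intrin.h.
export CC=$(which gcc-8)
export CXX=$(which g++-8)
export CUDAHOSTCXX=$(which g++-8)   # CUDA host compiler used by nvcc
cmake .. -DWITH_PYTHON=OFF -DWITH_GPU=ON -DWITH_TESTING=OFF \
  -DWITH_INFERENCE_API_TEST=OFF -DCMAKE_BUILD_TYPE=Release \
  -DCUDA_ARCH_NAME=Auto -DON_INFER=ON -DWITH_MKL=ON \
  -DWITH_TENSORRT=ON -DWITH_ONNXRUNTIME=ON && \
make -j8
```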
paddlepaddle_backend/paddle-lib/Dockerfile
FROM nvcr.io/nvidia/tritonserver:22.03-py3
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-key del 7fa2af80 \
&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
&& dpkg -i cuda-keyring_1.0-1_all.deb
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
cmake \
patchelf \
python3-dev \
unzip \
gcc-8 \
g++-8 \
libgl1 \
libssl-dev
RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4
RUN python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && \
cd build-env && \
cmake .. -DWITH_PYTHON=OFF \
-DWITH_GPU=ON \
-DWITH_TESTING=OFF \
-DWITH_INFERENCE_API_TEST=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCUDA_ARCH_NAME=Auto \
-DON_INFER=ON \
-DWITH_MKL=ON \
-DWITH_TENSORRT=ON \
-DWITH_ONNXRUNTIME=ON \
-DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && \
make -j8
ERROR:
[100%] Built target paddle_inference
Scanning dependencies of target paddle_inference_c
Scanning dependencies of target paddle_inference_c_shared
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_utils.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_utils.cc.o
[100%] Linking CXX static library libpaddle_inference_c.a
[100%] Built target paddle_inference_c
[100%] Linking CXX shared library libpaddle_inference_c.so
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.42]':
io.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
io.cc:(.text+0x13ba): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.592]':
io.cc:(.text+0x146a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::inference::ReadBinaryFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
io.cc:(.text+0x178e): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x179f): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18aa): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<char [14]>(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18b7): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<char [14]>(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18e8): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/build.make:2244: paddle/fluid/inference/libpaddle_inference.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:163108: paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.67]':
pd_config.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
pd_config.cc:(.text+0x22a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.300]':
pd_config.cc:(.text+0x2da): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `PD_ConfigDestroy':
pd_config.cc:(.text+0x808e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libpthread.so
pd_config.cc:(.text+0x80b7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_ptr<decltype(nullptr), (__gnu_cxx::_Lock_policy)2>::_M_dispose()' defined in .text._ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv[_ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
pd_config.cc:(.text+0x80e9): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()' defined in .text._ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv[_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::what() const':
pd_config.cc:(.text._ZNK3phi7enforce13EnforceNotMet4whatEv[_ZNK3phi7enforce13EnforceNotMet4whatEv]+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `fLI::FLAGS_call_stack_level' defined in .data section in ../libpaddle_inference.a(flags.cc.o)
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::~EnforceNotMet()':
pd_config.cc:(.text._ZN3phi7enforce13EnforceNotMetD2Ev[_ZN3phi7enforce13EnforceNotMetD5Ev]+0x13): additional relocation overflows omitted from the output
libpaddle_inference_c.so: PC-relative offset overflow in PLT entry for `_ZN3phi5funcs21LaunchBroadcastKernelINS_5dtype7float16ES3_NS_3kps13DivideFunctorIS3_fEELi1ELi1ELi4EEEvRKNS_10GPUContextERKSt6vectorIPKNS_11DenseTensorESaISD_EEPSA_IPSB_SaISI_EET1_RKNS_5ArrayINS4_7details15BroadcastConfigEXT2_EEE'
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/build.make:1204: paddle/fluid/inference/capi_exp/libpaddle_inference_c.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:177249: paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
The command '/bin/sh -c python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && cd build-env && cmake .. -DWITH_PYTHON=OFF -DWITH_GPU=ON -DWITH_TESTING=OFF -DWITH_INFERENCE_API_TEST=OFF -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Auto -DON_INFER=ON -DWITH_MKL=ON -DWITH_TENSORRT=ON -DWITH_ONNXRUNTIME=ON -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && make -j8' returned a non-zero code: 2
paddlepaddle_backend/paddle-lib/Dockerfile
FROM nvcr.io/nvidia/tritonserver:22.03-py3
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-key del 7fa2af80 \
&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
&& dpkg -i cuda-keyring_1.0-1_all.deb
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
cmake \
patchelf \
python3-dev \
unzip \
gcc-8 \
g++-8 \
libgl1 \
libssl-dev
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100
RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4
RUN python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && \
cd build-env && \
cmake .. -DWITH_PYTHON=OFF \
-DWITH_GPU=ON \
-DWITH_TESTING=OFF \
-DWITH_INFERENCE_API_TEST=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCUDA_ARCH_NAME=Auto \
-DON_INFER=ON \
-DWITH_MKL=ON \
-DWITH_TENSORRT=ON \
-DWITH_ONNXRUNTIME=ON \
-DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && \
make -j8
Please file issues about PaddlePaddle compilation at https://github.com/PaddlePaddle/Paddle/issues
When I run bash perf_ernie.sh, the server outputs the following.
But perf_resnet50_v1.5.sh runs SUCCESSFULLY.
Is it possible to fix this issue?