triton-inference-server / paddlepaddle_backend

BSD 3-Clause "New" or "Revised" License

ERROR in bash perf_ernie.sh, SUCCESS in bash perf_resnet50_v1.5.sh #12

Closed ZJU-lishuang closed 1 year ago

ZJU-lishuang commented 1 year ago

When I run `bash perf_ernie.sh`, the server outputs the following:

E1003 14:03:23.654152    91 helper.h:114] 3: [executionContext.cpp::setOptimizationProfileInternal::755] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setOptimizationProfileInternal::755, condition: profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654186    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654201    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654212    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654222    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654234    91 helper.h:114] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654249    91 helper.h:114] 3: [executionContext.cpp::getBindingDimensions::978] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::getBindingDimensions::978, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
E1003 14:03:23.654286    91 helper.h:114] 3: [executionContext.cpp::enqueueInternal::318] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::318, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
Signal (11) received.
 0# 0x0000562C43FD3549 in /opt/tritonserver/bin/tritonserver
 1# 0x00007F052C55D0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007F05204E3928 in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 3# 0x00007F0520495579 in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 4# 0x00007F0520496775 in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 5# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/paddle/libtriton_paddle.so
 6# 0x00007F052D10A07A in /opt/tritonserver/bin/../lib/libtritonserver.so
 7# 0x00007F052D10A797 in /opt/tritonserver/bin/../lib/libtritonserver.so
 8# 0x00007F052CF9D221 in /opt/tritonserver/bin/../lib/libtritonserver.so
 9# 0x00007F052D104607 in /opt/tritonserver/bin/../lib/libtritonserver.so
10# 0x00007F052C94EDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11# 0x00007F052CDCB609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
12# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

But when I run perf_resnet50_v1.5.sh, it succeeds.

Is it possible to fix this issue?
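For reference, the cascade of errors above can be reproduced with a minimal sketch of the precondition TensorRT enforces (this is a hypothetical mock for illustration, not the real TensorRT API): once selecting an optimization profile is rejected, every later `setBindingDimensions`/`enqueue` call fails for the same reason, because the context's active profile is still -1.

```python
# Hypothetical mock of TensorRT's execution-context precondition checks.
# Class and method names are illustrative, not the real TensorRT API.
class MockExecutionContext:
    def __init__(self, num_optimization_profiles: int):
        self.num_profiles = num_optimization_profiles
        self.active_profile = -1  # no profile selected yet

    def set_optimization_profile(self, profile_index: int) -> bool:
        # Mirrors: profileIndex >= 0 && profileIndex < getNbOptimizationProfiles()
        if not (0 <= profile_index < self.num_profiles):
            return False  # "Error Code 3: API Usage Error" is logged here
        self.active_profile = profile_index
        return True

    def set_binding_dimensions(self, binding: int, dims) -> bool:
        # Mirrors: mOptimizationProfile >= 0 && mOptimizationProfile < ...
        if not (0 <= self.active_profile < self.num_profiles):
            return False  # every subsequent call fails once the first one did
        return True


# An engine with zero dynamic-shape profiles reproduces the cascade:
ctx = MockExecutionContext(num_optimization_profiles=0)
assert ctx.set_optimization_profile(0) is False
assert ctx.set_binding_dimensions(0, (1, 128)) is False
```

This suggests the engine was deserialized or built without the expected dynamic-shape optimization profiles, which matches the repeated `mOptimizationProfile >= 0` failures before the segfault.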

ZJU-lishuang commented 1 year ago

Triton server log:

==================================
== Triton Inference Server Base ==
==================================

NVIDIA Release 22.03 (build 33743047)

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I1003 14:02:03.388966 1 onnxruntime.cc:2319] TRITONBACKEND_Initialize: onnxruntime
I1003 14:02:03.389124 1 onnxruntime.cc:2329] Triton TRITONBACKEND API version: 1.8
I1003 14:02:03.389140 1 onnxruntime.cc:2335] 'onnxruntime' TRITONBACKEND API version: 1.8
I1003 14:02:03.389151 1 onnxruntime.cc:2365] backend configuration:
{}
I1003 14:02:03.537700 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f04ee000000' with size 268435456
I1003 14:02:03.538462 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1003 14:02:03.541414 1 model_repository_manager.cc:997] loading: ERNIE:1
I1003 14:02:03.642048 1 model_repository_manager.cc:997] loading: ResNet50-v1.5:1
I1003 14:02:03.781986 1 paddle.cc:1204] TRITONBACKEND_Initialize: paddle
I1003 14:02:03.782021 1 paddle.cc:1212] Triton TRITONBACKEND API version: 1.8
I1003 14:02:03.782028 1 paddle.cc:1219] 'paddle' TRITONBACKEND API version: 1.8
I1003 14:02:03.782032 1 paddle.cc:1249] backend configuration:
{}
I1003 14:02:03.782059 1 paddle.cc:1266] TRITONBACKEND_ModelInitialize: ERNIE (version 1)
I1003 14:02:03.783862 1 paddle.cc:1266] TRITONBACKEND_ModelInitialize: ResNet50-v1.5 (version 1)
I1003 14:02:03.784426 1 paddle.cc:1309] TRITONBACKEND_ModelInstanceInitialize: ERNIE_0 (GPU device 0)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1003 14:02:03.819965    88 analysis_config.cc:1164] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
I1003 14:02:03.835196    88 analysis_predictor.cc:1220] ir_optim is turned off, no IR pass will be executed
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I1003 14:02:03.975621    88 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:02:03.978397    88 naive_executor.cc:110] ---  skip [feed], feed -> token_type_ids
I1003 14:02:03.978420    88 naive_executor.cc:110] ---  skip [feed], feed -> input_ids
I1003 14:02:03.980311    88 naive_executor.cc:110] ---  skip [linear_113.tmp_1], fetch -> fetch
I1003 14:02:12.677105    88 analysis_predictor.cc:1080] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [identity_scale_op_clean_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_fill_constant_op_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [add_support_int8_pass]
I1003 14:02:12.796942    88 fuse_pass_base.cc:59] ---  detected 220 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [delete_c_identity_op_pass]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
I1003 14:02:12.937906    88 fuse_pass_base.cc:59] ---  detected 6 subgraphs
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [trt_skip_layernorm_fuse_pass]
I1003 14:02:12.947113    88 fuse_pass_base.cc:59] ---  detected 13 subgraphs
--- Running IR pass [preln_skip_layernorm_fuse_pass]
--- Running IR pass [preln_residual_bias_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_reshape2_matmul_fuse_pass]
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
I1003 14:02:12.951237    88 fuse_pass_base.cc:59] ---  detected 20 subgraphs
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I1003 14:02:12.956024    88 fuse_pass_base.cc:59] ---  detected 20 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [remove_padding_recover_padding_pass]
--- Running IR pass [delete_remove_padding_recover_padding_pass]
--- Running IR pass [dense_fc_to_sparse_pass]
--- Running IR pass [dense_multihead_matmul_to_sparse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1003 14:02:12.962479    88 tensorrt_subgraph_pass.cc:238] ---  detect a sub-graph with 51 nodes
I1003 14:02:12.976985    88 tensorrt_subgraph_pass.cc:541] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I1003 14:02:13.822504    88 engine.cc:268] Run Paddle-TRT Dynamic Shape mode.
I1003 14:03:10.097728    88 engine.cc:680] ====== engine info ======
I1003 14:03:10.104418    88 engine.cc:685] Layers:
Scale: before_reshape (Output: tmp_312)
PWN(elementwise (Output: tmp_532), elementwise (Output: tmp_634))
Scale: scale (Output: tmp_312)
skip_layernorm (Output: layer_norm_26.tmp_249)
shuffle_before_multihead_mamul(Output: reshape2_3.tmp_0104)
scale (Output: tmp_312) + unsqueeze2 (Output: unsqueeze2_0.tmp_014)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_3.tmp_0104)
multihead_mamul_fc(Output: reshape2_3.tmp_0104)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_3.tmp_0104)
multihead_matmul (Output: reshape2_3.tmp_0104)
fc_op_reshape_before_fc: Shuffle (Output: linear_79.tmp_1111)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_79.tmp_1111)
fc_op_float: FullyConnected (Output: linear_79.tmp_1111)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_79.tmp_1111)
shuffle_after_fc (Output: linear_79.tmp_1111)
skip_layernorm (Output: layer_norm_27.tmp_2122)
fc_op_reshape_before_fc: Shuffle (Output: linear_80.tmp_1128)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_80.tmp_1128)
fc_op_float: FullyConnected (Output: linear_80.tmp_1128)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_80.tmp_1128)
shuffle_after_fc (Output: linear_80.tmp_1128)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 71) [Constant], (Unnamed Layer* 72) [ElementWise]), (Unnamed Layer* 73) [Unary]), PWN((Unnamed Layer* 69) [Constant], (Unnamed Layer* 74) [ElementWise])), PWN((Unnamed Layer* 70) [Constant], (Unnamed Layer* 75) [ElementWise])), gelu (Output: gelu_1.tmp_0130))
fc_op_reshape_before_fc: Shuffle (Output: linear_81.tmp_1136)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_81.tmp_1136)
fc_op_float: FullyConnected (Output: linear_81.tmp_1136)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_81.tmp_1136)
shuffle_after_fc (Output: linear_81.tmp_1136)
skip_layernorm (Output: layer_norm_28.tmp_2147)
shuffle_before_multihead_mamul(Output: reshape2_7.tmp_0199)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_7.tmp_0199)
multihead_mamul_fc(Output: reshape2_7.tmp_0199)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_7.tmp_0199)
multihead_matmul (Output: reshape2_7.tmp_0199)
fc_op_reshape_before_fc: Shuffle (Output: linear_85.tmp_1206)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_85.tmp_1206)
fc_op_float: FullyConnected (Output: linear_85.tmp_1206)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_85.tmp_1206)
shuffle_after_fc (Output: linear_85.tmp_1206)
skip_layernorm (Output: layer_norm_29.tmp_2217)
fc_op_reshape_before_fc: Shuffle (Output: linear_86.tmp_1223)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_86.tmp_1223)
fc_op_float: FullyConnected (Output: linear_86.tmp_1223)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_86.tmp_1223)
shuffle_after_fc (Output: linear_86.tmp_1223)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 157) [Constant], (Unnamed Layer* 158) [ElementWise]), (Unnamed Layer* 159) [Unary]), PWN((Unnamed Layer* 155) [Constant], (Unnamed Layer* 160) [ElementWise])), PWN((Unnamed Layer* 156) [Constant], (Unnamed Layer* 161) [ElementWise])), gelu (Output: gelu_2.tmp_0225))
fc_op_reshape_before_fc: Shuffle (Output: linear_87.tmp_1231)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_87.tmp_1231)
fc_op_float: FullyConnected (Output: linear_87.tmp_1231)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_87.tmp_1231)
shuffle_after_fc (Output: linear_87.tmp_1231)
skip_layernorm (Output: layer_norm_30.tmp_2242)
shuffle_before_multihead_mamul(Output: reshape2_11.tmp_0294)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_11.tmp_0294)
multihead_mamul_fc(Output: reshape2_11.tmp_0294)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_11.tmp_0294)
multihead_matmul (Output: reshape2_11.tmp_0294)
fc_op_reshape_before_fc: Shuffle (Output: linear_91.tmp_1301)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_91.tmp_1301)
fc_op_float: FullyConnected (Output: linear_91.tmp_1301)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_91.tmp_1301)
shuffle_after_fc (Output: linear_91.tmp_1301)
skip_layernorm (Output: layer_norm_31.tmp_2312)
fc_op_reshape_before_fc: Shuffle (Output: linear_92.tmp_1318)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_92.tmp_1318)
fc_op_float: FullyConnected (Output: linear_92.tmp_1318)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_92.tmp_1318)
shuffle_after_fc (Output: linear_92.tmp_1318)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 243) [Constant], (Unnamed Layer* 244) [ElementWise]), (Unnamed Layer* 245) [Unary]), PWN((Unnamed Layer* 241) [Constant], (Unnamed Layer* 246) [ElementWise])), PWN((Unnamed Layer* 242) [Constant], (Unnamed Layer* 247) [ElementWise])), gelu (Output: gelu_3.tmp_0320))
fc_op_reshape_before_fc: Shuffle (Output: linear_93.tmp_1326)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_93.tmp_1326)
fc_op_float: FullyConnected (Output: linear_93.tmp_1326)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_93.tmp_1326)
shuffle_after_fc (Output: linear_93.tmp_1326)
skip_layernorm (Output: layer_norm_32.tmp_2337)
shuffle_before_multihead_mamul(Output: reshape2_15.tmp_0389)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_15.tmp_0389)
multihead_mamul_fc(Output: reshape2_15.tmp_0389)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_15.tmp_0389)
multihead_matmul (Output: reshape2_15.tmp_0389)
fc_op_reshape_before_fc: Shuffle (Output: linear_97.tmp_1396)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_97.tmp_1396)
fc_op_float: FullyConnected (Output: linear_97.tmp_1396)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_97.tmp_1396)
shuffle_after_fc (Output: linear_97.tmp_1396)
skip_layernorm (Output: layer_norm_33.tmp_2407)
fc_op_reshape_before_fc: Shuffle (Output: linear_98.tmp_1413)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_98.tmp_1413)
fc_op_float: FullyConnected (Output: linear_98.tmp_1413)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_98.tmp_1413)
shuffle_after_fc (Output: linear_98.tmp_1413)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 329) [Constant], (Unnamed Layer* 330) [ElementWise]), (Unnamed Layer* 331) [Unary]), PWN((Unnamed Layer* 327) [Constant], (Unnamed Layer* 332) [ElementWise])), PWN((Unnamed Layer* 328) [Constant], (Unnamed Layer* 333) [ElementWise])), gelu (Output: gelu_4.tmp_0415))
fc_op_reshape_before_fc: Shuffle (Output: linear_99.tmp_1421)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_99.tmp_1421)
fc_op_float: FullyConnected (Output: linear_99.tmp_1421)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_99.tmp_1421)
shuffle_after_fc (Output: linear_99.tmp_1421)
skip_layernorm (Output: layer_norm_34.tmp_2432)
shuffle_before_multihead_mamul(Output: reshape2_19.tmp_0484)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_19.tmp_0484)
multihead_mamul_fc(Output: reshape2_19.tmp_0484)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_19.tmp_0484)
multihead_matmul (Output: reshape2_19.tmp_0484)
fc_op_reshape_before_fc: Shuffle (Output: linear_103.tmp_1491)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_103.tmp_1491)
fc_op_float: FullyConnected (Output: linear_103.tmp_1491)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_103.tmp_1491)
shuffle_after_fc (Output: linear_103.tmp_1491)
skip_layernorm (Output: layer_norm_35.tmp_2502)
fc_op_reshape_before_fc: Shuffle (Output: linear_104.tmp_1508)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_104.tmp_1508)
fc_op_float: FullyConnected (Output: linear_104.tmp_1508)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_104.tmp_1508)
shuffle_after_fc (Output: linear_104.tmp_1508)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 415) [Constant], (Unnamed Layer* 416) [ElementWise]), (Unnamed Layer* 417) [Unary]), PWN((Unnamed Layer* 413) [Constant], (Unnamed Layer* 418) [ElementWise])), PWN((Unnamed Layer* 414) [Constant], (Unnamed Layer* 419) [ElementWise])), gelu (Output: gelu_5.tmp_0510))
fc_op_reshape_before_fc: Shuffle (Output: linear_105.tmp_1516)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_105.tmp_1516)
fc_op_float: FullyConnected (Output: linear_105.tmp_1516)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_105.tmp_1516)
shuffle_after_fc (Output: linear_105.tmp_1516)
skip_layernorm (Output: layer_norm_36.tmp_2527)
shuffle_before_multihead_mamul(Output: reshape2_23.tmp_0579)
Reformatting CopyNode for Input Tensor 0 to multihead_mamul_fc(Output: reshape2_23.tmp_0579)
multihead_mamul_fc(Output: reshape2_23.tmp_0579)
Reformatting CopyNode for Input Tensor 0 to multihead_matmul (Output: reshape2_23.tmp_0579)
multihead_matmul (Output: reshape2_23.tmp_0579)
fc_op_reshape_before_fc: Shuffle (Output: linear_109.tmp_1586)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_109.tmp_1586)
fc_op_float: FullyConnected (Output: linear_109.tmp_1586)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_109.tmp_1586)
shuffle_after_fc (Output: linear_109.tmp_1586)
skip_layernorm (Output: layer_norm_37.tmp_2597)
fc_op_reshape_before_fc: Shuffle (Output: linear_110.tmp_1603)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_110.tmp_1603)
fc_op_float: FullyConnected (Output: linear_110.tmp_1603)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_110.tmp_1603)
shuffle_after_fc (Output: linear_110.tmp_1603)
PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 501) [Constant], (Unnamed Layer* 502) [ElementWise]), (Unnamed Layer* 503) [Unary]), PWN((Unnamed Layer* 499) [Constant], (Unnamed Layer* 504) [ElementWise])), PWN((Unnamed Layer* 500) [Constant], (Unnamed Layer* 505) [ElementWise])), gelu (Output: gelu_6.tmp_0605))
fc_op_reshape_before_fc: Shuffle (Output: linear_111.tmp_1611)
Reformatting CopyNode for Input Tensor 0 to fc_op_float: FullyConnected (Output: linear_111.tmp_1611)
fc_op_float: FullyConnected (Output: linear_111.tmp_1611)
Reformatting CopyNode for Input Tensor 0 to shuffle_after_fc (Output: linear_111.tmp_1611)
shuffle_after_fc (Output: linear_111.tmp_1611)
skip_layernorm (Output: layer_norm_38.tmp_2622)
slice (Output: layer_norm_38.tmp_2_slice_0624) + fc_op_reshape_before_fc: Shuffle (Output: linear_112.tmp_1630)
fc_op_float: FullyConnected (Output: linear_112.tmp_1630)
PWN(tanh (Output: tanh_3.tmp_0632))
fc_op_float: FullyConnected (Output: linear_113.tmp_1641)
shuffle_after_fc (Output: linear_113.tmp_1641)

Bindings:
embedding_10.tmp_0
embedding_11.tmp_0
embedding_8.tmp_0
embedding_9.tmp_0
tmp_2
linear_113.tmp_1641
I1003 14:03:10.104538    88 engine.cc:687] ====== engine info end ======
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1003 14:03:10.112846    88 ir_params_sync_among_devices_pass.cc:88] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1003 14:03:10.162847    88 memory_optimize_pass.cc:218] Cluster name : full_like_0.tmp_0  size: 8
I1003 14:03:10.162859    88 memory_optimize_pass.cc:218] Cluster name : tmp_4  size: 8
I1003 14:03:10.162863    88 memory_optimize_pass.cc:218] Cluster name : cumsum_0.tmp_0  size: 8
I1003 14:03:10.162864    88 memory_optimize_pass.cc:218] Cluster name : token_type_ids  size: 8
--- Running analysis [ir_graph_to_program_pass]
I1003 14:03:10.183755    88 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:03:10.188370    88 naive_executor.cc:110] ---  skip [feed], feed -> token_type_ids
I1003 14:03:10.188387    88 naive_executor.cc:110] ---  skip [feed], feed -> input_ids
I1003 14:03:10.188688    88 naive_executor.cc:110] ---  skip [linear_113.tmp_1], fetch -> fetch
W1003 14:03:10.188714    88 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.6
W1003 14:03:10.188877    88 gpu_resources.cc:91] device: 0, cuDNN Version: 8.3.
I1003 14:03:10.188996 1 paddle.cc:1309] TRITONBACKEND_ModelInstanceInitialize: ResNet50-v1.5_0 (GPU device 0)
I1003 14:03:10.196445 1 model_repository_manager.cc:1152] successfully loaded 'ERNIE' version 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [trt_skip_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I1003 14:03:10.288832    89 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
I1003 14:03:10.289359    89 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I1003 14:03:10.335124    89 fuse_pass_base.cc:59] ---  detected 16 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1003 14:03:10.338466    89 ir_params_sync_among_devices_pass.cc:88] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1003 14:03:10.395090    89 memory_optimize_pass.cc:218] Cluster name : fill_constant_1.tmp_0  size: 8
I1003 14:03:10.395102    89 memory_optimize_pass.cc:218] Cluster name : x0  size: 602112
I1003 14:03:10.395103    89 memory_optimize_pass.cc:218] Cluster name : elementwise_add_4  size: 1605632
I1003 14:03:10.395107    89 memory_optimize_pass.cc:218] Cluster name : conv2d_63.tmp_1  size: 3211264
I1003 14:03:10.395110    89 memory_optimize_pass.cc:218] Cluster name : elementwise_add_2  size: 3211264
I1003 14:03:10.395112    89 memory_optimize_pass.cc:218] Cluster name : conv2d_60.tmp_1  size: 3211264
--- Running analysis [ir_graph_to_program_pass]
I1003 14:03:10.422075    89 analysis_predictor.cc:1274] ======= optimize end =======
I1003 14:03:10.422649    89 naive_executor.cc:110] ---  skip [feed], feed -> x0
I1003 14:03:10.424878    89 naive_executor.cc:110] ---  skip [save_infer_model/scale_0.tmp_1], fetch -> fetch
I1003 14:03:10.425115 1 model_repository_manager.cc:1152] successfully loaded 'ResNet50-v1.5' version 1
I1003 14:03:10.425218 1 server.cc:524] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1003 14:03:10.425272 1 server.cc:551] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| paddle      | /opt/tritonserver/backends/paddle/libtriton_paddle.so           | {}     |
+-------------+-----------------------------------------------------------------+--------+

I1003 14:03:10.425306 1 server.cc:594] 
+---------------+---------+--------+
| Model         | Version | Status |
+---------------+---------+--------+
| ERNIE         | 1       | READY  |
| ResNet50-v1.5 | 1       | READY  |
+---------------+---------+--------+

I1003 14:03:10.469245 1 metrics.cc:651] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
I1003 14:03:10.469525 1 tritonserver.cc:1962] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                        |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                       |
| server_version                   | 2.20.0                                                                                                                                                                                       |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0]         | /workspace/models                                                                                                                                                                            |
| model_control_mode               | MODE_NONE                                                                                                                                                                                    |
| strict_model_config              | 1                                                                                                                                                                                            |
| rate_limit                       | OFF                                                                                                                                                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                    |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                     |
| response_cache_byte_size         | 0                                                                                                                                                                                            |
| min_supported_compute_capability | 6.0                                                                                                                                                                                          |
| strict_readiness                 | 1                                                                                                                                                                                            |
| exit_timeout                     | 30                                                                                                                                                                                           |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1003 14:03:10.474583 1 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001
I1003 14:03:10.475297 1 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000
I1003 14:03:10.516651 1 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W1003 14:03:11.474144 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1003 14:03:12.475025 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1003 14:03:13.477496 1 metrics.cc:427] Unable to get power limit for GPU 0. Status:Success, value:0.000000
ZJU-lishuang commented 1 year ago

I found that the problem is in the TensorRT optimization, based on the failed check `profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()`.

How to solve it?

heliqi commented 1 year ago

Are you using the paddlepaddle/triton_paddle:21.10 image or another image?

heliqi commented 1 year ago

@ZJU-lishuang The paddlepaddle/triton_paddle:21.10 image works correctly for me.

ZJU-lishuang commented 1 year ago

A different image. I built from source with Triton 22.03.

heliqi commented 1 year ago

How did you obtain the Paddle Inference library that triton_paddle depends on: compiled from source, or downloaded? From where?

Running Paddle Inference with CUDA and TensorRT versions that are too new can be risky, so I suggest you use our verified image first.

ZJU-lishuang commented 1 year ago

https://github.com/triton-inference-server/server/tree/r22.03 https://github.com/PaddlePaddle/Paddle/tree/a8ae87f118ddde049bd5c60c4493a667206f8055

ZJU-lishuang commented 1 year ago

I think the CUDA and TensorRT versions in 22.03 should be fine.

heliqi commented 1 year ago

You can compile Paddle from the release/2.4 branch: https://github.com/PaddlePaddle/Paddle/tree/release/2.4.

There may be a problem with the code branch you provided.

ZJU-lishuang commented 1 year ago

I will try https://github.com/triton-inference-server/server/tree/r22.03 and https://github.com/PaddlePaddle/Paddle/tree/release/2.4 again and report the result. I already tried this combination several days ago.

heliqi commented 1 year ago

I used 21.10 + Paddle release/2.4 when compiling the paddlepaddle/triton_paddle:21.10 image, so I suspect a TensorRT version mismatch.

ZJU-lishuang commented 1 year ago

condition1

ERROR:

Scanning dependencies of target paddle_inference_c
Scanning dependencies of target paddle_inference_c_shared
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_utils.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
[100%] Linking CXX shared library libpaddle_inference.so
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_utils.cc.o
[100%] Linking CXX static library libpaddle_inference_c.a
[100%] Built target paddle_inference_c
[100%] Linking CXX shared library libpaddle_inference_c.so
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.42]':
io.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
io.cc:(.text+0x13ba): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.592]':
io.cc:(.text+0x146a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::inference::ReadBinaryFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
io.cc:(.text+0x178e): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x179f): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18aa): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<char [14]>(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18b7): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<char [14]>(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18e8): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/build.make:2244: paddle/fluid/inference/libpaddle_inference.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:163108: paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.67]':
pd_config.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
pd_config.cc:(.text+0x22a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.300]':
pd_config.cc:(.text+0x2da): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `PD_ConfigDestroy':
pd_config.cc:(.text+0x808e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libpthread.so
pd_config.cc:(.text+0x80b7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_ptr<decltype(nullptr), (__gnu_cxx::_Lock_policy)2>::_M_dispose()' defined in .text._ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv[_ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
pd_config.cc:(.text+0x80e9): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()' defined in .text._ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv[_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::what() const':
pd_config.cc:(.text._ZNK3phi7enforce13EnforceNotMet4whatEv[_ZNK3phi7enforce13EnforceNotMet4whatEv]+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `fLI::FLAGS_call_stack_level' defined in .data section in ../libpaddle_inference.a(flags.cc.o)
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::~EnforceNotMet()':
pd_config.cc:(.text._ZN3phi7enforce13EnforceNotMetD2Ev[_ZN3phi7enforce13EnforceNotMetD5Ev]+0x13): additional relocation overflows omitted from the output
libpaddle_inference_c.so: PC-relative offset overflow in PLT entry for `_ZN3phi5funcs21LaunchBroadcastKernelINS_5dtype7float16ES3_NS_3kps13DivideFunctorIS3_fEELi1ELi1ELi4EEEvRKNS_10GPUContextERKSt6vectorIPKNS_11DenseTensorESaISD_EEPSA_IPSB_SaISI_EET1_RKNS_5ArrayINS4_7details15BroadcastConfigEXT2_EEE'
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/build.make:1204: paddle/fluid/inference/capi_exp/libpaddle_inference_c.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:177249: paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
The command '/bin/sh -c python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env &&     cd build-env &&     cmake .. -DWITH_PYTHON=OFF              -DWITH_GPU=ON              -DWITH_TESTING=OFF              -DWITH_INFERENCE_API_TEST=OFF              -DCMAKE_BUILD_TYPE=Release              -DCUDA_ARCH_NAME=Auto              -DON_INFER=ON              -DWITH_MKL=ON              -DWITH_TENSORRT=ON              -DWITH_ONNXRUNTIME=ON &&     make -j8' returned a non-zero code: 2

paddlepaddle_backend/paddle-lib/Dockerfile

FROM nvcr.io/nvidia/tritonserver:22.03-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80 \
    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
    && dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev

RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100

RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4

RUN python3 -m pip install pyyaml && mkdir build-env && \
    cd build-env && \
    cmake .. -DWITH_PYTHON=OFF \
             -DWITH_GPU=ON \
             -DWITH_TESTING=OFF \
             -DWITH_INFERENCE_API_TEST=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DCUDA_ARCH_NAME=Auto \
             -DON_INFER=ON \
             -DWITH_MKL=ON \
             -DWITH_TENSORRT=ON \
             -DWITH_ONNXRUNTIME=ON  && \
    make -j`nproc`
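The linker itself suggests a workaround for the `failed to convert GOTPCREL relocation` failure. A commonly suggested approach (an assumption here, not verified for this particular build) is to forward `--no-relax` to the linker through the CMake linker-flag variables when configuring:

```shell
# Hypothetical workaround for "failed to convert GOTPCREL relocation;
# relink with --no-relax": pass the flag through to ld via CMake,
# in addition to the existing -D options above.
cmake .. -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--no-relax" \
         -DCMAKE_EXE_LINKER_FLAGS="-Wl,--no-relax"
```

Note that relocation overflows of this kind usually indicate the binary is near the limits of the small code model, so this flag may only mask the underlying size problem.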
heliqi commented 1 year ago

PaddlePaddle compilation errors should be reported on the PaddlePaddle issue tracker: https://github.com/PaddlePaddle/Paddle/issues

@ZJU-lishuang Since release/2.4 hasn't been officially released yet, would you try v2.4.0-rc0?

ZJU-lishuang commented 1 year ago

Same problem. I have already tried 2.4.0-rc0.

ZJU-lishuang commented 1 year ago

condition2

Another Dockerfile produced this ERROR:

[ 86%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_helpers.cc.o
[  6%] Building CUDA object paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/broadcast.cu.o
[ 86%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_map_field.cc.o
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9146): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9155): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14797): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14806): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1365): error: identifier "__builtin_ia32_fpclassss" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1372): error: identifier "__builtin_ia32_fpclasssd" is undefined

[ 87%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_message.cc.o
[ 87%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_message_field.cc.o
[ 88%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/javanano/javanano_primitive_field.cc.o
[ 88%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/js/js_generator.cc.o
...
...
...
[ 94%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/objectivec/objectivec_oneof.cc.o
[ 94%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.cc.o
6 errors detected in the compilation of "/opt/tritonserver/Paddle/paddle/phi/kernels/funcs/eigen/broadcast.cu".
make[2]: *** [paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/build.make:206: paddle/phi/kernels/funcs/eigen/CMakeFiles/eigen_function.dir/broadcast.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 95%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/php/php_generator.cc.o
[ 95%] Building CXX object CMakeFiles/libprotoc.dir/opt/tritonserver/Paddle/build-env/third_party/protobuf/src/extern_protobuf/src/google/protobuf/compiler/plugin.cc.o

paddlepaddle_backend/paddle-lib/Dockerfile

FROM nvcr.io/nvidia/tritonserver:22.03-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80 \
    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
    && dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev

RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4

RUN python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && \
    cd build-env && \
    cmake .. -DWITH_PYTHON=OFF \
             -DWITH_GPU=ON \
             -DWITH_TESTING=OFF \
             -DWITH_INFERENCE_API_TEST=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DCUDA_ARCH_NAME=Auto \
             -DON_INFER=ON \
             -DWITH_MKL=ON \
             -DWITH_TENSORRT=ON \
             -DWITH_ONNXRUNTIME=ON \
             -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && \
    make -j8
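The `avx512fintrin.h` errors above come from nvcc parsing the gcc-9 system headers: `-DCMAKE_C_COMPILER`/`-DCMAKE_CXX_COMPILER` alone do not change which host compiler nvcc uses. One possible fix (an assumption, not verified against this build) is to point nvcc at g++-8 explicitly:

```shell
# Force nvcc to use g++-8 as its host compiler, so CUDA sources are
# no longer compiled against the gcc-9 avx512 intrinsic headers.
cmake .. -DCMAKE_CUDA_HOST_COMPILER=$(which g++-8) \
         -DCMAKE_C_COMPILER=$(which gcc-8) \
         -DCMAKE_CXX_COMPILER=$(which g++-8)
```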
ZJU-lishuang commented 1 year ago

condition3

ERROR:

[100%] Built target paddle_inference
Scanning dependencies of target paddle_inference_c
Scanning dependencies of target paddle_inference_c_shared
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_utils.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_predictor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_tensor.cc.o
[100%] Building CXX object paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/pd_utils.cc.o
[100%] Linking CXX static library libpaddle_inference_c.a
[100%] Built target paddle_inference_c
[100%] Linking CXX shared library libpaddle_inference_c.so
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.42]':
io.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
io.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
io.cc:(.text+0x13ba): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.592]':
io.cc:(.text+0x146a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_shared.dir/io.cc.o: in function `paddle::inference::ReadBinaryFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
io.cc:(.text+0x178e): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRSoPKcSD_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x179f): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18aa): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `void paddle::string::tinyformat::detail::FormatArg::formatImpl<char [14]>(std::ostream&, char const*, char const*, int, void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv[_ZN6paddle6string10tinyformat6detail9FormatArg10formatImplIA14_cEEvRSoPKcS8_iPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18b7): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `int paddle::string::tinyformat::detail::FormatArg::toIntImpl<char [14]>(void const*)' defined in .text._ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv[_ZN6paddle6string10tinyformat6detail9FormatArg9toIntImplIA14_cEEiPKv] section in CMakeFiles/paddle_inference_shared.dir/io.cc.o
io.cc:(.text+0x18e8): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/build.make:2244: paddle/fluid/inference/libpaddle_inference.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:163108: paddle/fluid/inference/CMakeFiles/paddle_inference_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::safe_realloc(void*, unsigned long) [clone .part.67]':
pd_config.cc:(.text+0x11): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x18): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo for std::bad_alloc@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
pd_config.cc:(.text+0x29): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::bad_alloc::~bad_alloc()@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/libstdc++.so
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_at_maximum_capacity(unsigned long)':
pd_config.cc:(.text+0x22a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `paddle::report_size_overflow(unsigned long, unsigned long) [clone .constprop.300]':
pd_config.cc:(.text+0x2da): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `PD_ConfigDestroy':
pd_config.cc:(.text+0x808e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/libpthread.so
pd_config.cc:(.text+0x80b7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_ptr<decltype(nullptr), (__gnu_cxx::_Lock_policy)2>::_M_dispose()' defined in .text._ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv[_ZNSt15_Sp_counted_ptrIDnLN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
pd_config.cc:(.text+0x80e9): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()' defined in .text._ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv[_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv] section in CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::what() const':
pd_config.cc:(.text._ZNK3phi7enforce13EnforceNotMet4whatEv[_ZNK3phi7enforce13EnforceNotMet4whatEv]+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `fLI::FLAGS_call_stack_level' defined in .data section in ../libpaddle_inference.a(flags.cc.o)
CMakeFiles/paddle_inference_c_shared.dir/pd_config.cc.o: in function `phi::enforce::EnforceNotMet::~EnforceNotMet()':
pd_config.cc:(.text._ZN3phi7enforce13EnforceNotMetD2Ev[_ZN3phi7enforce13EnforceNotMetD5Ev]+0x13): additional relocation overflows omitted from the output
libpaddle_inference_c.so: PC-relative offset overflow in PLT entry for `_ZN3phi5funcs21LaunchBroadcastKernelINS_5dtype7float16ES3_NS_3kps13DivideFunctorIS3_fEELi1ELi1ELi4EEEvRKNS_10GPUContextERKSt6vectorIPKNS_11DenseTensorESaISD_EEPSA_IPSB_SaISI_EET1_RKNS_5ArrayINS4_7details15BroadcastConfigEXT2_EEE'
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/build.make:1204: paddle/fluid/inference/capi_exp/libpaddle_inference_c.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:177249: paddle/fluid/inference/capi_exp/CMakeFiles/paddle_inference_c_shared.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
The command '/bin/sh -c python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env &&     cd build-env &&     cmake .. -DWITH_PYTHON=OFF              -DWITH_GPU=ON              -DWITH_TESTING=OFF              -DWITH_INFERENCE_API_TEST=OFF              -DCMAKE_BUILD_TYPE=Release              -DCUDA_ARCH_NAME=Auto              -DON_INFER=ON              -DWITH_MKL=ON              -DWITH_TENSORRT=ON              -DWITH_ONNXRUNTIME=ON              -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` &&     make -j8' returned a non-zero code: 2

paddlepaddle_backend/paddle-lib/Dockerfile

FROM nvcr.io/nvidia/tritonserver:22.03-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-key del 7fa2af80 \
    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
    && dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev

RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100

RUN git clone 'https://github.com/PaddlePaddle/Paddle.git'
WORKDIR /opt/tritonserver/Paddle
RUN git pull && git checkout release/2.4

RUN python3 -m pip install pyyaml -i https://pypi.tuna.tsinghua.edu.cn/simple && mkdir build-env && \
    cd build-env && \
    cmake .. -DWITH_PYTHON=OFF \
             -DWITH_GPU=ON \
             -DWITH_TESTING=OFF \
             -DWITH_INFERENCE_API_TEST=OFF \
             -DCMAKE_BUILD_TYPE=Release \
             -DCUDA_ARCH_NAME=Auto \
             -DON_INFER=ON \
             -DWITH_MKL=ON \
             -DWITH_TENSORRT=ON \
             -DWITH_ONNXRUNTIME=ON \
             -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` && \
    make -j8
heliqi commented 1 year ago

Please file issues about PaddlePaddle compilation at https://github.com/PaddlePaddle/Paddle/issues.