triton-inference-server / paddlepaddle_backend

BSD 3-Clause "New" or "Revised" License

perf_analyzer paddlepaddle model fault #1

Closed zhaohb closed 3 years ago

zhaohb commented 3 years ago

I compiled the Paddle backend with version 21.04, successfully generated the libtriton_paddle.so required by Triton, and successfully loaded the Paddle model, but I got an error when I benchmarked the model with perf_analyzer:

# ./perf_analyzer -a -b 30 -u localhost:8001 -i gRPC -m rec --concurrency-range 1 --shape x:3,32,32
*** Measurement Settings ***
  Batch size: 30
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
Failed to retrieve results from inference request.
Thread [0] had error: Failed to create output tensor 'save_infer_model/scale_0.tmp_1' for 'rec_0'

Triton output:

I0629 06:45:58.323692 307 paddle.cc:1207] model cls, instance cls_0, executing 1 requests
I0629 06:45:58.323710 307 paddle.cc:846] TRITONBACKEND_ModelExecute: Running cls_0 with 1 requests
I0629 06:45:58.337416 307 paddle.cc:1026] TRITONBACKEND_ModelExecute: model cls_0 released 1 requests
I0629 06:46:04.318618 307 http_server.cc:1229] HTTP request: 0 /v2/models/cls/stats
I0629 06:46:04.318654 307 model_repository_manager.cc:615] VersionStates() 'cls'
I0629 06:46:04.318665 307 model_repository_manager.cc:659] GetInferenceBackend() 'cls' version 1
I0629 06:47:58.797566 307 grpc_server.cc:270] Process for ModelMetadata, rpc_ok=1, 1 step START
I0629 06:47:58.797597 307 grpc_server.cc:225] Ready for RPC 'ModelMetadata', 2
I0629 06:47:58.797607 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.797617 307 model_repository_manager.cc:615] VersionStates() 'rec'
I0629 06:47:58.797691 307 grpc_server.cc:270] Process for ModelMetadata, rpc_ok=1, 1 step COMPLETE
I0629 06:47:58.797699 307 grpc_server.cc:408] Done for ModelMetadata, 1
I0629 06:47:58.798961 307 grpc_server.cc:270] Process for ModelConfig, rpc_ok=1, 1 step START
I0629 06:47:58.798977 307 grpc_server.cc:225] Ready for RPC 'ModelConfig', 2
I0629 06:47:58.798984 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.799748 307 grpc_server.cc:270] Process for ModelConfig, rpc_ok=1, 1 step COMPLETE
I0629 06:47:58.799761 307 grpc_server.cc:408] Done for ModelConfig, 1
I0629 06:47:58.801194 307 grpc_server.cc:270] Process for ServerMetadata, rpc_ok=1, 1 step START
I0629 06:47:58.801237 307 grpc_server.cc:225] Ready for RPC 'ServerMetadata', 2
I0629 06:47:58.801284 307 grpc_server.cc:270] Process for ServerMetadata, rpc_ok=1, 1 step COMPLETE
I0629 06:47:58.801290 307 grpc_server.cc:408] Done for ServerMetadata, 1
I0629 06:47:58.801546 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 2 step START
I0629 06:47:58.801568 307 grpc_server.cc:225] Ready for RPC 'ModelStatistics', 3
I0629 06:47:58.801579 307 model_repository_manager.cc:615] VersionStates() 'rec'
I0629 06:47:58.801590 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version 1
I0629 06:47:58.801673 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 2 step COMPLETE
I0629 06:47:58.801682 307 grpc_server.cc:408] Done for ModelStatistics, 2
I0629 06:47:58.802736 307 grpc_server.cc:3124] Process for ModelInferHandler, rpc_ok=1, 4 step START
I0629 06:47:58.802760 307 grpc_server.cc:3117] New request handler for ModelInferHandler, 5
I0629 06:47:58.802768 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.802776 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version -1
I0629 06:47:58.802795 307 infer_request.cc:497] prepared: [0x0x7fd914009ec0] request id: 0, model: rec, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 30, priority: 0, timeout (us): 0
original inputs:
[0x0x7fd91400a148] input: x, type: FP32, original shape: [30,3,32,32], batch + shape: [30,3,32,32], shape: [3,32,32]
override inputs:
inputs:
[0x0x7fd91400a148] input: x, type: FP32, original shape: [30,3,32,32], batch + shape: [30,3,32,32], shape: [3,32,32]
original requested outputs:
save_infer_model/scale_0.tmp_1
requested outputs:
save_infer_model/scale_0.tmp_1

I0629 06:47:58.802985 307 paddle.cc:1207] model rec, instance rec_0, executing 1 requests
I0629 06:47:58.803000 307 paddle.cc:846] TRITONBACKEND_ModelExecute: Running rec_0 with 1 requests
W0629 06:47:58.826670   317 rnn_op.cu.cc:331] If the memory space of the Input WeightList is not continuous, less efficient calculation will be called. Please call flatten_parameters() to make the input memory continuous.
I0629 06:47:59.041294 307 grpc_server.cc:3275] ModelInferHandler::InferResponseComplete, 4 step ISSUED
I0629 06:47:59.041468 307 grpc_server.cc:2847] ModelInferHandler::InferRequestComplete
I0629 06:47:59.041487 307 paddle.cc:1026] TRITONBACKEND_ModelExecute: model rec_0 released 1 requests
I0629 06:47:59.041547 307 grpc_server.cc:3124] Process for ModelInferHandler, rpc_ok=1, 4 step COMPLETE
I0629 06:47:59.041559 307 grpc_server.cc:2169] Done for ModelInferHandler, 4
I0629 06:48:04.801946 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 3 step START
I0629 06:48:04.801980 307 grpc_server.cc:225] Ready for RPC 'ModelStatistics', 4
I0629 06:48:04.801991 307 model_repository_manager.cc:615] VersionStates() 'rec'
I0629 06:48:04.802002 307 model_repository_manager.cc:659] GetInferenceBackend() 'rec' version 1
I0629 06:48:04.802103 307 grpc_server.cc:270] Process for ModelStatistics, rpc_ok=1, 3 step COMPLETE
I0629 06:48:04.802111 307 grpc_server.cc:408] Done for ModelStatistics, 3

How can I solve this problem?

zhaohb commented 3 years ago

I tested with tritonclient and got the same error.
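
For reference, a minimal sketch of that test with the Python gRPC client (model name, shape, and output name are taken from the perf_analyzer command above; the random input is only a placeholder, not real OCR data):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Same input name, batched shape, and requested output as the perf_analyzer run
inp = grpcclient.InferInput("x", [30, 3, 32, 32], "FP32")
inp.set_data_from_numpy(np.random.rand(30, 3, 32, 32).astype(np.float32))
out = grpcclient.InferRequestedOutput("save_infer_model/scale_0.tmp_1")

# This raises the same "Failed to create output tensor" error
result = client.infer(model_name="rec", inputs=[inp], outputs=[out])
print(result.as_numpy("save_infer_model/scale_0.tmp_1").shape)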

zlsh80826 commented 3 years ago

Hello @zhaohb,

Is rec your own model? Can you share the config.pbtxt for more information?

zhaohb commented 3 years ago

config.pbtxt:

name: "rec"
backend: "paddle"
max_batch_size: 4096
input {
  name: "x"
  data_type: TYPE_FP32
  dims: 3
  dims: 32
  dims: -1
}
output {
  name: "save_infer_model/scale_0.tmp_1"
  data_type: TYPE_FP32
  dims: -1
  dims: 6625
}
instance_group {
  count: 1
  kind: KIND_GPU
}
dynamic_batching {
  preferred_batch_size: 1024
  preferred_batch_size: 2048
  max_queue_delay_microseconds: 100
}

rec is not a model I trained either; I got it from https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/inference.md

zhaohb commented 3 years ago

@zlsh80826 were you able to reproduce the issue? How can it be solved?

zlsh80826 commented 3 years ago

@zhaohb, thank you for reaching out. I will look into this model in the next few days.

zlsh80826 commented 3 years ago

@zhaohb, which model were you using? There are many models at the link you provided. The problem is that the output name you provided can't be found in predictor_->GetOutputNames. For example, the output name of ch_ppocr_mobile_v2.0_rec_infer should be ctc_fc.tmp_1 instead of save_infer_model/scale_0.tmp_1.
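
If you want to double-check, here is a minimal standalone sketch for listing the real output names (assuming the Paddle 2.x Python inference API and the usual exported file names inference.pdmodel/inference.pdiparams):

import paddle.inference as paddle_infer

# Load the exported inference model
config = paddle_infer.Config("inference.pdmodel", "inference.pdiparams")
predictor = paddle_infer.create_predictor(config)

# For ch_ppocr_mobile_v2.0_rec_infer this should print ['ctc_fc.tmp_1']
print(predictor.get_output_names())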

The following is a working config.pbtxt for ch_ppocr_mobile_v2.0_rec_infer:

name: "rec"
backend: "paddle"
max_batch_size: 4096
input {
  name: "x"
  data_type: TYPE_FP32
  dims: 3
  dims: 32
  dims: 100
}
output {
  name: "ctc_fc.tmp_1"
  data_type: TYPE_FP32
  dims: 25
  dims: 6625
}
instance_group {
  count: 1
  kind: KIND_GPU
}
dynamic_batching {
  preferred_batch_size: 1024
  preferred_batch_size: 2048
  max_queue_delay_microseconds: 100
}
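
Note that this config fixes the input dims to 3x32x100 and uses the output name ctc_fc.tmp_1, so the perf_analyzer invocation from the issue would presumably need --shape x:3,32,100 to match it.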

zhaohb commented 3 years ago

@zlsh80826 I am using the rec model from https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/inference.md#%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9E%8B%E8%BD%ACinference%E6%A8%A1%E5%9E%8B, but when I open this model with Netron, I can see that the output is save_infer_model/scale_0.tmp_1 (screenshot attached).

So, where does this method come from, and how do I call it to get the name of the output?

zlsh80826 commented 3 years ago

You can add the following lines

output_names = self.predictor.get_output_names()
print(output_names)

at https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/tools/infer/predict_rec.py#L237, then follow the PaddleOCR instructions to run a prediction and print the output names, e.g. python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir inference/rec_crnn

Why the output tensor is not the last layer shown in Netron is beyond my knowledge; you might ask the PaddlePaddle team. My guess is that the op fusion strategy affects it.

zhaohb commented 3 years ago

OK, thank you very much.

zlsh80826 commented 3 years ago

Solved