triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

failed to allocate pinned system memory: no pinned memory pool, falling back to non-pinned system memory #7809

Closed IceHowe closed 3 days ago

IceHowe commented 4 days ago

Description

I deployed PPOCRv3 with FastDeploy; the output is empty and the server reports an error. Following the advice in an existing GitHub issue, I already changed the settings in the conf files under cls\det\rec_runtime:

```
# Number of instances of the model
instance_group [
  {
    # The number of instances is 1
    count: 1
    # Use GPU; the CPU inference option is KIND_CPU
    kind: KIND_CPU
    # The instance is deployed on GPU card 0
    #gpus: [0]
  }
]

optimization {
  execution_accelerators {
    # GPU inference configuration, used together with KIND_GPU
    gpu_execution_accelerator : [
      {
        #name : "paddle"
        name : "openvino"
        #name : "tensorrt"
        # Number of parallel inference threads
        parameters { key: "cpu_threads" value: "1" }
        # Set to 1 to enable MKL-DNN acceleration, 0 to disable it
        parameters { key: "use_mkldnn" value: "0" }
      }
    ]
  }
}
```

Server log:

```
root@a5f86ab2e31a:/ocr_serving/models# fastdeployserver --model-repository=/ocr_serving/models
I1118 11:09:50.623702 1258 model_repository_manager.cc:1022] loading: det_postprocess:1
I1118 11:09:50.725190 1258 model_repository_manager.cc:1022] loading: det_preprocess:1
I1118 11:09:50.732871 1258 python.cc:1875] TRITONBACKEND_ModelInstanceInitialize: det_postprocess_0 (CPU device 0)
I1118 11:09:50.825618 1258 model_repository_manager.cc:1022] loading: det_runtime:1
I1118 11:09:50.925836 1258 model_repository_manager.cc:1022] loading: rec_postprocess:1
I1118 11:09:51.026065 1258 model_repository_manager.cc:1022] loading: cls_runtime:1
I1118 11:09:51.126340 1258 model_repository_manager.cc:1022] loading: cls_postprocess:1
I1118 11:09:51.226650 1258 model_repository_manager.cc:1022] loading: rec_runtime:1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
model_config: {'name': 'det_postprocess', 'platform': '', 'backend': 'python', 'version_policy': {'latest': {'num_versions': 1}}, 'max_batch_size': 128, 'input': [{'name': 'POST_INPUT_0', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [1, -1, -1], 'is_shape_tensor': False, 'allow_ragged_batch': False}, {'name': 'POST_INPUT_1', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [4], 'is_shape_tensor': False, 'allow_ragged_batch': False}, {'name': 'ORI_IMG', 'data_type': 'TYPE_UINT8', 'format': 'FORMAT_NONE', 'dims': [-1, -1, 3], 'is_shape_tensor': False, 'allow_ragged_batch': False}], 'output': [{'name': 'POST_OUTPUT_0', 'data_type': 'TYPE_STRING', 'dims': [-1, 1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'POST_OUTPUT_1', 'data_type': 'TYPE_FP32', 'dims': [-1, 1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'POST_OUTPUT_2', 'data_type': 'TYPE_FP32', 'dims': [-1, -1, 1], 'label_filename': '', 'is_shape_tensor': False}], 'batch_input': [], 'batch_output': [], 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False}, 'instance_group': [{'name': 'det_postprocess_0', 'kind': 'KIND_CPU', 'count': 1, 'gpus': [], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}], 'default_model_filename': '', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {}, 'model_warmup': []}
postprocess input names: ['POST_INPUT_0', 'POST_INPUT_1', 'ORI_IMG']
postprocess output names: ['POST_OUTPUT_0', 'POST_OUTPUT_1', 'POST_OUTPUT_2']
I1118 11:09:52.090375 1258 model_repository_manager.cc:1183] successfully loaded 'det_postprocess' version 1
I1118 11:09:52.173708 1258 fastdeploy_runtime.cc:1207] TRITONBACKEND_Initialize: fastdeploy
I1118 11:09:52.173742 1258 fastdeploy_runtime.cc:1216] Triton TRITONBACKEND API version: 1.6
I1118 11:09:52.173765 1258 fastdeploy_runtime.cc:1221] 'fastdeploy' TRITONBACKEND API version: 1.6
I1118 11:09:52.173786 1258 fastdeploy_runtime.cc:1250] backend configuration: {}
I1118 11:09:52.173822 1258 python.cc:1875] TRITONBACKEND_ModelInstanceInitialize: det_preprocess_0 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
model_config: {'name': 'det_preprocess', 'platform': '', 'backend': 'python', 'version_policy': {'latest': {'num_versions': 1}}, 'max_batch_size': 1, 'input': [{'name': 'INPUT_0', 'data_type': 'TYPE_UINT8', 'format': 'FORMAT_NONE', 'dims': [-1, -1, 3], 'is_shape_tensor': False, 'allow_ragged_batch': False}], 'output': [{'name': 'OUTPUT_0', 'data_type': 'TYPE_FP32', 'dims': [3, -1, -1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'OUTPUT_1', 'data_type': 'TYPE_INT32', 'dims': [4], 'label_filename': '', 'is_shape_tensor': False}], 'batch_input': [], 'batch_output': [], 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False}, 'instance_group': [{'name': 'det_preprocess_0', 'kind': 'KIND_CPU', 'count': 1, 'gpus': [], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}], 'default_model_filename': '', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {}, 'model_warmup': []}
preprocess input names: ['INPUT_0']
preprocess output names: ['OUTPUT_0', 'OUTPUT_1']
I1118 11:09:53.005776 1258 fastdeploy_runtime.cc:1280] TRITONBACKEND_ModelInitialize: det_runtime (version 1)
I1118 11:09:53.006248 1258 model_repository_manager.cc:1183] successfully loaded 'det_preprocess' version 1
I1118 11:09:53.006549 1258 python.cc:1875] TRITONBACKEND_ModelInstanceInitialize: rec_postprocess_0 (CPU device 0)
E0000 00:00:00.000000 1407 dir_reader_linux.h:41] RAW: Failed to close directory handle
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
model_config: {'name': 'rec_postprocess', 'platform': '', 'backend': 'python', 'version_policy': {'latest': {'num_versions': 1}}, 'max_batch_size': 128, 'input': [{'name': 'POST_INPUT_0', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [-1, 6625], 'is_shape_tensor': False, 'allow_ragged_batch': False}], 'output': [{'name': 'POST_OUTPUT_0', 'data_type': 'TYPE_STRING', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'POST_OUTPUT_1', 'data_type': 'TYPE_FP32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}], 'batch_input': [], 'batch_output': [], 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False}, 'instance_group': [{'name': 'rec_postprocess_0', 'kind': 'KIND_CPU', 'count': 1, 'gpus': [], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}], 'default_model_filename': '', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {}, 'model_warmup': []}
postprocess input names: ['POST_INPUT_0']
postprocess output names: ['POST_OUTPUT_0', 'POST_OUTPUT_1']
I1118 11:09:53.827329 1258 model_repository_manager.cc:1183] successfully loaded 'rec_postprocess' version 1
I1118 11:09:53.827459 1258 fastdeploy_runtime.cc:1280] TRITONBACKEND_ModelInitialize: rec_runtime (version 1)
I1118 11:09:53.827721 1258 fastdeploy_runtime.cc:1319] TRITONBACKEND_ModelInstanceInitialize: rec_runtime_0 (CPU device 0)
[INFO] fastdeploy/runtime/runtime.cc(91)::AutoSelectBackend FastDeploy will choose Backend::PDINFER to inference this model.
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1118 19:09:53.828105 1309 analysis_config.cc:965] It is detected that mkldnn and memory_optimize_pass are enabled at the same time, but they are not supported yet. Currently, memory_optimize_pass is explicitly disabled
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::CPU.
I1118 11:09:54.001134 1258 fastdeploy_runtime.cc:1280] TRITONBACKEND_ModelInitialize: cls_runtime (version 1)
I1118 11:09:54.001491 1258 fastdeploy_runtime.cc:1319] TRITONBACKEND_ModelInstanceInitialize: cls_runtime_0 (CPU device 0)
I1118 11:09:54.001508 1258 model_repository_manager.cc:1183] successfully loaded 'rec_runtime' version 1
[INFO] fastdeploy/runtime/runtime.cc(91)::AutoSelectBackend FastDeploy will choose Backend::PDINFER to inference this model.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::CPU.
I1118 11:09:54.155767 1258 fastdeploy_runtime.cc:1319] TRITONBACKEND_ModelInstanceInitialize: det_runtime_0 (CPU device 0)
[INFO] fastdeploy/runtime/runtime.cc(91)::AutoSelectBackend FastDeploy will choose Backend::PDINFER to inference this model.
I1118 11:09:54.155961 1258 model_repository_manager.cc:1183] successfully loaded 'cls_runtime' version 1
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::CPU.
I1118 11:09:54.374951 1258 python.cc:1875] TRITONBACKEND_ModelInstanceInitialize: cls_postprocess_0 (CPU device 0)
I1118 11:09:54.375091 1258 model_repository_manager.cc:1183] successfully loaded 'det_runtime' version 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
model_config: {'name': 'cls_postprocess', 'platform': '', 'backend': 'python', 'version_policy': {'latest': {'num_versions': 1}}, 'max_batch_size': 128, 'input': [{'name': 'POST_INPUT_0', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [2], 'is_shape_tensor': False, 'allow_ragged_batch': False}], 'output': [{'name': 'POST_OUTPUT_0', 'data_type': 'TYPE_INT32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'POST_OUTPUT_1', 'data_type': 'TYPE_FP32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}], 'batch_input': [], 'batch_output': [], 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False}, 'instance_group': [{'name': 'cls_postprocess_0', 'kind': 'KIND_CPU', 'count': 1, 'gpus': [], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}], 'default_model_filename': '', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {}, 'model_warmup': []}
postprocess input names: ['POST_INPUT_0']
postprocess output names: ['POST_OUTPUT_0', 'POST_OUTPUT_1']
I1118 11:09:55.166482 1258 model_repository_manager.cc:1183] successfully loaded 'cls_postprocess' version 1
I1118 11:09:55.166803 1258 model_repository_manager.cc:1022] loading: pp_ocr:1
I1118 11:09:55.267046 1258 model_repository_manager.cc:1022] loading: rec_pp:1
I1118 11:09:55.367329 1258 model_repository_manager.cc:1022] loading: cls_pp:1
I1118 11:09:55.467715 1258 model_repository_manager.cc:1183] successfully loaded 'rec_pp' version 1
I1118 11:09:55.467723 1258 model_repository_manager.cc:1183] successfully loaded 'pp_ocr' version 1
I1118 11:09:55.467859 1258 model_repository_manager.cc:1183] successfully loaded 'cls_pp' version 1
I1118 11:09:55.468084 1258 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
```

```
I1118 11:09:55.468227 1258 server.cc:549]
+------------+---------------------------------------------------------------+--------+
| Backend    | Path                                                          | Config |
+------------+---------------------------------------------------------------+--------+
| python     | /opt/tritonserver/backends/python/libtriton_python.so         | {}     |
| fastdeploy | /opt/tritonserver/backends/fastdeploy/libtriton_fastdeploy.so | {}     |
+------------+---------------------------------------------------------------+--------+
```

```
I1118 11:09:55.468415 1258 server.cc:592]
+-----------------+---------+--------+
| Model           | Version | Status |
+-----------------+---------+--------+
| cls_postprocess | 1       | READY  |
| cls_pp          | 1       | READY  |
| cls_runtime     | 1       | READY  |
| det_postprocess | 1       | READY  |
| det_preprocess  | 1       | READY  |
| det_runtime     | 1       | READY  |
| pp_ocr          | 1       | READY  |
| rec_postprocess | 1       | READY  |
| rec_pp          | 1       | READY  |
| rec_runtime     | 1       | READY  |
+-----------------+---------+--------+
```

```
I1118 11:09:55.468609 1258 tritonserver.cc:1920]
+----------------------------------+------------------------------------------------------------------------------+
| Option                           | Value                                                                        |
+----------------------------------+------------------------------------------------------------------------------+
| server_id                        | triton                                                                       |
| server_version                   | 2.15.0                                                                       |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) |
|                                  | schedule_policy model_configuration system_shared_memory cuda_shared_memory  |
|                                  | binary_tensor_data statistics                                                |
| model_repository_path[0]         | /ocr_serving/models                                                          |
| model_control_mode               | MODE_NONE                                                                    |
| strict_model_config              | 1                                                                            |
| rate_limit                       | OFF                                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                                    |
| response_cache_byte_size         | 0                                                                            |
| min_supported_compute_capability | 0.0                                                                          |
| strict_readiness                 | 1                                                                            |
| exit_timeout                     | 30                                                                           |
+----------------------------------+------------------------------------------------------------------------------+
```

```
I1118 11:09:55.503013 1258 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I1118 11:09:55.503214 1258 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I1118 11:09:55.545727 1258 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
W1118 11:09:59.346387 1258 pinned_memory_manager.cc:133] failed to allocate pinned system memory: no pinned memory pool, falling back to non-pinned system memory
```
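For context: the final warning means Triton could not create a CUDA pinned-memory pool (expected on a CPU-only image with no GPU driver) and silently falls back to pageable host memory, so it affects only data-transfer performance, not correctness. If the warning is unwanted, the pool can be disabled explicitly with a standard `tritonserver` flag (a sketch; whether `fastdeployserver` forwards extra arguments to the underlying `tritonserver` is an assumption here):

```shell
# A pool size of 0 disables the pinned-memory pool entirely, so Triton
# uses regular host memory and never emits the fallback warning.
tritonserver --model-repository=/ocr_serving/models \
             --pinned-memory-pool-byte-size=0
```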

Client result:

```
$ python test.py
Status code: 200
Response body: {"model_name":"pp_ocr","model_version":"1","parameters":{"sequence_id":0,"sequence_start":false,"sequence_end":false,"sequence_id":0,"sequence_start":false,"sequence_end":false},"outputs":[{"name":"det_bboxes","datatype":"FP64","shape":[1,0],"data":[]},{"name":"rec_scores","datatype":"FP64","shape":[1,1],"data":[0.0]},{"name":"rec_texts","datatype":"BYTES","shape":[1,1],"data":[""]}]}
1 1 0
text=  score= 0.0  bbox= []
```

Triton Information
What version of Triton are you using?
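The response itself shows why the client prints nothing: `det_bboxes` comes back with shape `[1, 0]`, i.e. the detection stage found zero text boxes, so the downstream `rec_texts`/`rec_scores` outputs are empty placeholders. A minimal sketch of decoding such a KServe-v2-style JSON response (the output names are taken from the response above):

```python
import json

# Response body as returned by the server (copied from the log above,
# "parameters" omitted for brevity).
body = '''{"model_name":"pp_ocr","model_version":"1","outputs":[
  {"name":"det_bboxes","datatype":"FP64","shape":[1,0],"data":[]},
  {"name":"rec_scores","datatype":"FP64","shape":[1,1],"data":[0.0]},
  {"name":"rec_texts","datatype":"BYTES","shape":[1,1],"data":[""]}]}'''

resp = json.loads(body)
# Index the outputs by name for easy lookup.
outputs = {o["name"]: o for o in resp["outputs"]}

# A second dimension of 0 in det_bboxes means no text regions were
# detected, which is why rec_texts contains only an empty string.
print(outputs["det_bboxes"]["shape"])  # [1, 0] -> zero detected boxes
print(outputs["rec_texts"]["data"])    # ['']
```

So the server is healthy; the empty result points at the runtime configuration of the detection model rather than at the pinned-memory warning.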

Environment: the official Docker image registry.baidubce.com/paddlepaddle/fastdeploy:1.0.5-cpu-only-21.10; the 1.0.4 CPU and GPU images were also tried.

```
root@a5f86ab2e31a:/ocr_serving/models# pip list
Package                Version
---------------------- ----------------------
aiohttp                3.8.4
aiosignal              1.3.1
anyio                  3.6.2
asgiref                3.6.0
astor                  0.8.1
async-timeout          4.0.2
attrs                  22.2.0
Babel                  2.12.1
bce-python-sdk         0.8.79
certifi                2019.11.28
chardet                3.0.4
charset-normalizer     3.1.0
click                  8.1.3
colorama               0.4.6
colorlog               6.7.0
contourpy              1.0.7
cycler                 0.11.0
datasets               2.10.1
dbus-python            1.2.16
decorator              5.1.1
dill                   0.3.4
fast-tokenizer-python  1.0.2
fastapi                0.95.0
fastdeploy-python      0.0.0
fastdeploy-tools       0.0.5
filelock               3.10.3
Flask                  2.2.3
Flask-Babel            2.0.0
fonttools              4.39.2
frozenlist             1.3.3
fsspec                 2023.3.0
future                 0.18.3
h11                    0.14.0
huggingface-hub        0.13.3
idna                   2.8
importlib-metadata     6.1.0
importlib-resources    5.12.0
itsdangerous           2.1.2
jieba                  0.42.1
Jinja2                 3.1.2
joblib                 1.2.0
kiwisolver             1.4.4
markdown-it-py         2.2.0
MarkupSafe             2.1.2
matplotlib             3.7.1
mdurl                  0.1.2
multidict              6.0.4
multiprocess           0.70.12.2
numpy                  1.23.3
opencv-python          4.7.0.72
opt-einsum             3.3.0
packaging              23.0
paddle-bfloat          0.1.7
paddle2onnx            1.0.6
paddlefsl              1.1.0
paddlenlp              2.5.2
paddlepaddle           2.4.2
pandas                 1.5.3
Pillow                 9.4.0
pip                    23.0.1
protobuf               3.20.0
pyarrow                11.0.0
pycryptodome           3.17
pydantic               1.10.7
Pygments               2.14.0
PyGObject              3.36.0
pyparsing              3.0.9
python-apt             2.0.0+ubuntu0.20.4.8
python-dateutil        2.8.2
pytz                   2022.7.1
PyYAML                 6.0
requests               2.22.0
requests-unixsocket    0.2.0
responses              0.18.0
rich                   13.3.2
scikit-learn           1.2.2
scipy                  1.10.1
sentencepiece          0.1.97
seqeval                1.2.2
setuptools             65.4.1
six                    1.14.0
sniffio                1.3.0
starlette              0.26.1
threadpoolctl          3.1.0
tqdm                   4.65.0
typer                  0.7.0
typing_extensions      4.5.0
urllib3                1.26.15
uvicorn                0.16.0
visualdl               2.4.2
Werkzeug               2.2.3
wheel                  0.37.1
xxhash                 3.2.0
yarl                   1.8.2
zipp                   3.15.0
```

How can this be resolved?

IceHowe commented 3 days ago

The problem has been solved.


The fix was to change `gpu_execution_accelerator` in the `optimization` block to `cpu_execution_accelerator`, since the model instances run with `KIND_CPU`:

```
# Number of instances of the model
instance_group [
  {
    # The number of instances is 1
    count: 1
    # Use GPU; the CPU inference option is KIND_CPU
    kind: KIND_CPU
    # The instance is deployed on GPU card 0
    #gpus: [0]
  }
]

optimization {
  execution_accelerators {
    # CPU inference configuration, used together with KIND_CPU
    cpu_execution_accelerator : [
      {
        #name : "paddle"
        name : "openvino"
        #name : "tensorrt"
        # Number of parallel inference threads
        parameters { key: "cpu_threads" value: "1" }
        # Set to 1 to enable MKL-DNN acceleration, 0 to disable it
        parameters { key: "use_mkldnn" value: "0" }
      }
    ]
  }
}
```