triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Segmentation fault when loading new version of model #7016

Open · yutkin opened this issue 6 months ago

yutkin commented 6 months ago

Description

We have a PyTorch model that we serve on CPU with the Triton Inference Server. We use POLL model-control mode to pick up new model versions from a GCS bucket, and we use the gRPC protocol with proxyless load balancing via Traffic Director to spread load across the Triton pods.
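For context, we publish a new version by copying the serialized model into a fresh version subdirectory of the repository bucket, which the polling servers then discover. A minimal sketch of that publishing step, assuming the standard Triton repository layout (the helper below is illustrative; only the bucket and model names come from the logs):

```python
from google.cloud import storage

# Illustrative publisher: upload a new version so that polling Triton
# servers pick it up on their next repository scan. Assumes the standard
# layout <model>/<version>/model.pt inside the bucket.
def publish_version(bucket_name: str, model_name: str, version: str,
                    local_model_path: str) -> None:
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(f"{model_name}/{version}/model.pt")
    blob.upload_from_filename(local_model_path)

# e.g. publish_version("model-bucket", "combined_model_new",
#                      "20240320015201", "model.pt")
```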

When Triton loads a new model version from the GCS bucket, it sometimes crashes with a segmentation fault. Example log:

```
INFO 2024-03-20T12:09:07.685051946Z =============================
INFO 2024-03-20T12:09:07.685054971Z == Triton Inference Server ==
INFO 2024-03-20T12:09:07.685056553Z =============================
INFO 2024-03-20T12:09:07.687105838Z {}
INFO 2024-03-20T12:09:07.687115734Z NVIDIA Release 24.02 (build 83572450)
INFO 2024-03-20T12:09:07.687120894Z Triton Server Version 2.43.0
INFO 2024-03-20T12:09:07.687776991Z {}
INFO 2024-03-20T12:09:07.687781021Z Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
INFO 2024-03-20T12:09:07.688467186Z {}
INFO 2024-03-20T12:09:07.688470647Z Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
INFO 2024-03-20T12:09:07.688472771Z {}
INFO 2024-03-20T12:09:07.688474856Z This container image and its contents are governed by the NVIDIA Deep Learning Container License.
INFO 2024-03-20T12:09:07.688476734Z By pulling and using the container, you accept the terms and conditions of this license:
INFO 2024-03-20T12:09:07.688479095Z https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
INFO 2024-03-20T12:09:07.703584786Z {}
INFO 2024-03-20T12:09:07.703601603Z WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
INFO 2024-03-20T12:09:07.703603964Z Use the NVIDIA Container Toolkit to start this container with GPU support; see
INFO 2024-03-20T12:09:07.703606473Z https://docs.nvidia.com/datacenter/cloud-native/ .
INFO 2024-03-20T12:09:07.706870592Z {}
ERROR 2024-03-20T12:09:07.721496906Z 2024-03-20T12:09:07Z W 58 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
ERROR 2024-03-20T12:09:07.721517584Z 2024-03-20T12:09:07Z I 58 cuda_memory_manager.cc:117] CUDA memory pool disabled
ERROR 2024-03-20T12:09:07.721537509Z 2024-03-20T12:09:07Z E 58 server.cc:243] CudaDriverHelper has not been initialized.
WARNING 2024-03-20T12:09:21Z Readiness probe failed: Get "http://100.72.8.164:8501/v2/health/ready": dial tcp 100.72.8.164:8501: connect: connection refused
ERROR 2024-03-20T12:09:23.367015994Z 2024-03-20T12:09:23Z I 58 model_lifecycle.cc:469] loading: combined_model_new:20240319193853
WARNING 2024-03-20T12:09:26Z Readiness probe failed: Get "http://100.72.8.164:8501/v2/health/ready": dial tcp 100.72.8.164:8501: connect: connection refused
WARNING 2024-03-20T12:09:31Z Readiness probe failed: Get "http://100.72.8.164:8501/v2/health/ready": dial tcp 100.72.8.164:8501: connect: connection refused
ERROR 2024-03-20T12:09:35.790086031Z WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
ERROR 2024-03-20T12:09:35.791589054Z 2024-03-20T12:09:35Z I 58 libtorch.cc:2467] TRITONBACKEND_Initialize: pytorch
ERROR 2024-03-20T12:09:35.791613740Z 2024-03-20T12:09:35Z I 58 libtorch.cc:2477] Triton TRITONBACKEND API version: 1.18
ERROR 2024-03-20T12:09:35.791616690Z 2024-03-20T12:09:35Z I 58 libtorch.cc:2483] 'pytorch' TRITONBACKEND API version: 1.18
ERROR 2024-03-20T12:09:35.791621994Z 2024-03-20T12:09:35Z I 58 libtorch.cc:2516] TRITONBACKEND_ModelInitialize: combined_model_new (version 20240319193853)
ERROR 2024-03-20T12:09:35.792209270Z 2024-03-20T12:09:35Z I 58 libtorch.cc:347] Optimized execution is enabled for model instance 'combined_model_new'
ERROR 2024-03-20T12:09:35.792232891Z 2024-03-20T12:09:35Z I 58 libtorch.cc:366] Cache Cleaning is disabled for model instance 'combined_model_new'
ERROR 2024-03-20T12:09:35.792236280Z 2024-03-20T12:09:35Z I 58 libtorch.cc:383] Inference Mode is enabled for model instance 'combined_model_new'
ERROR 2024-03-20T12:09:35.944197086Z 2024-03-20T12:09:35Z I 58 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: combined_model_new_0_0 (CPU device 0)
ERROR 2024-03-20T12:09:35.989993579Z 2024-03-20T12:09:35Z W 58 pinned_memory_manager.cc:170] failed to allocate pinned system memory: no pinned memory pool, falling back to non-pinned system memory
WARNING 2024-03-20T12:09:36Z Readiness probe failed: Get "http://100.72.8.164:8501/v2/health/ready": dial tcp 100.72.8.164:8501: connect: connection refused
ERROR 2024-03-20T12:09:36.148728368Z 2024-03-20T12:09:36Z I 58 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: combined_model_new_0_1 (CPU device 0)
ERROR 2024-03-20T12:09:36.296388378Z 2024-03-20T12:09:36Z I 58 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: combined_model_new_0_2 (CPU device 0)
ERROR 2024-03-20T12:09:36.443443199Z 2024-03-20T12:09:36Z I 58 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: combined_model_new_0_3 (CPU device 0)
ERROR 2024-03-20T12:09:36.590531792Z 2024-03-20T12:09:36Z I 58 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: combined_model_new_0_4 (CPU device 0)
ERROR 2024-03-20T12:09:36.741618809Z 2024-03-20T12:09:36Z I 58 model_lifecycle.cc:835] successfully loaded 'combined_model_new'
ERROR 2024-03-20T12:09:36.741651148Z 2024-03-20T12:09:36Z I 58 server.cc:607]
ERROR 2024-03-20T12:09:36.741654662Z +------------------+------+
ERROR 2024-03-20T12:09:36.741656489Z | Repository Agent | Path |
ERROR 2024-03-20T12:09:36.741658464Z +------------------+------+
ERROR 2024-03-20T12:09:36.741660604Z +------------------+------+
ERROR 2024-03-20T12:09:36.741662271Z {}
ERROR 2024-03-20T12:09:36.741697566Z 2024-03-20T12:09:36Z I 58 server.cc:634]
ERROR 2024-03-20T12:09:36.741702564Z +---------+---------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
ERROR 2024-03-20T12:09:36.741707862Z | Backend | Path | Config |
ERROR 2024-03-20T12:09:36.741712579Z +---------+---------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
ERROR 2024-03-20T12:09:36.741714668Z | pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
ERROR 2024-03-20T12:09:36.741716702Z +---------+---------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
ERROR 2024-03-20T12:09:36.741718541Z {}
ERROR 2024-03-20T12:09:36.741721156Z 2024-03-20T12:09:36Z I 58 server.cc:677]
ERROR 2024-03-20T12:09:36.741723138Z +--------------------+----------------+--------+
ERROR 2024-03-20T12:09:36.741727484Z | Model | Version | Status |
ERROR 2024-03-20T12:09:36.741729629Z +--------------------+----------------+--------+
ERROR 2024-03-20T12:09:36.741731534Z | combined_model_new | 20240319193853 | READY |
ERROR 2024-03-20T12:09:36.741733384Z +--------------------+----------------+--------+
ERROR 2024-03-20T12:09:36.741734924Z {}
ERROR 2024-03-20T12:09:36.741812442Z 2024-03-20T12:09:36Z I 58 metrics.cc:770] Collecting CPU metrics
ERROR 2024-03-20T12:09:36.741927353Z 2024-03-20T12:09:36Z I 58 tritonserver.cc:2508]
ERROR 2024-03-20T12:09:36.741931198Z +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
ERROR 2024-03-20T12:09:36.741933341Z | Option | Value |
ERROR 2024-03-20T12:09:36.741935076Z +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
ERROR 2024-03-20T12:09:36.741937124Z | server_id | triton |
ERROR 2024-03-20T12:09:36.741950164Z | server_version | 2.43.0 |
ERROR 2024-03-20T12:09:36.741952467Z | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
ERROR 2024-03-20T12:09:36.741954279Z | model_repository_path[0] | gs://model-bucket/ |
ERROR 2024-03-20T12:09:36.741956220Z | model_control_mode | MODE_POLL |
ERROR 2024-03-20T12:09:36.741957989Z | strict_model_config | 1 |
ERROR 2024-03-20T12:09:36.741959716Z | rate_limit | OFF |
ERROR 2024-03-20T12:09:36.741961319Z | pinned_memory_pool_byte_size | 268435456 |
ERROR 2024-03-20T12:09:36.741962990Z | min_supported_compute_capability | 6.0 |
ERROR 2024-03-20T12:09:36.741964614Z | strict_readiness | 1 |
ERROR 2024-03-20T12:09:36.741966281Z | exit_timeout | 10 |
ERROR 2024-03-20T12:09:36.741967875Z | cache_enabled | 0 |
ERROR 2024-03-20T12:09:36.741969612Z +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
ERROR 2024-03-20T12:09:36.741971176Z {}
ERROR 2024-03-20T12:09:36.743208016Z 2024-03-20T12:09:36Z I 58 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8500
ERROR 2024-03-20T12:09:36.743344467Z 2024-03-20T12:09:36Z I 58 http_server.cc:4637] Started HTTPService at 0.0.0.0:8501
ERROR 2024-03-20T12:09:36.784829548Z 2024-03-20T12:09:36Z I 58 http_server.cc:320] Started Metrics Service at 0.0.0.0:8502
INFO 2024-03-20T12:09:43Z Pod has become Healthy in NEG "Key{\"k8s1-9d1dfa51-model-8500-da0f15cb\", zone: \"us-central1-a\"}" attached to BackendService "Key{\"model-backend\"}". Marking condition "cloud.google.com/load-balancer-neg-ready" to True.
WARNING 2024-03-20T13:08:18Z Liveness probe failed: HTTP probe failed with statuscode: 400
WARNING 2024-03-20T13:08:18Z Readiness probe failed: HTTP probe failed with statuscode: 400
ERROR 2024-03-20T13:08:18.675255782Z 2024-03-20T13:08:18Z I 58 model_lifecycle.cc:469] loading: combined_model_new:20240320015201
ERROR 2024-03-20T13:08:18.695962792Z assertion failed: prior > 0
ERROR 2024-03-20T13:08:18.695995386Z Signal (6) received.
ERROR 2024-03-20T13:08:18.958975409Z 0# 0x000055DD10D6A8AD in tritonserver
ERROR 2024-03-20T13:08:18.959008856Z 1# 0x00007FEB7632F520 in /lib/x86_64-linux-gnu/libc.so.6
ERROR 2024-03-20T13:08:18.959011944Z 2# pthread_kill in /lib/x86_64-linux-gnu/libc.so.6
ERROR 2024-03-20T13:08:18.959014247Z 3# raise in /lib/x86_64-linux-gnu/libc.so.6
ERROR 2024-03-20T13:08:18.959016201Z 4# abort in /lib/x86_64-linux-gnu/libc.so.6
ERROR 2024-03-20T13:08:18.959018018Z 5# 0x000055DD112D91BF in tritonserver
ERROR 2024-03-20T13:08:18.959019730Z 6# 0x000055DD10F1285D in tritonserver
ERROR 2024-03-20T13:08:18.959021838Z 7# 0x000055DD10DC04ED in tritonserver
ERROR 2024-03-20T13:08:18.959023769Z 8# 0x00007FEB765F2253 in /lib/x86_64-linux-gnu/libstdc++.so.6
ERROR 2024-03-20T13:08:18.959025614Z 9# 0x00007FEB76381AC3 in /lib/x86_64-linux-gnu/libc.so.6
ERROR 2024-03-20T13:08:18.959028011Z 10# 0x00007FEB76413850 in /lib/x86_64-linux-gnu/libc.so.6
ERROR 2024-03-20T13:08:18.959029710Z {}
ERROR 2024-03-20T13:08:20.800626718Z /start.sh: line 23: 58 Aborted (core dumped) LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libjemalloc.so:${LD_PRELOAD} tritonserver --model-repository=${MODEL_REPOSITORY} --log-format=ISO8601 --exit-on-error=true --strict-readiness=true --disable-auto-complete-config --model-load-thread-count=${MODEL_LOAD_THREAD_COUNT} --model-control-mode=POLL --repository-poll-secs=${POLL_INTERVAL_WITH_JITTER} --grpc-port=${GRPC_PORT} --http-port=${HTTP_PORT} --metrics-port=${METRICS_PORT} --metrics-config=summary_latencies=true --exit-timeout-secs=${EXIT_TIMEOUT_SECS}
```

The `assertion failed: prior > 0` message comes from `sync.cc:105`, which is probably a gRPC source file.
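If that is the case, the assertion is the usual guard against a reference count being released more times than it was acquired. Roughly, the invariant looks like this (an illustrative sketch, not gRPC's actual code):

```python
# Illustrative only: the shape of invariant that an
# "assertion failed: prior > 0" typically guards. It fires when unref()
# is called on an object whose count already reached zero (a double
# release, which usually points at a use-after-free).
class RefCounted:
    def __init__(self) -> None:
        self.count = 1

    def unref(self) -> None:
        prior = self.count
        self.count -= 1
        assert prior > 0, "unref on an already-released object"
```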

Triton Information

nvcr.io/nvidia/tritonserver:24.02-pyt-python-py3

To Reproduce

Unfortunately, we do not have a reliable reproduction; the crash occurs sporadically while a new model version is being loaded.
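The closest we can get is the traffic pattern below, which runs while a version reload is in flight (illustrative client code, not a confirmed reproducer; the host and input values are placeholders):

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Illustrative load generator: keep gRPC inference traffic flowing while
# a new model version lands in the bucket and Triton reloads it.
client = grpcclient.InferenceServerClient(url="localhost:8500")

inputs = []
for name in ["A", "B", "C", "D", "E"]:          # TYPE_INT64 inputs
    inp = grpcclient.InferInput(name, [1], "INT64")
    inp.set_data_from_numpy(np.array([1], dtype=np.int64))
    inputs.append(inp)
inp = grpcclient.InferInput("F", [1], "FP64")   # TYPE_FP64 input
inp.set_data_from_numpy(np.array([1.0], dtype=np.float64))
inputs.append(inp)
for name in ["G", "H", "I", "J", "K", "L"]:     # TYPE_STRING inputs
    inp = grpcclient.InferInput(name, [1], "BYTES")
    inp.set_data_from_numpy(np.array([b"x"], dtype=np.object_))
    inputs.append(inp)

while True:
    try:
        client.infer("combined_model_new", inputs=inputs)
    except Exception:
        pass  # requests may fail transiently while the model reloads
```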

The model config:

```pbtxt
name: "combined_model_new"
platform: "pytorch_libtorch"
input [
  { name: "A" data_type: TYPE_INT64 dims: -1 },
  { name: "B" data_type: TYPE_INT64 dims: -1 },
  { name: "C" data_type: TYPE_INT64 dims: -1 },
  { name: "D" data_type: TYPE_INT64 dims: -1 },
  { name: "E" data_type: TYPE_INT64 dims: -1 },
  { name: "F" data_type: TYPE_FP64 dims: -1 },
  { name: "G" data_type: TYPE_STRING dims: -1 },
  { name: "H" data_type: TYPE_STRING dims: -1 },
  { name: "I" data_type: TYPE_STRING dims: -1 },
  { name: "J" data_type: TYPE_STRING dims: -1 },
  { name: "K" data_type: TYPE_STRING dims: -1 },
  { name: "L" data_type: TYPE_STRING dims: -1 }
]
output { name: "OUT1" data_type: TYPE_FP32 dims: -1 }
output { name: "OUT2" data_type: TYPE_FP32 dims: -1 }
dynamic_batching { }
parameters { key: "INFERENCE_MODE" value { string_value: "true" } }
instance_group [ { count: 5 kind: KIND_CPU } ]
backend: "pytorch"
model_warmup [{
  name: "random sample"
  count: 100
  batch_size: 1
  inputs { key: "A" value { data_type: TYPE_INT64 dims: [ 1 ] random_data: true } }
  inputs { key: "B" value { data_type: TYPE_INT64 dims: [ 1 ] random_data: true } }
  inputs { key: "C" value { data_type: TYPE_INT64 dims: [ 1 ] random_data: true } }
  inputs { key: "D" value { data_type: TYPE_INT64 dims: [ 1 ] random_data: true } }
  inputs { key: "E" value { data_type: TYPE_INT64 dims: [ 1 ] random_data: true } }
  inputs { key: "F" value { data_type: TYPE_FP64 dims: [ 1 ] random_data: true } }
  inputs { key: "G" value { data_type: TYPE_STRING dims: [ 1 ] random_data: true } }
  inputs { key: "H" value { data_type: TYPE_STRING dims: [ 1 ] random_data: true } }
  inputs { key: "I" value { data_type: TYPE_STRING dims: [ 1 ] random_data: true } }
  inputs { key: "J" value { data_type: TYPE_STRING dims: [ 1 ] random_data: true } }
  inputs { key: "K" value { data_type: TYPE_STRING dims: [ 1 ] random_data: true } }
  inputs { key: "L" value { data_type: TYPE_STRING dims: [ 1 ] random_data: true } }
}]
```

Expected behavior

Model updates should complete cleanly, without segfaults.

indrajit96 commented 6 months ago

Hi @yutkin, thanks for reaching out. I have filed bug https://jirasw.nvidia.com/browse/DLIS-6362 to track this.