I start the Triton server with --model-control-mode explicit. After loading the TensorRT model with 2 instances, I change the instance count in the configuration file to 1 while requests are still being sent to the model, then reload it with curl -X POST http://localhost:8000/v2/repository/models/model/load. At that point this error occurs.
I use the nvcr.io/nvidia/tritonserver:24.07-py3 image.
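
For clarity, here is a sketch of the reproduction steps. Only the model name is taken from the log below; the repository path and the exact instance_group contents are illustrative, not copied from my setup:

    # 1. Start the server in explicit model-control mode
    tritonserver --model-repository=/models --model-control-mode explicit

    # 2. config.pbtxt initially requests 2 GPU instances:
    #      instance_group [ { count: 2  kind: KIND_GPU } ]
    #    Load the model:
    curl -X POST http://localhost:8000/v2/repository/models/resnet50_trt_fp16/load

    # 3. While inference requests are in flight, edit config.pbtxt to
    #      instance_group [ { count: 1  kind: KIND_GPU } ]
    #    and reload the model:
    curl -X POST http://localhost:8000/v2/repository/models/resnet50_trt_fp16/load

The crash happens during this second load, while the old instances are being torn down.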
The following is the error message:
I1021 10:01:34.487165 188 model_lifecycle.cc:472] "loading: resnet50_trt_fp16:1"
I1021 10:01:34.488171 188 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I1021 10:01:34.488182 188 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I1021 10:01:34.488186 188 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I1021 10:01:34.488190 188 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I1021 10:01:34.488530 188 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: resnet50_trt_fp16 (version 1)"
I1021 10:01:34.529312 188 logging.cc:46] "Loaded engine size: 51 MiB"
W1021 10:01:34.555552 188 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I1021 10:01:34.603613 188 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: resnet50_trt_fp16_0_0 (GPU device 0)"
I1021 10:01:34.633075 188 logging.cc:46] "Loaded engine size: 51 MiB"
W1021 10:01:34.633101 188 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I1021 10:01:34.641406 188 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +245, now: CPU 0, GPU 293 (MiB)"
I1021 10:01:34.641539 188 instance_state.cc:186] "Created instance resnet50_trt_fp16_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];"
I1021 10:01:34.641639 188 model_lifecycle.cc:838] "successfully loaded 'resnet50_trt_fp16'"
I1021 10:02:12.749887 188 model_lifecycle.cc:472] "loading: resnet50_trt_fp16:1"
I1021 10:02:12.754198 188 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: resnet50_trt_fp16_0_0 (GPU device 0)"
I1021 10:02:12.784549 188 logging.cc:46] "Loaded engine size: 51 MiB"
W1021 10:02:12.784573 188 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I1021 10:02:12.792749 188 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +245, now: CPU 1, GPU 587 (MiB)"
I1021 10:02:12.792843 188 instance_state.cc:186] "Created instance resnet50_trt_fp16_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];"
I1021 10:02:12.794225 188 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: resnet50_trt_fp16_0_1 (GPU device 0)"
I1021 10:02:12.822101 188 logging.cc:46] "Loaded engine size: 51 MiB"
W1021 10:02:12.822118 188 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I1021 10:02:12.830483 188 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +245, now: CPU 1, GPU 881 (MiB)"
I1021 10:02:12.830603 188 instance_state.cc:186] "Created instance resnet50_trt_fp16_0_1 on GPU 0 with stream priority 0 and optimization profile default[0];"
I1021 10:02:12.831002 188 tensorrt.cc:353] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
I1021 10:02:12.832710 188 model_lifecycle.cc:838] "successfully loaded 'resnet50_trt_fp16'"
[zhouhaojie-System-Product-Name:188 :0:356] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1c0)
Segmentation fault (core dumped)