triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

Starting a Triton server fails with: Caught signal 11 (Segmentation fault: address not mapped to object at address (nil)) #156

Closed bigmover closed 1 year ago

bigmover commented 1 year ago

Description

The log is as follows:
fauxpilot_8001-copilot_proxy-1  |
fauxpilot_8001-copilot_proxy-1  | =============================
fauxpilot_8001-copilot_proxy-1  | == Triton Inference Server ==
fauxpilot_8001-copilot_proxy-1  | =============================
fauxpilot_8001-copilot_proxy-1  |
fauxpilot_8001-copilot_proxy-1  | NVIDIA Release 23.04 (build 58408265)
fauxpilot_8001-copilot_proxy-1  | Triton Server Version 2.33.0
fauxpilot_8001-copilot_proxy-1  |
fauxpilot_8001-copilot_proxy-1  | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot_8001-copilot_proxy-1  |
fauxpilot_8001-copilot_proxy-1  | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot_8001-copilot_proxy-1  |
fauxpilot_8001-copilot_proxy-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
fauxpilot_8001-copilot_proxy-1  | By pulling and using the container, you accept the terms and conditions of this license:
fauxpilot_8001-copilot_proxy-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
fauxpilot_8001-copilot_proxy-1  |
fauxpilot_8001-copilot_proxy-1  | WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
fauxpilot_8001-copilot_proxy-1  |    Use the NVIDIA Container Toolkit to start this container with GPU support; see
fauxpilot_8001-copilot_proxy-1  |    https://docs.nvidia.com/datacenter/cloud-native/ .
fauxpilot_8001-copilot_proxy-1  |
fauxpilot_8001-copilot_proxy-1  | INFO:     Started server process [1]
fauxpilot_8001-copilot_proxy-1  | INFO:     Waiting for application startup.
fauxpilot_8001-copilot_proxy-1  | INFO:     Application startup complete.
fauxpilot_8001-copilot_proxy-1  | INFO:     Uvicorn running on http://0.0.0.0:5002 (Press CTRL+C to quit)
fauxpilot_8001-triton-1         |
fauxpilot_8001-triton-1         | =============================
fauxpilot_8001-triton-1         | == Triton Inference Server ==
fauxpilot_8001-triton-1         | =============================
fauxpilot_8001-triton-1         |
fauxpilot_8001-triton-1         | NVIDIA Release 23.04 (build 58408265)
fauxpilot_8001-triton-1         | Triton Server Version 2.33.0
fauxpilot_8001-triton-1         |
fauxpilot_8001-triton-1         | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot_8001-triton-1         |
fauxpilot_8001-triton-1         | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
fauxpilot_8001-triton-1         |
fauxpilot_8001-triton-1         | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
fauxpilot_8001-triton-1         | By pulling and using the container, you accept the terms and conditions of this license:
fauxpilot_8001-triton-1         | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
fauxpilot_8001-triton-1         |
fauxpilot_8001-triton-1         | NOTE: CUDA Forward Compatibility mode ENABLED.
fauxpilot_8001-triton-1         |   Using CUDA 12.1 driver version 530.30.02 with kernel driver version 470.161.03.
fauxpilot_8001-triton-1         |   See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
fauxpilot_8001-triton-1         |
fauxpilot_8001-triton-1         | I0718 11:28:27.118832 98 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f5ed8000000' with size 268435456
fauxpilot_8001-triton-1         | I0718 11:28:27.119447 98 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
fauxpilot_8001-triton-1         | I0718 11:28:27.127231 98 model_lifecycle.cc:459] loading: fastertransformer:1
fauxpilot_8001-triton-1         | I0718 11:28:27.251943 98 libfastertransformer.cc:1828] TRITONBACKEND_Initialize: fastertransformer
fauxpilot_8001-triton-1         | I0718 11:28:27.251967 98 libfastertransformer.cc:1838] Triton TRITONBACKEND API version: 1.12
fauxpilot_8001-triton-1         | I0718 11:28:27.251971 98 libfastertransformer.cc:1844] 'fastertransformer' TRITONBACKEND API version: 1.12
fauxpilot_8001-triton-1         | I0718 11:28:27.503568 98 libfastertransformer.cc:1876] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
fauxpilot_8001-triton-1         | I0718 11:28:27.504329 98 libfastertransformer.cc:372] Instance group type: KIND_CPU count: 1
fauxpilot_8001-triton-1         | I0718 11:28:27.504343 98 libfastertransformer.cc:402] Sequence Batching: disabled
fauxpilot_8001-triton-1         | I0718 11:28:27.504346 98 libfastertransformer.cc:412] Dynamic Batching: disabled
fauxpilot_8001-triton-1         | E0718 11:28:27.504357 98 libfastertransformer.cc:283] Invalid configuration argument 'data_type':
fauxpilot_8001-triton-1         | I0718 11:28:27.504361 98 libfastertransformer.cc:438] Before Loading Weights:
fauxpilot_8001-triton-1         | after allocation    : free: 14.93 GB, total: 22.20 GB, used:  7.27 GB
fauxpilot_8001-triton-1         | [fec673e0de69:98   :0:361] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
fauxpilot_8001-triton-1         | ==== backtrace (tid:    361) ====
fauxpilot_8001-triton-1         |  0 0x0000000000014420 __funlockfile()  ???:0
fauxpilot_8001-triton-1         |  1 0x00000000000371da std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (AbstractTransformerModel::*)(int, int), std::shared_ptr<AbstractTransformerModel>, int, int> > >::_M_run()  :0
fauxpilot_8001-triton-1         |  2 0x00000000000d6de4 std::error_code::default_error_condition()  ???:0
fauxpilot_8001-triton-1         |  3 0x0000000000008609 start_thread()  ???:0
fauxpilot_8001-triton-1         |  4 0x000000000011f133 clone()  ???:0
fauxpilot_8001-triton-1         | =================================
fauxpilot_8001-triton-1         | [fec673e0de69:00098] *** Process received signal ***
fauxpilot_8001-triton-1         | [fec673e0de69:00098] Signal: Segmentation fault (11)
fauxpilot_8001-triton-1         | [fec673e0de69:00098] Signal code:  (-6)
fauxpilot_8001-triton-1         | [fec673e0de69:00098] Failing at address: 0x62
fauxpilot_8001-triton-1         | [fec673e0de69:00098] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f5f25937420]
fauxpilot_8001-triton-1         | [fec673e0de69:00098] [ 1] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x371da)[0x7f5f1812a1da]
fauxpilot_8001-triton-1         | [fec673e0de69:00098] [ 2] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f5f25817de4]
fauxpilot_8001-triton-1         | [fec673e0de69:00098] [ 3] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f5f2592b609]
fauxpilot_8001-triton-1         | [fec673e0de69:00098] [ 4] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f5f25502133]
fauxpilot_8001-triton-1         | [fec673e0de69:00098] *** End of error message ***
fauxpilot_8001-triton-1         | --------------------------------------------------------------------------
fauxpilot_8001-triton-1         | Primary job  terminated normally, but 1 process returned
fauxpilot_8001-triton-1         | a non-zero exit code. Per user-direction, the job has been aborted.
fauxpilot_8001-triton-1         | --------------------------------------------------------------------------
fauxpilot_8001-triton-1         | --------------------------------------------------------------------------
fauxpilot_8001-triton-1         | mpirun noticed that process rank 0 with PID 0 on node fec673e0de69 exited on signal 11 (Segmentation fault).
fauxpilot_8001-triton-1         | --------------------------------------------------------------------------
fauxpilot_8001-triton-1 exited with code 139

Command to start the server:
FT_LOG_LEVEL=DEBUG CUDA_VISIBLE_DEVICES=${GPUS} mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=/model

nvidia-smi shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:00:08.0 Off |                    0 |
|  0%   49C    P0    61W / 150W |  21539MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10          On   | 00000000:00:09.0 Off |                    0 |
|  0%   52C    P0    63W / 150W |   7881MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A10          On   | 00000000:00:0A.0 Off |                    0 |
|  0%   51C    P0    62W / 150W |  14281MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A10          On   | 00000000:00:0B.0 Off |                    0 |
|  0%   61C    P0    69W / 150W |   7043MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Reproduced Steps

Command to start a server:
FT_LOG_LEVEL=DEBUG CUDA_VISIBLE_DEVICES=${GPUS} mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=/model

FasterTransformer backend built with the command:
python3 create_dockerfile_and_build.py --triton-version 23.04
bigmover commented 1 year ago

@byshiue @yuanzhedong Could you please take a look? Any comments would be appreciated!

qtweadl65285 commented 1 year ago

Hi, I've run into the same issue. Have you found any clues? Thanks! @bigmover

bigmover commented 1 year ago

> Hi, I've run into the same issue. Have you found any clues? Thanks! @bigmover

Maybe you can check your model config. The log above shows `E0718 11:28:27.504357 98 libfastertransformer.cc:283] Invalid configuration argument 'data_type':` right before the weights are loaded and the segfault occurs, which suggests the `data_type` parameter in the model's `config.pbtxt` is missing or invalid.
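
For reference, a minimal sketch of the relevant `parameters` entry in a FasterTransformer `config.pbtxt`. The parameter name `data_type` matches the error message above; the value shown (`fp16`) is only an example and must match the precision the model weights were converted to (e.g. `fp32` or `bf16`):

```
# Hypothetical fragment of model_repository/fastertransformer/config.pbtxt.
# The backend reads 'data_type' at model initialization; an empty or
# unrecognized string_value triggers the "Invalid configuration argument" error.
parameters {
  key: "data_type"
  value: {
    string_value: "fp16"
  }
}
```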