I'm trying to use Triton to deploy baichuan2-13B inference at bf16 precision. The tritonserver starts successfully, but it crashes when processing a client request.
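For context, I launch the server following the standard tensorrtllm_backend flow, roughly like the line below (the model repository path and world size are placeholders, not my exact values):

python3 scripts/launch_triton_server.py --world_size=1 --model_repo=/path/to/triton_model_repo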
I used TensorRT-LLM v0.5.0 to build the engine with the following configuration:
python build.py --model_version v2_13b --model_dir /mnt/Baichuan2-13B-Chat/ --dtype bfloat16 --use_gemm_plugin bfloat16 --use_gpt_attention_plugin bfloat16 --output_dir /mnt/trt_engine/baichuan2-13B/1-gpu/
When I send a curl request, the server crashes with the following error:
I hope to get some suggestions for solving this problem.
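For reference, the request follows the generate endpoint example from the tensorrtllm_backend README (the prompt and parameter values below are illustrative, not my exact request):

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'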