triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Seg fault after loading models in official example #425

Open LeatherDeerAU opened 2 months ago

LeatherDeerAU commented 2 months ago

System Info

arch - x86-64
gpu - RTX 3070
docker image - nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3
tensorRT-LLM-backend tag - 0.7.2
tensorRT-LLM tag - 0.7.1 (80bc07510ac4ddf13c0d76ad295cdb2b75614618)

Who can help?

@juney-nvidia

Information

Tasks

Reproduction

  1. use the nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3 image
  2. clone the tensorRT-LLM backend at the 0.7.2 tag
  3. build the TensorRT-LLM wheel inside the container from the TensorRT-LLM submodule in the backend repo
  4. try to launch the models from the official example (see the command sketch below)
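For reference, roughly the commands used, as a minimal sketch assuming the default paths in the 24.01 container and the all_models/inflight_batcher_llm example model repository (mount and engine paths are placeholders; exact flags may differ for your setup):

```bash
# Start the 24.01 Triton TRT-LLM container (mount path is a placeholder)
docker run --rm -it --gpus all --shm-size=2g \
  -v "$PWD":/workspace -w /workspace \
  nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3 bash

# Inside the container: clone the backend at the 0.7.2 tag with submodules
git clone -b v0.7.2 --recursive \
  https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend

# Build and install the TensorRT-LLM wheel from the bundled submodule
# (--trt_root assumes TensorRT's default location inside the container)
cd tensorrt_llm
python3 scripts/build_wheel.py --trt_root /usr/local/tensorrt
pip install build/tensorrt_llm-*.whl
cd ..

# Launch Triton with the example model repository
# (engines must already be built and referenced by the config.pbtxt files)
python3 scripts/launch_triton_server.py \
  --world_size 1 --model_repo all_models/inflight_batcher_llm
```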

Expected behavior

Models load successfully.

actual behavior

Triton crashes with a segmentation fault after loading the models; logs attached: sig_fault_logs.txt

additional notes

Which backend tag should I use with Triton container version 24.01?

Related topics (?):
https://github.com/triton-inference-server/tensorrtllm_backend/issues/273
https://github.com/NVIDIA/TensorRT-LLM/issues/782
https://github.com/triton-inference-server/tensorrtllm_backend/issues/88

LeatherDeerAU commented 2 months ago

Example model config.pbtxt files attached: postproccesing.txt, preprocessing.txt, tensorrt-llm.txt
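In case it helps reproduce, these configs were populated from the templates shipped in the repo; a hedged sketch of the usual fill step (tokenizer and engine paths are placeholders, and the exact template keys can differ between backend versions, so check the ${...} placeholders in your templates):

```bash
# Fill in the shipped config.pbtxt templates before launching Triton.
# Keys below (tokenizer_dir, tokenizer_type, decoupled_mode, engine_dir)
# are examples only; verify them against your backend version's templates.
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/preprocessing/config.pbtxt \
  "tokenizer_dir:/workspace/tokenizer,tokenizer_type:auto"
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/postprocessing/config.pbtxt \
  "tokenizer_dir:/workspace/tokenizer,tokenizer_type:auto"
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt \
  "decoupled_mode:False,engine_dir:/workspace/engines"
```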

byshiue commented 2 months ago

Could you share all scripts you use?