triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

The output of BLS is unstable #630

Open dwq370 opened 1 month ago

dwq370 commented 1 month ago

System Info

OS: Ubuntu 22.04
Triton image: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
tensorrtllm_backend version: 0.10.0
Model: qwen2-7b-instruct

Who can help?

No response

Reproduction

Launch qwen2-7b-instruct in a container using the image nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3, then test the ensemble and BLS models (a client sketch follows the list below).

1. Ensemble model test
   1.1 No characters between the two sentences in the input text (screenshot not reproduced).
   1.2 Some spaces or '\n' between the two sentences in the input text; the semantics of the input text do not change (screenshots not reproduced).

2. BLS model test
   2.1 No characters between the two sentences in the input text (screenshot not reproduced).
   2.2 Some spaces or '\n' between the two sentences in the input text; the semantics of the input text do not change (screenshots not reproduced).
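For reference, a minimal client sketch along these lines could be used to compare the two whitespace variants against both models. It assumes the default model names ("ensemble" and "tensorrt_llm_bls"), the text_input / max_tokens / text_output tensors used by this repo's example clients, a server reachable on localhost:8000, and placeholder prompts in place of the original screenshots:

```python
# Minimal sketch (assumptions: default "ensemble" / "tensorrt_llm_bls" model names,
# text_input / max_tokens / text_output tensors as in the repo's example clients,
# server reachable on localhost:8000; the prompts below are placeholders).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def generate(model_name: str, prompt: str, max_tokens: int = 64) -> str:
    # Prompt goes in as a [1, 1] BYTES tensor, token budget as [1, 1] INT32.
    text_input = httpclient.InferInput("text_input", [1, 1], "BYTES")
    text_input.set_data_from_numpy(np.array([[prompt.encode("utf-8")]], dtype=object))

    tokens = httpclient.InferInput("max_tokens", [1, 1], "INT32")
    tokens.set_data_from_numpy(np.array([[max_tokens]], dtype=np.int32))

    result = client.infer(model_name, inputs=[text_input, tokens])
    out = result.as_numpy("text_output").flatten()[0]
    return out.decode("utf-8") if isinstance(out, bytes) else str(out)

# Same two sentences, with and without whitespace between them.
prompt_no_gap = "Sentence one.Sentence two."
prompt_with_gap = "Sentence one. \n Sentence two."

for model in ("ensemble", "tensorrt_llm_bls"):
    for label, prompt in (("no gap", prompt_no_gap), ("with gap", prompt_with_gap)):
        print(f"[{model}] {label}: {generate(model, prompt)}")
```

With greedy or fixed-seed sampling, each model would be expected to return the same completion for both prompt variants.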

Expected behavior

The results of 1.1 and 1.2 are the same, and the results of 2.1 and 2.2 are the same.

Actual behavior

The results of 2.1 and 2.2 are not the same.

Additional notes

None.