Sending another inference request while the first stream is still running crashes the server.
Error:
2024-06-08T09:34:21.826887Z ERROR batch{batch_size=2}:decode:decode{size=2}:decode{size=2}: text_generation_client: router/client/src/lib.rs:46: Server error: transport error
System Info
image: text-generation-inference:sha-bf3c813-rocm
GPU: AMD MI250
TGI args: --dtype float16 --model-id tiiuae/falcon-11B
P.S. The same setup was tested with meta-llama/Llama-2-7b-hf; no issues there.
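For reference, a launch command along these lines should reproduce the setup. Only the image tag and the trailing TGI args are taken from System Info above; the device flags, shm size, and port mapping are assumptions based on the usual ROCm container invocation:

```bash
# Assumed ROCm launch; only the image tag and the trailing TGI args
# (--dtype, --model-id) are confirmed by this report.
docker run --rm -it \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  --shm-size 1g \
  -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:sha-bf3c813-rocm \
  --dtype float16 --model-id tiiuae/falcon-11B
```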
Reproduction
1. Launch TGI with the image and args from System Info.
2. Send a streaming inference request.
3. Send another inference request while the first stream is still running.

The second request triggers the transport error shown above.
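A minimal sketch of the two concurrent requests, assuming the container's port 80 is mapped to localhost:8080 as in the launch command above. The endpoint and request body follow TGI's /generate_stream API; the prompts and token counts are arbitrary:

```bash
# Start a long streaming generation in the background.
curl -s -N http://localhost:8080/generate_stream \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Write a long story about a dragon.", "parameters": {"max_new_tokens": 512}}' \
  > /dev/null &

# Send a second request while the first stream is still running;
# with falcon-11B this produces the transport error.
sleep 2
curl -s http://localhost:8080/generate_stream \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 128}}'

wait
```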
Expected behavior
TGI should handle batch sizes greater than 1 (concurrent requests) on ROCm without crashing.