Sending another inference request while the first stream is still running crashes the server.
Error:
2024-06-08T09:34:21.826887Z ERROR batch{batch_size=2}:decode:decode{size=2}:decode{size=2}: text_generation_client: router/client/src/lib.rs:46: Server error: transport error
System Info
image: text-generation-inference:sha-bf3c813-rocm
GPU: AMD MI250
TGI args: --dtype float16 --model-id tiiuae/falcon-11B
P.S. The same setup was tested with meta-llama/Llama-2-7b-hf; no issues there.
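For reference, a launch command along these lines should reproduce the setup. Only the image tag and the trailing TGI args are taken from System Info above; the device flags, shm size, and port mapping are assumptions based on the usual ROCm container invocation:

```bash
# Assumed ROCm launch; only the image tag and the trailing TGI args
# (--dtype, --model-id) are confirmed by this report.
docker run --rm -it \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  --shm-size 1g \
  -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:sha-bf3c813-rocm \
  --dtype float16 --model-id tiiuae/falcon-11B
```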
Reproduction
1. Launch TGI with the image and args from System Info.
2. Send a streaming inference request.
3. Send another inference request while the first stream is still running.

The second request triggers the transport error shown above.
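A minimal sketch of the two concurrent requests, assuming the container's port 80 is mapped to localhost:8080 as in the launch command above. The endpoint and request body follow TGI's /generate_stream API; the prompts and token counts are arbitrary:

```bash
# Start a long streaming generation in the background.
curl -s -N http://localhost:8080/generate_stream \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Write a long story about a dragon.", "parameters": {"max_new_tokens": 512}}' \
  > /dev/null &

# Send a second request while the first stream is still running;
# with falcon-11B this produces the transport error.
sleep 2
curl -s http://localhost:8080/generate_stream \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 128}}'

wait
```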
Expected behavior
TGI should handle batch sizes greater than 1 (concurrent requests) on ROCm without crashing.