[Open] yuqie opened this issue 3 weeks ago
Thanks for the report @yuqie!
cc @fxmarty as the author of the benchmark
Hi @yuqie, thank you. What happens after launching
text-generation-benchmark --tokenizer-name meta-llama/Meta-Llama-3-70B-Instruct \
--sequence-length 2048 --decode-length 128 --warmups 2 --runs 10 \
-b 1 -b 2
in the second terminal within the container?
At this point you should see the interactive benchmark UI, like the one shown at https://youtu.be/jlMAX2Oaht0?t=198.
System Info
Target: x86_64-unknown-linux-gnu
Cargo version: 1.78.0
Commit sha: 96b7b40ca3e39f7ca5b875bff9a4665c1b175289
Docker label: sha-96b7b40-rocm
Reproduction
I followed the steps from https://github.com/huggingface/hf-rocm-benchmark
docker exec -it tgi_container_name /bin/bash
and it got stuck after the following log.
I also tried llama2-7b on a single GPU card with a sequence length of 512 and a decode length of 128, but it got stuck as well.
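To narrow down whether the server or the benchmark client is the part that hangs, it can help to query the TGI server's HTTP API directly from another shell inside the container. A minimal sketch — the port (8080) and the prompt are assumptions, not taken from this issue; substitute whatever port your container actually listens on:

```shell
# Check that the server reports ready (TGI exposes a /health endpoint).
curl -v http://localhost:8080/health

# Send a single generation request; if this also hangs, the server
# itself is stuck, rather than the benchmark tool.
curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 20}}'
```

If `/health` returns quickly but `/generate` never responds, the hang is in the model forward pass rather than in `text-generation-benchmark`.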
Expected behavior
Prefill and decode latency results are expected, but the benchmark gets stuck and outputs nothing for nearly an hour. In addition, GPU utilization is zero, whereas it is non-zero during the warmup steps.
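Since GPU utilization is the symptom here, it can be monitored live from the host or inside the container with ROCm's monitoring tool. A quick check, assuming `rocm-smi` is on the PATH (it ships with the ROCm stack used by this image):

```shell
# Refresh per-GPU utilization and VRAM usage once per second;
# during a healthy decode run both should be non-zero.
watch -n 1 rocm-smi --showuse --showmemuse
```

Utilization dropping to zero right after the warmup runs, as described above, would suggest the first measured batch never reaches the GPU.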