Closed yetingqiaqia closed 4 months ago
Hi, I tried this on an AMD MI250. It runs well, with throughput similar to what is reported in the wiki. However, gpt-fast seems to support only batch_size=1, which limits throughput. Does gpt-fast support larger batch sizes? If not, is there any plan to support them in the near future? Thanks.
I have the same problem, thanks.
We will be focusing on optimizing latency (i.e. batch size=1) for this project.
Thanks @yanboliang. Then for offline, large-batch scenarios I will use frameworks like vLLM, TensorRT-LLM, or DeepSpeed-FastGen.
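For context, the throughput argument behind the batch-size request can be sketched with a toy cost model (the numbers below are illustrative assumptions, not gpt-fast measurements): if each decode step is memory-bandwidth bound, its wall-clock cost is roughly flat in batch size, so processing B prompts per batch cuts the number of forward passes by about a factor of B.

```python
# Toy cost model: assume one forward pass (decode step) costs roughly the
# same whether it processes 1 sequence or B sequences, because decoding is
# memory-bandwidth bound. Illustrative assumption, not a gpt-fast benchmark.

def decode_steps(num_sequences: int, new_tokens: int, batch_size: int) -> int:
    """Total forward passes to generate `new_tokens` tokens for each of
    `num_sequences` prompts, processed `batch_size` prompts at a time."""
    num_batches = -(-num_sequences // batch_size)  # ceiling division
    return num_batches * new_tokens

seqs, toks = 64, 128
print(decode_steps(seqs, toks, batch_size=1))  # 8192 forward passes
print(decode_steps(seqs, toks, batch_size=8))  # 1024 forward passes, ~8x fewer
```

Under this assumption, an offline workload of 64 prompts needs roughly 8x fewer forward passes at batch_size=8, which is why batch-oriented frameworks like vLLM or TensorRT-LLM deliver much higher aggregate throughput for offline serving.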