pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License

Can GPT-Fast support larger batch sizes #90

Closed · yetingqiaqia closed this issue 4 months ago

yetingqiaqia commented 5 months ago

Hi, I tried this on an AMD MI250. It runs well, with throughput similar to what is reported in the wiki. However, gpt-fast seems to only support batch_size=1, which limits throughput. Does gpt-fast support larger batch sizes? If not, are there any plans to support them in the near future? Thanks.
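For context, supporting batch_size > 1 in a static-cache design like gpt-fast's mainly means giving the pre-allocated KV cache a batch dimension and threading a (B, T) token tensor through prefill and decode. Below is a self-contained toy sketch of that idea in plain PyTorch; it is not gpt-fast's actual code, and every name in it (`TinyDecoder`, `generate`, the buffer names) is hypothetical and for illustration only.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy one-layer attention decoder (hypothetical, not gpt-fast code).

    Illustrates the key change batching requires: the static KV cache is
    pre-allocated with a max_batch leading dimension instead of 1.
    """

    def __init__(self, vocab=128, dim=64, max_batch=4, max_seq=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, vocab)
        # KV cache sized for the maximum batch, analogous to a static cache
        # but with max_batch > 1.
        self.register_buffer("k_cache", torch.zeros(max_batch, max_seq, dim))
        self.register_buffer("v_cache", torch.zeros(max_batch, max_seq, dim))

    def forward(self, tokens, pos):
        # tokens: (B, T) new tokens; pos: position of the first new token
        B, T = tokens.shape
        x = self.embed(tokens)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Write the new keys/values into the batched cache slots.
        self.k_cache[:B, pos:pos + T] = k
        self.v_cache[:B, pos:pos + T] = v
        k_all = self.k_cache[:B, :pos + T]
        v_all = self.v_cache[:B, :pos + T]
        att = (q @ k_all.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        if T > 1:  # prefill: causal mask within the new block
            mask = torch.tril(
                torch.ones(T, pos + T, dtype=torch.bool), diagonal=pos
            )
            att = att.masked_fill(~mask, float("-inf"))
        return self.out(att.softmax(-1) @ v_all)

@torch.no_grad()
def generate(model, prompt, steps):
    # prompt: (B, T) batched prompts; greedy decoding for simplicity
    logits = model(prompt, pos=0)                       # batched prefill
    cur = logits[:, -1].argmax(-1, keepdim=True)        # (B, 1)
    out, pos = [cur], prompt.shape[1]
    for _ in range(steps - 1):                          # batched decode
        logits = model(cur, pos=pos)
        cur = logits[:, -1].argmax(-1, keepdim=True)
        out.append(cur)
        pos += 1
    return torch.cat(out, dim=1)                        # (B, steps)

model = TinyDecoder()
prompts = torch.randint(0, 128, (4, 8))  # batch of 4 prompts, length 8
print(generate(model, prompts, steps=5).shape)  # torch.Size([4, 5])
```

A real batched implementation would also need per-sequence lengths, padding or packing for prompts of different sizes, and per-sequence attention masks, which is much of what dedicated serving frameworks handle.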

pccq2002 commented 5 months ago

I have the same problem, thanks

yanboliang commented 5 months ago

We will be focusing on optimizing latency (i.e., batch size = 1) for this project.

yetingqiaqia commented 4 months ago

Thanks @yanboliang. For offline, large-batch scenarios, I will use frameworks like vLLM, TensorRT-LLM, or DeepSpeed-FastGen instead.