pytorch-labs / gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
BSD 3-Clause "New" or "Revised" License

What's the input context length for the benchmark results? #26

Closed YangZhou0417 closed 9 months ago

YangZhou0417 commented 10 months ago

With a longer input, the prefill-phase latency would be higher. Could you share the model input token count used when obtaining the results in this post?

https://pytorch.org/blog/accelerating-generative-ai-2/

Chillee commented 10 months ago

Low, maybe 5 tokens?

YangZhou0417 commented 9 months ago

I see. It would be nice to benchmark with a larger context length as well, since the time to first token can increase significantly with longer prompts.
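The concern above can be illustrated with a small timing harness. This is a minimal sketch, not gpt-fast's API: `run_prefill` is a hypothetical stand-in that simulates work proportional to prompt length, since a real model's prefill cost grows with the number of input tokens (roughly quadratically in the attention layers).

```python
import time

def run_prefill(prompt_tokens):
    # Hypothetical stand-in for a model's prefill step (NOT gpt-fast code).
    # We simulate per-token work so latency scales with prompt length.
    acc = 0
    for t in prompt_tokens:
        for _ in range(2000):
            acc += t
    return acc

def time_to_first_token(prompt_len, warmup=1, iters=3):
    """Mean prefill latency (seconds) for a prompt of `prompt_len` tokens."""
    prompt = list(range(prompt_len))
    for _ in range(warmup):       # warm up caches / JIT before timing
        run_prefill(prompt)
    start = time.perf_counter()
    for _ in range(iters):
        run_prefill(prompt)
    return (time.perf_counter() - start) / iters

# Compare a ~5-token prompt (as used in the benchmark) with longer contexts.
for n in (5, 128, 1024):
    print(f"prompt_len={n:5d}  prefill ~{time_to_first_token(n) * 1e3:.2f} ms")
```

With a real model one would time the first forward pass (prefill) separately from the per-token decode steps; the pattern of warmup runs plus averaged timed runs carries over directly.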