MuYu-zhi opened this issue 5 months ago
Hi,
We did not explicitly compare with vLLM because we believe its performance is worse than that of TRT-LLM-FP16, which implements the same paged-attention functionality but with a faster attention kernel. Our throughput is much better than TRT-LLM-FP16's.
Best, Haotian
As the title says.