vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Can vLLM support engines compiled by TensorRT? #5489

Open huai-ying opened 3 months ago

huai-ying commented 3 months ago

🚀 The feature, motivation and pitch

TensorRT acceleration is indeed impressive, but its concurrency handling is not as good as vLLM's.

Alternatives

No response

Additional context

No response

QwertyJack commented 3 months ago

How about pairing it with triton-inference-server?
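
If the idea is to serve the compiled TensorRT engine behind Triton Inference Server (e.g. via its `tensorrt_plan` or TensorRT-LLM backend) rather than inside vLLM, a client request could look like the sketch below. It uses the real `tritonclient` HTTP API, but the model name `llm_trt` and the tensor names `input_ids`/`logits` are assumptions; they depend entirely on how the engine was exported and on its `config.pbtxt`.

```python
import numpy as np
import tritonclient.http as httpclient

# Hypothetical model and tensor names -- adjust to match the names
# declared in the model's config.pbtxt for the exported engine.
MODEL_NAME = "llm_trt"

client = httpclient.InferenceServerClient(url="localhost:8000")

# A batch of one token sequence (token ids are placeholders).
token_ids = np.array([[1, 15043, 3186]], dtype=np.int32)
infer_input = httpclient.InferInput("input_ids", token_ids.shape, "INT32")
infer_input.set_data_from_numpy(token_ids)

result = client.infer(MODEL_NAME, inputs=[infer_input])
logits = result.as_numpy("logits")  # shape depends on the engine's output spec
print(logits.shape)
```

With Triton's dynamic batching enabled in the model config, concurrent client requests like this get batched server-side, which is one way to recover some of the throughput the issue is asking vLLM to provide.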