vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Can vLLM support engines compiled by TensorRT? #5489

Open huai-ying opened 3 months ago

huai-ying commented 3 months ago

🚀 The feature, motivation and pitch

TensorRT acceleration is indeed impressive, but its concurrency handling is not as good as vLLM's.

Alternatives

No response

Additional context

No response

QwertyJack commented 3 months ago

How about pairing it with triton-inference-server?
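
If the idea is to serve the compiled TensorRT engine behind Triton Inference Server (e.g. via its `tensorrt_plan` or TensorRT-LLM backend) rather than inside vLLM, a client request could look like the sketch below. It uses the real `tritonclient` HTTP API, but the model name `llm_trt` and the tensor names `input_ids`/`logits` are assumptions; they depend entirely on how the engine was exported and on its `config.pbtxt`.

```python
import numpy as np
import tritonclient.http as httpclient

# Hypothetical model and tensor names -- adjust to match the names
# declared in the model's config.pbtxt for the exported engine.
MODEL_NAME = "llm_trt"

client = httpclient.InferenceServerClient(url="localhost:8000")

# A batch of one token sequence (token ids are placeholders).
token_ids = np.array([[1, 15043, 3186]], dtype=np.int32)
infer_input = httpclient.InferInput("input_ids", token_ids.shape, "INT32")
infer_input.set_data_from_numpy(token_ids)

result = client.infer(MODEL_NAME, inputs=[infer_input])
logits = result.as_numpy("logits")  # shape depends on the engine's output spec
print(logits.shape)
```

With Triton's dynamic batching enabled in the model config, concurrent client requests like this get batched server-side, which is one way to recover some of the throughput the issue is asking vLLM to provide.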