vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Compatibility issues #3555

Open BlackHandsomeLee opened 5 months ago

BlackHandsomeLee commented 5 months ago

🚀 The feature, motivation and pitch

Can the vLLM acceleration framework be made compatible with TensorRT-LLM? Here is the documentation for TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM

Alternatives

No response

Additional context

No response

simon-mo commented 5 months ago

Can you elaborate about which aspect of the compatibility you are interested in? The API/distribution/kernels?

BlackHandsomeLee commented 5 months ago

> Can you elaborate about which aspect of the compatibility you are interested in? The API/distribution/kernels?

Can vLLM offline inference be made compatible with running a locally built TensorRT-LLM engine?

simon-mo commented 5 months ago

Do you mean producing the same outputs? vLLM offline inference can run models end to end with performance on par with TensorRT-LLM.
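
For reference, the offline inference path being discussed is vLLM's `LLM.generate` API, which loads a Hugging Face model directly rather than consuming a prebuilt TensorRT-LLM engine. A minimal sketch (the model ID `facebook/opt-125m` and the sampling settings here are just illustrative choices, not from this thread):

```python
from vllm import LLM, SamplingParams

# Prompts to run in a single offline batch.
prompts = ["Hello, my name is", "The capital of France is"]

# Sampling configuration for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model from the Hugging Face Hub (or a local path) into vLLM's engine.
llm = LLM(model="facebook/opt-125m")

# Run end-to-end generation and print the completions.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

Because the weights are loaded and the kernels are run by vLLM itself, this path does not load a serialized TensorRT-LLM engine; the comparison above is about output quality and throughput parity, not interoperability of engine artifacts.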