Open BlackHandsomeLee opened 5 months ago
Can you elaborate about which aspect of the compatibility you are interested in? The API/distribution/kernels?
Can vLLM offline inference be compatible with running a locally built TensorRT-LLM engine?
Do you mean producing the same outputs? vLLM offline inference can run models end to end with performance on par with TensorRT-LLM.
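For reference, here is a minimal sketch of vLLM offline inference; the model name and sampling settings are only illustrative:

```python
from vllm import LLM, SamplingParams

# Illustrative prompts and sampling settings
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load a model (any HF model supported by vLLM; opt-125m used as a small example)
llm = LLM(model="facebook/opt-125m")

# Run offline batched generation
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```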
🚀 The feature, motivation and pitch
Can the vLLM acceleration framework be compatible with TensorRT-LLM? Here is the documentation for TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM
Alternatives
No response
Additional context
No response