vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

CTranslate2 #211

Open Matthieu-Tinycoaching opened 1 year ago

Matthieu-Tinycoaching commented 1 year ago

Hello,

Thanks for the great framework for deploying LLMs.

Would it be possible to use an LLM compiled with the CTranslate2 library?

zhuohan123 commented 1 year ago

Thanks for bringing this up. We will investigate the CTranslate2 library and evaluate the difficulty and the potential benefit of adding it into vLLM.

anujnayyar1 commented 1 year ago

Would love to see this; ct2 would be a great integration! It would give us easy access to fast 8-bit inference, and it plays nicely with HF Transformers. Thank you for the library so far!

Matthieu-Tinycoaching commented 11 months ago

Hi,

Any news regarding this integration? CTranslate2 has already proven its speed within the TitanML framework for local LLM serving.

manishiitg commented 9 months ago

Hi,

Any news on this?

Matthieu-Tinycoaching commented 9 months ago

+1

shixianc commented 9 months ago

+1

hmellor commented 2 months ago

@zhuohan123 do you see any benefit of adding this to vLLM?