lllyasviel / Omost

Using vLLM to deploy LLM as an API to accelerate inference #100

Open fx-hit opened 4 months ago

fx-hit commented 4 months ago

Based on practical tests, deploying omost-llama-3-8b on an A100 with torch==2.3.0+cu118, vllm==0.5.0.post1+cu118, and xformers==0.0.26.post1+cu118 works well. If you want to speed up inference, you can refer to this setup.

vLLM quickstart: https://docs.vllm.ai/en/stable/getting_started/quickstart.html
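
A minimal sketch of what this could look like, assuming the model is served through vLLM's OpenAI-compatible API server; the port, sampling values, and example prompt below are illustrative assumptions, not taken from this thread:

```python
# Sketch only: serve omost-llama-3-8b with vLLM's OpenAI-compatible server,
# then query it with the standard openai client.
#
# Launch the server first (one common invocation; flags are assumptions):
#   python -m vllm.entrypoints.openai.api_server \
#       --model lllyasviel/omost-llama-3-8b --port 8000 --dtype bfloat16
from openai import OpenAI

# vLLM's server speaks the OpenAI protocol; the api_key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="lllyasviel/omost-llama-3-8b",
    messages=[
        {"role": "user", "content": "generate an image of a fox in a forest"},
    ],
    temperature=0.6,   # sampling values here are placeholders
    max_tokens=4096,
)

# Omost replies with Canvas code describing the image layout.
print(response.choices[0].message.content)
```

The point of this setup is that generation is accelerated server-side by vLLM, while the Gradio app (or any other client) only needs to make an HTTP call.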

badcookie78 commented 4 months ago

Hi, can I ask whether it is possible to run this with Ollama and host the LLM locally?

zk19971101 commented 3 months ago

I found that ComfyUI_omost shows a way to accelerate inference with TGI (Text Generation Inference): https://github.com/huchenlei/ComfyUI_omost?tab=readme-ov-file#accelerating-llm
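
For reference, querying a locally hosted TGI endpoint could look roughly like the sketch below, assuming TGI is already serving omost-llama-3-8b on port 8080; the launch command in the comment and all parameters are assumptions, not taken from the linked README:

```python
# Sketch: talk to a local TGI server hosting omost-llama-3-8b.
# One common way to launch the server (flags are assumptions):
#   docker run --gpus all --shm-size 1g -p 8080:80 \
#       ghcr.io/huggingface/text-generation-inference:latest \
#       --model-id lllyasviel/omost-llama-3-8b
from huggingface_hub import InferenceClient

client = InferenceClient(model="http://127.0.0.1:8080")

# TGI's generate endpoint takes a raw prompt, so the Llama-3 chat template
# would need to be applied by the caller (omitted here for brevity).
output = client.text_generation(
    "generate an image of a cat sitting on a windowsill",
    max_new_tokens=2048,
    temperature=0.6,
)
print(output)
```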

sudanl commented 2 months ago

> Based on practical tests, deploying omost-llama-3-8b on an A100 with torch==2.3.0+cu118, vllm==0.5.0.post1+cu118, and xformers==0.0.26.post1+cu118 works well. If you want to speed up inference, you can refer to this setup.
>
> vLLM quickstart: https://docs.vllm.ai/en/stable/getting_started/quickstart.html

Good idea! Could you kindly share the code?