lllyasviel / Omost

Using vLLM to deploy LLM as an API to accelerate inference #100

Open fx-hit opened 1 week ago

fx-hit commented 1 week ago

Based on practical tests, deploying omost-llama-3-8b on an A100 with torch==2.3.0+cu118, vllm==0.5.0.post1+cu118, and xformers==0.0.26.post1+cu118 works well. If you want to speed up inference, you can refer to this setup.

vLLM quickstart: https://docs.vllm.ai/en/stable/getting_started/quickstart.html
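
For reference, here is a minimal sketch of what such a deployment could look like, assuming the vLLM OpenAI-compatible server and the `openai` Python client. The prompt, sampling parameters, and port are illustrative only, not the values Omost itself uses, and the Omost app would presumably need to be pointed at this endpoint instead of loading the model locally:

```python
# Launch the vLLM OpenAI-compatible API server first (vLLM 0.5.x syntax):
#   python -m vllm.entrypoints.openai.api_server \
#       --model lllyasviel/omost-llama-3-8b \
#       --dtype bfloat16 --port 8000

# Then query the server with the standard OpenAI client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # address of the vLLM server (assumed local)
    api_key="EMPTY",                      # vLLM does not check the API key by default
)

response = client.chat.completions.create(
    model="lllyasviel/omost-llama-3-8b",
    messages=[
        {"role": "user", "content": "generate an image of a cat on a windowsill"},
    ],
    temperature=0.6,   # illustrative sampling settings, not Omost's defaults
    max_tokens=4096,
)
print(response.choices[0].message.content)
```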

badcookie78 commented 1 day ago

Hi, can I know if it is possible to run it with Ollama and host the LLM locally?