fx-hit opened this issue 1 week ago
Based on practical tests, deploying omost-llama-3-8b on an A100 with torch==2.3.0+cu118, vllm==0.5.0.post1+cu118, and xformers==0.0.26.post1+cu118 works well. If you want to speed up inference, you can refer to this setup.
vllm: https://docs.vllm.ai/en/stable/getting_started/quickstart.html
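A minimal sketch of the setup described above, assuming CUDA 11.8 wheels are available for your platform and that the model is hosted on Hugging Face as `lllyasviel/omost-llama-3-8b` (the exact repo id is an assumption; adjust to where you obtained the weights):

```shell
# Pinned versions from the tested A100 setup (CUDA 11.8 builds).
pip install "torch==2.3.0+cu118" --index-url https://download.pytorch.org/whl/cu118
# Install the matching cu118 builds of vllm and xformers; the wheel source
# depends on your environment (e.g. project release wheels).
pip install "vllm==0.5.0.post1+cu118" "xformers==0.0.26.post1+cu118"

# Launch an OpenAI-compatible server with vLLM (see the quickstart linked above).
# The model id is an assumption; replace with your local path if needed.
python -m vllm.entrypoints.openai.api_server \
    --model lllyasviel/omost-llama-3-8b \
    --dtype auto
```

Once the server is up, any OpenAI-compatible client can point at `http://localhost:8000/v1` to query the model.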
Hi, can I know if it is possible to run this with Ollama and host the LLM locally?