alexhegit opened this issue 3 weeks ago
Hello @alexhegit, I used to have the same problem. I'm not sure if it's supposed to work that way, but whenever I run the model using Ollama first for a few minutes and then run it using the Llama Stack distribution, it works for me. Try running the model with Ollama first as a warm-up, and then start your server. Let me know if this helps!
Hi @HabebNawatha, yes. For Way 2 (step by step), I did run the LLM with `ollama run llama3.1:8b-instruct-fp16` as a warm-up before using the distribution/ollama serving, and I confirmed the ollama server was ready. BTW: which GPU did you use, NVIDIA or AMD? Way 2 works fine with an NVIDIA GPU on my side, but I hit issues when trying to enable an AMD ROCm GPU.
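For reference, the warm-up sequence looks roughly like this (the model name is from the comment above; how you start the distribution afterwards depends on your setup):

```bash
# Warm up the model in Ollama first so it is loaded into memory
ollama run llama3.1:8b-instruct-fp16 "Say hello"   # wait until it responds

# Only then start the Llama Stack distribution that points at this Ollama server
# (example only; the exact start command depends on how you run llamastack/distribution-ollama)
docker compose up
```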
@alexhegit Hey! I'm actually using a MacBook Pro with an M2 chip, running it locally. Every time I warm up my model before using a client script, it works. But I still think it should not work that way; the model should be ready whenever the client script is called.
🚀 The feature, motivation and pitch
Ollama provides the Docker image ollama/ollama:rocm with AMD ROCm support. I wish distribution/ollama could support AMD ROCm the same way https://github.com/meta-llama/llama-stack/tree/main/distributions/ollama/gpu does for NVIDIA GPUs.
I have a patch in my fork repo, but it does not pass the test.
Please guide me to fix it so the PR can be merged soon.
Alternatives
No response
Additional context
Here are the details about the patch and the test.
Way 1: Using docker compose
Step 1: Create compose.yaml for rocm
Here is the patch https://github.com/alexhegit/llama-stack-rocm/commit/37f2b07c5102351c671b6ae0e8cd85ab4853e661
I created rocm/compose.yaml, using ollama/ollama:rocm to replace ollama/ollama, by referring to https://github.com/meta-llama/llama-stack/blob/main/distributions/ollama/gpu/compose.yaml .
And I reused https://github.com/meta-llama/llama-stack/blob/main/distributions/ollama/gpu/run.yaml as rocm/run.yaml. A sketch of the ROCm compose file is shown below.
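The core of the change is swapping the image to ollama/ollama:rocm and passing the ROCm devices (/dev/kfd, /dev/dri) through instead of the NVIDIA deploy block. This is only a sketch reconstructed from the referenced gpu/compose.yaml, not the exact patch; service names, ports, volume paths, and the `--yaml_config` command are assumptions:

```yaml
services:
  ollama:
    image: ollama/ollama:rocm
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    devices:
      # ROCm GPU passthrough (replaces the NVIDIA deploy.resources block)
      - /dev/kfd
      - /dev/dri
  llamastack:
    image: llamastack/distribution-ollama
    depends_on:
      - ollama
    ports:
      - "5000:5000"
    volumes:
      - ./run.yaml:/root/llamastack-run-ollama.yaml
    # Assumes the image entrypoint accepts --yaml_config, as in the NVIDIA compose file
    command: ["--yaml_config", "/root/llamastack-run-ollama.yaml"]
volumes:
  ollama:
```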
Run and Test
Step 1: docker compose up ollama/ollama:rocm together with llamastack/distribution-ollama
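Roughly, from the directory that contains the ROCm compose file (path assumed; adjust to where rocm/compose.yaml lives in your checkout):

```bash
cd distributions/ollama/rocm
docker compose up
```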
Step 2: verify the ollama server
It works fine.
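One way to verify the ollama container is serving (standard Ollama endpoints; the container name is an assumption):

```bash
# Ollama's REST API should list the available models
curl http://localhost:11434/api/tags

# Or check from inside the container (container name assumed to be "ollama")
docker exec -it ollama ollama list
```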
Step 3: client test (failed)
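The client test here is the inference test client described in the llama-stack docs of that time; the exact module path and port are assumptions and may differ across versions:

```bash
# Runs a simple chat completion request against the Llama Stack server on port 5000
python -m llama_stack.apis.inference.client localhost 5000
```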
Step 4: Check the containers
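Standard Docker commands for inspecting the containers and pulling their logs (container names depend on the compose project):

```bash
docker ps                                   # both the ollama and llamastack containers should be Up
docker logs <llamastack-container-name>     # look for errors around the inference request
docker logs <ollama-container-name>
```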
Way 2: step by step
Step 1: Start the LLM inference server
It uses https://github.com/meta-llama/llama-stack/blob/main/distributions/ollama/gpu/run.yaml mounted into the llamastack/distribution-ollama container, roughly as sketched below.
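A rough sketch of this setup, assuming the standard ROCm device flags for the Ollama container and the run.yaml mount/--yaml_config convention from the distribution README of that era (container names, ports, and flags are assumptions, not the exact commands from the original report):

```bash
# 1. Start Ollama with ROCm GPU passthrough and warm up the model
docker run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 \
  ollama/ollama:rocm
docker exec -it ollama ollama run llama3.1:8b-instruct-fp16

# 2. Start the Llama Stack distribution with run.yaml mounted into the container
docker run -it -p 5000:5000 \
  -v ./run.yaml:/root/llamastack-run-ollama.yaml \
  llamastack/distribution-ollama \
  --yaml_config /root/llamastack-run-ollama.yaml
```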
Step 2: Run the client test
It failed, with the following client test log:
Log of llamastack/distribution-ollama: