What's in `custom_llama.txt`?

@rick-github
```
FROM llama3:8b

PARAMETER temperature 0.8
PARAMETER top_k 30
PARAMETER top_p 0.7

PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
PARAMETER stop <|reserved_special_token

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

SYSTEM You are a bot that helps infer ........
```
Where is the `llama3:8b` model located?
I am unsure if I understand what you mean, but shouldn't `FROM ollama/ollama:latest` in the Dockerfile already resolve that?
`FROM ollama/ollama:latest` just pulls the program, not any models. If you want to create a new model, you need to pull the model you want to base your custom one on: `ollama pull llama3:8b`.
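For example, the full local workflow looks like this (a sketch, assuming `custom_llama.txt` is in the current directory):

```
# Pull the base model that the Modelfile builds on
ollama pull llama3:8b

# Build the custom model from the Modelfile
ollama create ai-agent -f custom_llama.txt

# Chat with it
ollama run ai-agent
```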
@rick-github Okay, thanks. So the Dockerfile should look like this?
```
FROM ollama/ollama:latest
COPY custom_llama.txt /App/custom_llama.txt
WORKDIR /App
RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt && ollama run ai-agent
EXPOSE 11434
```
I'm also curious how it runs locally without running the ollama application in the background.
The RUN commands you have there only run during the container build process. The container automatically starts the ollama server when it's instantiated, so when running locally it's just ready. The final `ollama run ai-agent` is unnecessary.
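A quick way to see this locally (image and container names here are just placeholders):

```
# Build the image; the RUN steps execute once, at build time
docker build -t ai-agent-image .

# Start a container; the image's entrypoint launches `ollama serve` automatically
docker run -d -p 11434:11434 --name ai-agent ai-agent-image

# The API answers right away, with no extra `ollama run` step
curl http://localhost:11434/api/tags
```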
Note that the way you are doing this, every time you build the container, ollama will re-pull the model, which can be slow, error-prone, and heavy on your bandwidth budget. It may be better to pull the model to your workspace just once, and then COPY the model into the container during the build process.
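A sketch of that approach, assuming the model has already been pulled on the build host and its store has been copied into the build context (the source path is an assumption and varies by platform):

```
FROM ollama/ollama:latest
COPY custom_llama.txt /App/custom_llama.txt
# Copy the pre-pulled model store instead of re-pulling on every build
COPY .ollama/models /root/.ollama/models
WORKDIR /App
RUN ollama serve & sleep 5 && ollama create ai-agent -f custom_llama.txt
EXPOSE 11434
```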
@rick-github Yeah, thanks for the suggestion, I'll try COPY to reduce overhead. I tried the Dockerfile below and I still can't see any model on Cloud Run after adding the pull command:
```
FROM ollama/ollama:latest
COPY custom_llama.txt /App/custom_llama.txt
WORKDIR /App
RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
EXPOSE 11434
```
`$URL/api/tags` => `{"models":[]}`
Worked locally for me. I don't have a GCP account so I can't test Cloud Run. Do you get any logs from the GCP attempt?
Build:
```
$ docker build -f Dockerfile -t 6702 --progress plain .
...
#8 0.180 2024/09/08 18:12:46 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
...
pulling manifest
#8 81.61 pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
...
#8 81.61 success
...
#8 81.66 transferring model data
#8 81.66 using existing layer sha256:6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
...
#8 81.66 success
...
#9 exporting layers 14.9s done
#9 writing image sha256:9c602d2c645c0ced9f6010250c0f7876d771c5634e799cfdcc6c335ed55fc4d6 done
#9 naming to docker.io/library/6702 done
#9 DONE 14.9s
```
Run:
```
$ docker run -d --name 6702 6702
4faf0f6f995e88003c274200f743a50615f4146f15e0965cdbed306e89f3c04a
$ docker exec -it 6702 bash
root@4faf0f6f995e:/App# ollama list
NAME            ID              SIZE    MODIFIED
ai-agent:latest 3f2762d3ecf4    4.7 GB  7 minutes ago
llama3:8b       365c0bd3c000    4.7 GB  7 minutes ago
root@4faf0f6f995e:/App# ollama run ai-agent:latest hello
Hello! I'm a bot designed to help infer information from text-based input. I can assist with tasks such as answering questions, summarizing content, and generating ideas. What would you like to talk about or ask?
root@4faf0f6f995e:/App#
```
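As an aside, for the GCP attempt the Cloud Run service logs can usually be pulled with the standard Cloud Logging filter (substitute your service name):

```
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="SERVICE_NAME"' \
  --limit 50
```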
Yeah same here, it works perfectly locally for me, but when I move to the cloud it just shows `Ollama is running`.
Are you running it in a VM instance in the cloud, or just the container with `gcloud compute instances create-with-container`?
I am using Google Cloud Run, which is more like a managed container service for running workloads in the cloud, with no direct access to the VMs or Compute Engine instances.
Not ideal or my plan, but I created the model with two API requests (a sketch follows below):

- `$URL/api/pull` => to pull `llama3:8b`
- `$URL/api/create` (with the content of the Modelfile) => to create the bot model
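A rough sketch of those two calls (request bodies follow the documented ollama API; the Modelfile content is abbreviated here):

```
# Pull the base model on the remote server
curl $URL/api/pull -d '{"name": "llama3:8b"}'

# Create the custom model from the Modelfile contents
curl $URL/api/create -d '{"name": "ai-agent", "modelfile": "FROM llama3:8b\n..."}'
```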
However, it would be nice to just run the container image, which contains all the config, and have it ready to serve.
It's because the cloud-built container has `OLLAMA_MODELS=/home/.ollama/models` while the locally built container uses `OLLAMA_MODELS=/root/.ollama/models`. Not sure why; I assume the build or run process in GCP sets some environment variables (maybe `HOME`) that result in a different path for ollama state. I don't know enough about GCP to fix this the right way, but a workaround is to set `HOME` in the Dockerfile:
```
--- Dockerfile.orig	2024-09-08 23:42:50.799039526 +0200
+++ Dockerfile	2024-09-08 23:34:44.897002700 +0200
@@ -5,6 +5,6 @@
 WORKDIR /App
-RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
+RUN HOME=/home ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
 EXPOSE 11434
```
Build and deploy, and when the container starts it will see the models:
```
$ OLLAMA_HOST=https://test-123412341234.us-west1.run.app:443 ollama list
NAME            ID              SIZE    MODIFIED
ai-agent:latest 3f2762d3ecf4    4.7 GB  18 minutes ago
llama3:8b       365c0bd3c000    4.7 GB  18 minutes ago
$ OLLAMA_HOST=https://test-123412341234.us-west1.run.app:443 ollama run ai-agent hello
Hello! I'm a bot that helps infer the meaning of text. You can provide me with some text, and I'll do my best to understand its meaning and provide you with relevant information or insights.

What would you like to talk about? Do you have any specific topics in mind, or would you like me to suggest some prompts to get us started?
```
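Another option, untested on Cloud Run so treat it as an assumption, is to pin the model store path explicitly in the Dockerfile so that build time and runtime agree regardless of what `HOME` the platform sets:

```
# Sketch: fix the model store path instead of relying on HOME
ENV OLLAMA_MODELS=/root/.ollama/models
RUN ollama serve & sleep 5 && ollama pull llama3:8b && ollama create ai-agent -f custom_llama.txt
```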
Setting HOME fixed the issue. It will be interesting to know if the same issue is observed with similar products from other cloud platforms (Azure, AWS), or if it's just GCP.
Thanks @rick-github
What is the issue?

I can run a custom llama3 model locally using this docker config. However, when I deploy on GCP Cloud Run, I don't see any model running: `$URL/api/tags` => `{"models":[]}`, but it says `Ollama is running` on the homepage. FYI: the custom model is llama3:8b.
OS: Docker
GPU: No response
CPU: No response
Ollama version: LLAMA3