Closed cbrousseauAumni closed 2 weeks ago
No GPU, CPU only
Further addition: when trying to install an update to ollama with the install script, the linux amd64 version is downloaded instead of a linux x86_64 version.
It's not clear from your summary what the actual contents of your entrypoint are; if it's literally what you have there, I would expect several errors.
If you are running standalone docker, try:
$ docker run --rm -d --entrypoint bash --name ollama-pull ollama/ollama -c '(sleep 2 ; ollama pull qwen:0.5b) & exec ollama serve'
f16f5557aeff86f0089ba256490d0cb60656638fe5e6560e1424e3d87c5f94e4
$ docker exec -it ollama-pull ollama list
NAME ID SIZE MODIFIED
qwen:0.5b b5dc5e784f2a 394 MB 1 second ago
$ docker exec -it ollama-pull ollama run qwen:0.5b hello
Hi! How can I help you today?
If you are using docker compose:
services:
ollama:
image: ollama/ollama
entrypoint: bash -c '(sleep 2 ; ollama pull ${MODEL-mistral}) & exec ollama serve'
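Compose substitutes ${MODEL-mistral} from the shell environment or a .env file, falling back to mistral if MODEL is unset. A hedged usage example, assuming the compose file above is in the current directory:

$ MODEL=qwen:0.5b docker compose up -d
$ docker compose exec ollama ollama list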
amd64 is the same as x86_64.
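For example, on a typical x86_64 Linux host:

$ uname -m
x86_64

The kernel reports the architecture as x86_64, while the ollama release assets (and Debian-style packaging generally) call the same instruction set amd64, so the install script downloading an "amd64" build is expected.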
I'm actually not running the ollama docker image anywhere; it's a chainguard image.
# install and run ollama
RUN curl -fsSL https://ollama.com/install.sh | sh
ENV GIN_MODE="release"
ENV OLLAMA_HOST=http://0.0.0.0:11434
RUN ollama serve & sleep 5 && ollama pull mistral
This runs fine
#!/bin/sh
ollama serve & sleep 5
ollama run mistral ""& /set nohistory & /set quiet &
This seems to run fine, but then after the entrypoint executes there are the issues mentioned above.
When you build the image, ollama is running as root and the models are stored in /root/.ollama. I'm not familiar with the format of the output of ps in your summary, but if I'm interpreting '{ollama}' correctly, you are running the server as user ollama. That doesn't really match the contents of your entrypoint file - some of the other stuff from the Dockerfile might shed light on how you are starting the server. In any case, if it's running as ollama, the models will be stored in /usr/share/ollama. You can override this with OLLAMA_MODELS.
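A minimal sketch of pinning the model directory in the Dockerfile so that the build-time pull and the runtime server agree (the /models path and the mkdir are illustrative, not taken from your Dockerfile):

# store models in a fixed location, independent of which user runs the server
ENV OLLAMA_MODELS=/models
RUN mkdir -p /models
# pull at build time; the server started at runtime will look in the same place
RUN ollama serve & sleep 5 && ollama pull mistral

If the server ends up running as a non-root user, the directory ownership and permissions need to allow that user to read it.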
Note that your entrypoint file has problems. /set nohistory and /set quiet are not consumed by the ollama run command; the shell will try to execute the command /set with arguments nohistory and quiet. If the entrypoint file is the sole command executed when the container starts (ie, not part of some run system) then the container will terminate when the ollama run command completes.
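To illustrate the /set problem, this is roughly what the shell ends up doing (exact error text varies by shell):

$ /set nohistory
sh: /set: not found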
Great, I think this conversation may have helped me see how to solve several of the problems. So I can set OLLAMA_MODELS to the .ollama/models directory where mistral is being pulled during build. Is there a way of making sure nohistory and quiet are in place, either through env vars or someplace else? This is a read-only file system, otherwise it wouldn't be important.
I'm not sure why you want to set nohistory and quiet, since stdin of ollama run won't read from a terminal because you gave it an argument. You can use OLLAMA_NOHISTORY=1 to disable interactive history, and quiet is the default so you don't need to set it.
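If you do want history disabled, a hedged sketch is to set it as an environment variable in the Dockerfile or the entrypoint:

ENV OLLAMA_NOHISTORY=1

or, in entry_script.sh:

export OLLAMA_NOHISTORY=1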
Ok, I've modified the entry_script.sh to be this:
#!/bin/sh
ollama serve & sleep 5
ollama run mistral "" &
Are there any other changes that need to happen here?
As for the docker image, it's a chainguard image; these are the Ollama env vars, and nothing else related to ollama is in there:
RUN mkdir '/tmp/ollama'
ENV OLLAMA_HISTORY="/tmp/ollama"
ENV OLLAMA_TMPDIR="/tmp/ollama"
ENV OLLAMA_ORIGINS=*
ENV OLLAMA_MODELS=${FUNCTION_DIR}/.ollama/models/
I really just need to make sure that ollama run mistral is running when we get to the POST requests, so I don't get connection refused errors.
ollama run mistral is not really required; ollama will load the model when the first request is received, although you will save a couple of seconds of response time for that first query. You also probably want to set OLLAMA_KEEP_ALIVE=-1 to stop the model from being unloaded when it's idle, and OLLAMA_NUM_PARALLEL=1 to save some VRAM if the model is not expected to run multiple concurrent completions.
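A hedged sketch of what that would look like in the Dockerfile:

# keep the model loaded instead of unloading it after the default idle timeout
ENV OLLAMA_KEEP_ALIVE=-1
# one completion at a time, so the server doesn't reserve memory for parallel requests
ENV OLLAMA_NUM_PARALLEL=1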
Again, if the entrypoint is the sole command executed at container start, the container will terminate when the command finishes execution.
#!/bin/sh
(sleep 2 ; ollama run mistral "") &
exec ollama serve
Beautiful, thank you for helping me understand it better. I'll test this now.
Thank you @rick-github, nice detailed explanations.
Unfortunately, when Ollama gets its first request, I get this error, this is the same one I've been getting: litellm.exceptions.APIConnectionError: litellm.APIConnectionError: OllamaException - {"error":"model \"mistral\" not found, try pulling it first"}
If you provide the full Dockerfile and dependencies, it will aid in debugging.
When the server runs, it outputs this:
Couldn't find '~/.ollama/id_ed25519'. Generating new private key.
But this is the output of ls -a in ~/.ollama:
ls -a ~/.ollama/
. .. id_ed25519 id_ed25519.pub models
Are we sure that overriding with OLLAMA_MODELS is working, or is there another environment variable I'm missing?
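One way to check, assuming the container is named ollama-test (substitute your container name):

$ docker exec ollama-test sh -c 'env | grep OLLAMA'
$ docker exec ollama-test sh -c 'ls -R "$OLLAMA_MODELS"'
$ docker exec ollama-test ollama list

If ollama list is empty but the directory contains blobs and manifests, the running server is reading a different OLLAMA_MODELS than the one the build-time pull wrote to.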
Here's the dockerfile:
ARG FUNCTION_DIR='/function'
# Used to include pip packages only since the prod image does not include pip
FROM chainguard-image AS build-image
ARG FUNCTION_DIR
USER root
RUN mkdir -p ${FUNCTION_DIR}
# make a virtual environment in function dir
RUN python -m venv ${FUNCTION_DIR}/venv
COPY requirements.txt ${FUNCTION_DIR}/requirements.txt
RUN ${FUNCTION_DIR}/venv/bin/pip install --no-cache-dir awslambdaric
# Install dev dependencies
RUN apk update && apk add --no-cache git wget cmake zip
# install python deps
RUN ${FUNCTION_DIR}/venv/bin/pip install --no-cache-dir wheel
RUN ${FUNCTION_DIR}/venv/bin/pip install --no-cache-dir -r ${FUNCTION_DIR}/requirements.txt
FROM company-image
USER root
WORKDIR ${FUNCTION_DIR}
# Install any needed dependencies into the final image
RUN apk update && apk add --no-cache git git-lfs wget libgcc
RUN git lfs install
# the code
COPY config ${FUNCTION_DIR}/config
COPY app ${FUNCTION_DIR}/app
COPY tests ${FUNCTION_DIR}/tests
COPY --chmod=755 entry_script.sh ${FUNCTION_DIR}
# copy over ollama deps
RUN mkdir '/tmp/ollama'
ENV OLLAMA_HISTORY="/tmp/ollama"
ENV OLLAMA_TMPDIR="/tmp/ollama"
ENV OLLAMA_ORIGINS=*
ENV OLLAMA_MODELS=${FUNCTION_DIR}/.ollama/models
COPY ./ollama_install.sh ${FUNCTION_DIR}
RUN chmod +x ${FUNCTION_DIR}/ollama_install.sh
RUN ${FUNCTION_DIR}/ollama_install.sh
ENV GIN_MODE="release"
ENV OLLAMA_HOST=http://0.0.0.0:11434
RUN ollama serve & sleep 5 && ollama pull mistral
#COPY models /usr/share/ollama/.ollama/models
# copy over Chroma deps
RUN git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 ${FUNCTION_DIR}/MiniLM
RUN rm -rf ${FUNCTION_DIR}/MiniLM/.git
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
# set env vars
ENV HOME=${FUNCTION_DIR}
#ENV OLLAMA_KEEP_ALIVE=-1
ENV ANONYMIZED_TELEMETRY=False
# set up a runtime interface client as default command for the container runtime
ENTRYPOINT ["./entry_script.sh"]
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "app.function.handler" ]
Contents of ollama_install.sh, entry_script.sh? What's the base for company-image?
ollama_install.sh is the vanilla install/upgrade script from here: https://ollama.com/install.sh. entry_script.sh is up above, and the company-image base is wolfi.
One more piece of info: when I exec into the docker container and check whether ollama run mistral works, I get this:
I don't know whether the model is actually completely downloading during the docker build process when ollama pull mistral is being run.
I substituted company-image with cgr.dev/chainguard/wolfi-base and commented out the stuff related to building the app. entry_script.sh was changed to:
#!/bin/sh
(sleep 2 ; ollama run mistral "") &
ollama serve &
exec $@
$ docker build -f Dockerfile -t 7301 .
# since I don't have an app, I pass a `sleep` command to the container so that it won't error out
$ docker run --rm -d -p 11430:11434 --name 7301 7301 sleep 300
$ docker exec 7301 ollama list
NAME ID SIZE MODIFIED
mistral:latest f974a74358d6 4.1 GB 26 minutes ago
$ curl -s localhost:11430/api/tags | jq
{
"models": [
{
"name": "mistral:latest",
"model": "mistral:latest",
"modified_at": "2024-10-22T20:36:53.105958546Z",
"size": 4113301824,
"digest": "f974a74358d62a017b37c6f424fcdf2744ca02926c4f952513ddf474b2fa5091",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "7.2B",
"quantization_level": "Q4_0"
}
}
]
}
$ curl -s localhost:11430/api/generate -d '{"model":"mistral","prompt":"hello","stream":false}' | jq -r .response
Hello! How can I help you today? 😊
If you have any questions or need assistance with something, feel free to ask! I'm here to help. If you just want to chat about a specific topic or share your thoughts, that's cool too. Let me know what you need!
Note that you set FUNCTION_DIR at the top of the Dockerfile but not after the FROM statements, which reset variables. So the models are being stored in /.ollama/models.
$ docker exec 7301 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=887108e16a9c
SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
OLLAMA_HISTORY=/tmp/ollama
OLLAMA_TMPDIR=/tmp/ollama
OLLAMA_ORIGINS=*
OLLAMA_MODELS=/.ollama/models
GIN_MODE=release
OLLAMA_HOST=http://0.0.0.0:11434
HOME=/root
ANONYMIZED_TELEMETRY=False
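A hedged sketch of carrying the ARG across stages is to re-declare it after each FROM that needs it:

ARG FUNCTION_DIR='/function'

FROM company-image
# re-declare so the value defined above the FROM is visible in this stage
ARG FUNCTION_DIR
ENV OLLAMA_MODELS=${FUNCTION_DIR}/.ollama/models

Without that second ARG line, ${FUNCTION_DIR} expands to an empty string in the ENV instruction, which is why the models ended up under /.ollama/models.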
I think I'm seeing where the problem is, but I don't know how to solve it. I changed the entry_script.sh to be the same as above.
$ docker build --platform linux/x86_64 -t docker-image --build-arg GITHUB_TOKEN=github_token .
$ docker run --platform linux/x86_64 --name docker-image -dit docker-image sleep infinity
$ docker exec -it data-extractor ollama list
Error: could not connect to ollama app, is it running?
What's next:
Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug data-extractor
Learn more at https://docs.docker.com/go/debug-cli/
$ docker exec -it docker-image env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=6e8481d8e66c
TERM=xterm
SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
OLLAMA_HISTORY=/tmp/ollama
OLLAMA_TMPDIR=/tmp/ollama
OLLAMA_ORIGINS=*
OLLAMA_MODELS=/function/.ollama/models
GIN_MODE=release
OLLAMA_HOST=http://0.0.0.0:11434
HOME=/function
OLLAMA_KEEP_ALIVE=-1
ANONYMIZED_TELEMETRY=False
Are my docker commands doing something unintended? Or is the entry_script not being called for some reason?
What do you have OLLAMA_HOST set to in the data-extractor container? What's the result of docker exec -it data-extractor env?
It worked! But there's an issue that I was hoping to avoid talking about, which is a runtime interface emulator & client.
These are getting in each other's way in the entry_script, where only one of them works at a time right now. Could you help me modify the entry_script so that both of these work at once?
Current script:
#!/bin/sh
cd /function
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
cd /function && exec /usr/local/bin/aws-lambda-rie /function/venv/bin/python -m awslambdaric $1
else
cd /function && exec /function/venv/bin/python -m awslambdaric $1
fi
(sleep 2 ; ollama run mistral "") &
ollama serve &
exec $@
#!/bin/sh
ollama serve &
sleep 2
ollama run mistral ""
cd /function
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
exec /usr/local/bin/aws-lambda-rie /function/venv/bin/python -m awslambdaric $1
fi
exec /function/venv/bin/python -m awslambdaric $1
Note that if the python script exits, the container will also exit.
If you want the python script to be able to exit without killing the container:
#!/bin/sh
(
sleep 2
ollama run mistral ""
cd /function
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
exec /usr/local/bin/aws-lambda-rie /function/venv/bin/python -m awslambdaric $1
fi
exec /function/venv/bin/python -m awslambdaric $1
) &
exec ollama serve
Beautiful, everything is fixed. Thank you @rick-github, you've really helped here above and beyond.
What is the issue?
Running

ollama run mistral ""& /set nohistory & /set quiet &

in a bash script as the entrypoint to a docker container, in an attempt to start mistral right after starting the ollama server and pulling mistral.

Output of

ps aux | grep ollama | grep -v grep

is 0. These are the outputs of running subprocesses:

7 root 0:31 {ollama} /run/rosetta/rosetta /usr/bin/ollama ollama serve
30 root 0:00 {ollama} /run/rosetta/rosetta /usr/bin/ollama ollama run mistral

But when I go to access the model:

$ curl http://0.0.0.0:11434
Ollama is running

litellm.exceptions.APIConnectionError: litellm.APIConnectionError: OllamaException - {"error":"model \"mistral\" not found, try pulling it first"}

If I manually exec into the container and run ollama run mistral, it tries to pull the model all over again despite the model already having been pulled during build. Any help would be wonderful, let me know if there are relevant details missing.
OS
Linux
GPU
Other
CPU
Intel
Ollama version
The output is: ollama version is 0.0.0