Ollama CLI does not work when `OLLAMA_HOST` is set to the SkyServe endpoint. Note that the OpenAI API endpoint (`curl -L $ENDPOINT/v1/chat/completions`) works fine; this issue pertains only to the Ollama CLI.

User report from Slack:
While the service replica endpoint works flawlessly, the cluster-controller endpoint only works with `curl`, not with `ollama list`.
Env: Ollama on SkyServe on Kubernetes with a LoadBalancer service.
```
$ sky status
Clusters
NAME                           LAUNCHED     RESOURCES                                                                   STATUS  AUTOSTOP  COMMAND
sky-serve-controller-d84a77ca  37 mins ago  1x Kubernetes(4CPU--4GB, cpus=4+, disk_size=200, ports=['30001-30020']...  UP      -         sky serve up serve/ollama...

Managed jobs
No in-progress managed jobs. (See: sky jobs -h)

Services
NAME    VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
ollama  1        25m 27s  READY   1/1       10.11.18.101:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT                    LAUNCHED     RESOURCES                      STATUS  REGION
ollama        1   1        http://10.11.18.102:11434/  28 mins ago  1x Kubernetes({'RTX3060': 1})  READY   kubernetes
```
Now, setting `OLLAMA_HOST` to each of the two endpoints gives different results.
Against the SkyServe (controller) endpoint:

```console
$ export OLLAMA_HOST=$(sky serve status --endpoint ollama)
# or equivalently:
$ export OLLAMA_HOST=10.11.18.101:30001
$ ollama list
Error: could not connect to ollama app, is it running?
```

Against the replica endpoint directly:

```console
$ export OLLAMA_HOST=10.11.18.102:11434
$ ollama list
<shows all models>
```
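A plausible explanation (an assumption, not confirmed here) is that the SkyServe load balancer answers with an HTTP redirect to the replica, which is why the `-L` (follow redirects) flag is needed for `curl`, while the Ollama CLI does not follow redirects and reports a connection failure. The sketch below simulates that situation with two local stub servers; all ports and the `/api/tags` path (Ollama's list-models route) are for illustration only:

```python
import http.server
import threading
import urllib.error
import urllib.request

class Backend(http.server.BaseHTTPRequestHandler):
    """Stub for the replica (plays the role of 10.11.18.102:11434): answers directly."""
    def do_GET(self):
        body = b'{"models": []}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

class RedirectLB(http.server.BaseHTTPRequestHandler):
    """Stub load balancer (plays the role of 10.11.18.101:30001): 307-redirects every GET."""
    def do_GET(self):
        self.send_response(307)
        self.send_header("Location", f"http://127.0.0.1:{BACKEND_PORT}{self.path}")
        self.end_headers()
    def log_message(self, *args):
        pass

backend_srv = http.server.HTTPServer(("127.0.0.1", 0), Backend)
BACKEND_PORT = backend_srv.server_address[1]
lb_srv = http.server.HTTPServer(("127.0.0.1", 0), RedirectLB)
LB_PORT = lb_srv.server_address[1]
for srv in (backend_srv, lb_srv):
    threading.Thread(target=srv.serve_forever, daemon=True).start()

# A redirect-following client (what `curl -L` does) reaches the backend:
with urllib.request.urlopen(f"http://127.0.0.1:{LB_PORT}/api/tags") as resp:
    followed_status = resp.status

# A client that refuses to follow redirects only ever sees the 307:
class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None  # abort instead of following the Location header

opener = urllib.request.build_opener(NoRedirect)
try:
    opener.open(f"http://127.0.0.1:{LB_PORT}/api/tags")
    raw_status = 200
except urllib.error.HTTPError as exc:
    raw_status = exc.code

print(followed_status, raw_status)
```

If this is indeed what is happening, a quick way to confirm against the real endpoint would be to inspect the response headers of a plain `curl` (without `-L`) to the controller endpoint and look for a `Location` header.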