meta-llama / llama-stack

Composable building blocks to build Llama Apps

run local ollama failed. #154

Closed. alexhegit closed this issue 1 month ago.

alexhegit commented 1 month ago

Running the client failed with:

$ python -m llama_stack.apis.inference.client localhost 11434
User>hello world, write me a 2 sentence poem about the moon
Error: HTTP 404 404 page not found

Log from running the stack:

(LlamaStack) amd@tw024:~/alehe$ llama stack run local-ollama --port 11434 2>&1 | tee llamastack-local-ollama.log
router_api Api.inference
router_api Api.safety
router_api Api.memory
Resolved 8 providers in topological order
  Api.models: routing_table
  Api.inference: router
  Api.shields: routing_table
  Api.safety: router
  Api.memory_banks: routing_table
  Api.memory: router
  Api.agents: meta-reference
  Api.telemetry: meta-reference

Initializing Ollama, checking connectivity to server...
Serving GET /healthcheck
Serving GET /models/get
Serving GET /models/list
Serving POST /safety/run_shield
Serving POST /memory/create
Serving DELETE /memory/documents/delete
Serving DELETE /memory/drop
Serving GET /memory/documents/get
Serving GET /memory/get
Serving POST /memory/insert
Serving GET /memory/list
Serving POST /memory/query
Serving POST /memory/update
Serving GET /shields/get
Serving GET /shields/list
Serving GET /models/get
Serving GET /models/list
Serving GET /shields/get
Serving GET /shields/list
Serving GET /memory_banks/get
Serving GET /memory_banks/list
Serving POST /agents/create
Serving POST /agents/session/create
Serving POST /agents/turn/create
Serving POST /agents/delete
Serving POST /agents/session/delete
Serving POST /agents/session/get
Serving POST /agents/step/get
Serving POST /agents/turn/get
Serving GET /memory_banks/get
Serving GET /memory_banks/list
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /inference/embeddings
Listening on :::11434
INFO:     Started server process [2741910]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://[::]:11434 (Press CTRL+C to quit)
INFO:     ::1:37116 - "POST /inference/chat_completion HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/home/amd/.local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 231, in sse_generator
    async for item in event_gen:
  File "/home/amd/.local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 117, in chat_completion
    async for chunk in self.routing_table.get_provider_impl(model).chat_completion(
  File "/home/amd/.local/lib/python3.10/site-packages/llama_stack/distribution/routers/routing_tables.py", line 38, in get_provider_impl
    raise ValueError(f"Could not find provider for {routing_key}")
ValueError: Could not find provider for Llama3.1-8B-Instruct

$ cat ~/.llama/builds/conda/local-ollama-run.yaml

built_at: '2024-09-30T06:11:19.942484'
image_name: local-ollama
docker_image: null
conda_env: local-ollama
apis_to_serve:
- memory
- memory_banks
- agents
- shields
- models
- safety
- inference
api_providers:
  inference:
    providers:
    - remote::ollama
  memory:
    providers:
    - meta-reference
  safety:
    providers:
    - meta-reference
  agents:
    provider_id: meta-reference
    config:
      persistence_store:
        namespace: null
        type: sqlite
        db_path: /home/amd/.llama/runtime/kvstore.db
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::ollama
    config:
      host: localhost
      port: 11434
    routing_key: Meta-Llama3.1-8B-Instruct
  memory:
  - provider_id: meta-reference
    config: {}
    routing_key: vector
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield: null
      prompt_guard_shield: null
    routing_key: llama_guard
  - provider_id: meta-reference
    config:
      llama_guard_shield: null
      prompt_guard_shield: null
    routing_key: code_scanner_guard
  - provider_id: meta-reference
    config:
      llama_guard_shield: null
      prompt_guard_shield: null
    routing_key: injection_shield
  - provider_id: meta-reference
    config:
      llama_guard_shield: null
      prompt_guard_shield: null
    routing_key: jailbreak_shield
cheesecake100201 commented 1 month ago

The routing key in your YAML file is Meta-Llama3.1-8B-Instruct, but the stack trace shows the request is looking up a provider for Llama3.1-8B-Instruct. Update the routing_key of the inference provider in your YAML file to Llama3.1-8B-Instruct, as shown in the sketch below, and this issue should be resolved.
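
For reference, a sketch of the corrected inference entry in ~/.llama/builds/conda/local-ollama-run.yaml (based on the config posted above; only routing_key changes, the provider config stays the same):

routing_table:
  inference:
  - provider_id: remote::ollama
    config:
      host: localhost
      port: 11434
    # routing_key must match the model name the client requests
    routing_key: Llama3.1-8B-Instruct

After saving the change, restart the stack (llama stack run local-ollama --port 11434) and re-run the client command from the top of the issue; the chat_completion request should then resolve to the Ollama provider instead of raising "Could not find provider".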

ashwinb commented 1 month ago

The answer by @cheesecake100201 above should resolve this. Please re-open if it does not.