meta-llama / llama-stack

Composable building blocks to build Llama Apps

Error in meta-reference-gpu docker #418

Open subramen opened 2 weeks ago

subramen commented 2 weeks ago

System Info

amd64, 1 GPU

šŸ› Describe the bug

ValueError: Provider inline::llama-guard is not available for API Api.safety

Commented out the safety entries in run.yaml and build.yaml in an attempt to fix the above.

build.yaml

name: meta-reference-gpu
distribution_spec:
  docker_image: pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime
  description: Use code from `llama_stack` itself to serve all llama stack APIs
  providers:
    inference: meta-reference
    memory:
    - meta-reference
    - remote::chromadb
    - remote::pgvector
    # safety: inline::llama-guard
    agents: meta-reference
    telemetry: meta-reference

run.yaml

version: '2'
built_at: '2024-10-08T17:40:45.325529'
image_name: local
docker_image: null
conda_env: local
apis:
- shields
- agents
- models
- memory
- memory_banks
- inference
# - safety
providers:
  inference:
  - provider_id: inference0
    provider_type: meta-reference
    config:
      model: Llama3.2-3B-Instruct
      quantization: null
      torch_seed: null
      max_seq_len: 4096
      max_batch_size: 1
  - provider_id: inference1
    provider_type: meta-reference
    config:
      model: Llama-Guard-3-1B
      quantization: null
      torch_seed: null
      max_seq_len: 2048
      max_batch_size: 1
  # safety:
  # - provider_id: meta0
  #   provider_type: inline::llama-guard
  #   config:
  #     model: Llama-Guard-3-1B
  #     excluded_categories: []
  # - provider_id: meta1
  #   provider_type: inline::prompt-guard
  #   config:
  #     model: Prompt-Guard-86M
# Uncomment to use prompt guard
#      prompt_guard_shield:
#        model: Prompt-Guard-86M
  memory:
  - provider_id: meta0
    provider_type: meta-reference
    config: {}
  # Uncomment to use pgvector
  # - provider_id: pgvector
  #   provider_type: remote::pgvector
  #   config:
  #     host: 127.0.0.1
  #     port: 5432
  #     db: postgres
  #     user: postgres
  #     password: mysecretpassword
  agents:
  - provider_id: meta0
    provider_type: meta-reference
    config:
      persistence_store:
        namespace: null
        type: sqlite
        db_path: ~/.llama/runtime/agents_store.db
  telemetry:
  - provider_id: meta0
    provider_type: meta-reference
    config: {}

See the error logs below for the subsequent error.

Error logs

meta-reference-gpu-llamastack-1  | /opt/conda/lib/python3.11/site-packages/pydantic/_internal/_fields.py:172: UserWarning: Field name "schema" in "JsonResponseFormat" shadows an attribute in parent "BaseModel"
meta-reference-gpu-llamastack-1  |   warnings.warn(
meta-reference-gpu-llamastack-1  | Traceback (most recent call last):
meta-reference-gpu-llamastack-1  |   File "<frozen runpy>", line 198, in _run_module_as_main
meta-reference-gpu-llamastack-1  |   File "<frozen runpy>", line 88, in _run_code
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/llama_stack/distribution/server/server.py", line 347, in <module>
meta-reference-gpu-llamastack-1  |     fire.Fire(main)
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
meta-reference-gpu-llamastack-1  |     component_trace = _Fire(component, args, parsed_flag_args, context, name)
meta-reference-gpu-llamastack-1  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
meta-reference-gpu-llamastack-1  |     component, remaining_args = _CallAndUpdateTrace(
meta-reference-gpu-llamastack-1  |                                 ^^^^^^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
meta-reference-gpu-llamastack-1  |     component = fn(*varargs, **kwargs)
meta-reference-gpu-llamastack-1  |                 ^^^^^^^^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/llama_stack/distribution/server/server.py", line 279, in main
meta-reference-gpu-llamastack-1  |     impls = asyncio.run(resolve_impls(config))
meta-reference-gpu-llamastack-1  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
meta-reference-gpu-llamastack-1  |     return runner.run(main)
meta-reference-gpu-llamastack-1  |            ^^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
meta-reference-gpu-llamastack-1  |     return self._loop.run_until_complete(task)
meta-reference-gpu-llamastack-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
meta-reference-gpu-llamastack-1  |     return future.result()
meta-reference-gpu-llamastack-1  |            ^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/llama_stack/distribution/resolver.py", line 150, in resolve_impls
meta-reference-gpu-llamastack-1  |     sorted_providers = topological_sort(
meta-reference-gpu-llamastack-1  |                        ^^^^^^^^^^^^^^^^^
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/llama_stack/distribution/resolver.py", line 229, in topological_sort
meta-reference-gpu-llamastack-1  |     dfs((api_str, providers), visited, stack)
meta-reference-gpu-llamastack-1  |   File "/opt/conda/lib/python3.11/site-packages/llama_stack/distribution/resolver.py", line 220, in dfs
meta-reference-gpu-llamastack-1  |     dfs((dep, providers_with_specs[dep]), visited, stack)
meta-reference-gpu-llamastack-1  |               ~~~~~~~~~~~~~~~~~~~~^^^^^
meta-reference-gpu-llamastack-1  | KeyError: 'safety'
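
(For context, a minimal runnable sketch of the failure mode -- this is not the actual llama_stack resolver code, and the API shown depending on safety is an assumption for illustration: once the safety provider is commented out, any other API that still declares "safety" as a dependency has nothing to look up, hence the KeyError.)

# Simplified sketch, not the real llama_stack resolver. "shields" depending
# on "safety" is an illustrative assumption; the real dependency graph lives
# in llama_stack. Commenting out the safety provider removes the "safety"
# key, so the dependency lookup below raises KeyError: 'safety'.
providers_with_specs = {
    "inference": {"deps": []},
    "shields": {"deps": ["safety"]},
    # "safety": {"deps": []},   # commented out, as in the run.yaml above
}

def dfs(api, spec, visited, stack):
    visited.add(api)
    for dep in spec["deps"]:
        if dep not in visited:
            dfs(dep, providers_with_specs[dep], visited, stack)  # KeyError: 'safety'
    stack.append(api)

visited, stack = set(), []
for api, spec in providers_with_specs.items():
    if api not in visited:
        dfs(api, spec, visited, stack)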

Expected behavior

It works.

yanxi0830 commented 2 weeks ago

Are you building the docker image locally, or pulling from Docker Hub?

subramen commented 2 weeks ago

I'm using $ cd distributions/meta-reference-gpu && docker compose up as per https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/self_hosted_distro/meta-reference-gpu.html

Looks like it's pulling from the hub?

services:
  llamastack:
    image: llamastack/distribution-meta-reference-gpu
yanxi0830 commented 2 weeks ago

@subramen Option 1 pulls from the hub. Option 2 lets you build locally from the latest source code.
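
(For anyone following along, a rough sketch of what pointing compose at a locally built image could look like -- the image tag below is illustrative, not the distribution's actual setup, and the exact build invocation depends on your llama-stack version:)

services:
  llamastack:
    # illustrative local tag; build an image from your source checkout first
    # (e.g. via llama stack build, flags vary by version), then reference it
    # here instead of the hub image; other service settings stay unchanged
    image: distribution-meta-reference-gpu:local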

subramen commented 2 weeks ago

Follow-up to get it working:

1. Installed llama-stack from source on the cherrypick-working branch. This also changes build.yaml to safety: meta-reference instead of inline::llama-guard.

2. In run.yaml, set safety: [] (sketched after this list).

3. cd distributions/meta-reference-gpu && docker compose up
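
Roughly how the relevant sections end up after these steps (a sketch of excerpts only, based on the changes described above; everything else stays as posted earlier):

# build.yaml (excerpt) -- safety provider name reverted on that branch
distribution_spec:
  providers:
    safety: meta-reference

# run.yaml (excerpt) -- safety left empty
providers:
  safety: []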

ashwinb commented 2 weeks ago

@subramen: oh boy, you have identified another source of backward incompatibility. We updated the names of these providers -- I am going to revert that and add a "deprecation warning" that shows up accordingly.

@raghotham we are going to be slowing down considerably now unless we rapidly stabilize and update our images between releases.

raghotham commented 2 weeks ago

The goal for deprecation messages was to move forward without breaking backward compatibility. Let's talk more about how we can move fast, responsibly! :)

ashwinb commented 2 weeks ago

@subramen: we added support for "deprecating" things so we can change formats sensibly, with more graceful warnings and errors. But our current docker images don't have any of the deprecation code either. So this week we will accept bad breakage, rush through a bunch of breaking changes, and try to get to a reasonably stable, well-tested state by Friday.
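
(For illustration only -- this is not the actual llama-stack implementation, and the names in the table are hypothetical: deprecating a renamed provider type typically amounts to an alias table that translates old names and warns instead of erroring.)

import warnings

# Hypothetical alias table; the real names and mechanism live in llama-stack itself.
DEPRECATED_PROVIDER_TYPES = {
    "meta-reference": "inline::llama-guard",  # old safety provider name -> new name
}

def resolve_provider_type(provider_type: str) -> str:
    """Map a deprecated provider_type to its new name, emitting a warning."""
    if provider_type in DEPRECATED_PROVIDER_TYPES:
        new_name = DEPRECATED_PROVIDER_TYPES[provider_type]
        warnings.warn(
            f"Provider type '{provider_type}' is deprecated; use '{new_name}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    return provider_type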