meta-llama / llama-stack

Model components of the Llama Stack APIs

How to specify the model type using the pre-built Docker image? #331

Open Travis-Barton opened 4 days ago

Travis-Barton commented 4 days ago

System Info

Using Windows 11

🐛 Describe the bug

When running:

 docker run -it -p 5000:5000 -v  C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6

I'm able to get the container going for a 3.1 model. Passing --yaml_config with a Windows-style path fails, though:

$ docker run [...] llamastack/llamastack-local-gpu --yaml_config C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
    fire.Fire(main)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace       
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
    with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml'
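
(The path in that error has lost its backslashes because the MINGW64 shell strips unquoted backslashes before Docker ever sees the argument; a quick illustration with plain echo, using the same path. Quoting preserves them, though the container would still need a path that exists inside the container, not the Windows one.)

$ echo C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml
$ echo 'C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml'
C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml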

sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p 5000:5000 -v  C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6
Resolved 12 providers
 inner-inference => meta-reference
 models => __routing_table__
 inference => __autorouted__
 inner-safety => meta-reference
 inner-memory => meta-reference
 shields => __routing_table__
 safety => __autorouted__
 memory_banks => __routing_table__
 memory => __autorouted__
 agents => meta-reference
 telemetry => meta-reference
 inspect => __builtin__

Loading model `Llama3.1-8B-Instruct`
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/usr/local/lib/python3.10/site-packages/torch/__init__.py:955: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:432.)
  _C._set_default_tensor_type(t)
...

but I want it to load 3.2 Vision. Attempts to call that model fail, e.g.:

$ curl -X POST http://localhost:5000/inference/chat_completion -H "Content-Type: application/json" -d '{"model": " Llama3.2-11B-Vision-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the moon."}], "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}}'
{"detail":"Invalid value: ` Llama3.2-11B-Vision-Instruct` not registered"}

I've tried pointing the docker container towards my local YAML file with the right model:

version: '2'
built_at: '2024-10-08T17:40:45.325529'
image_name: local
docker_image: null
conda_env: local
apis:
  - shields
  - agents
  - models
  - memory
  - memory_banks
  - inference
  - safety
providers:
  inference:
    - provider_id: meta0
      provider_type: meta-reference
      config:
        model: Llama3.2-11B-Vision-Instruct  # Updated model name
        quantization: null
        torch_seed: null
        max_seq_len: 4096
        max_batch_size: 1
  safety:
    - provider_id: meta0
      provider_type: meta-reference
      config:
        llama_guard_shield:
          model: Llama-Guard-3-1B
          excluded_categories: []
          disable_input_check: false
          disable_output_check: false
        prompt_guard_shield:
          model: Prompt-Guard-86M
  memory:
    - provider_id: meta0
      provider_type: meta-reference
      config: {}
  agents:
    - provider_id: meta0
      provider_type: meta-reference
      config:
        persistence_store:
          namespace: null
          type: sqlite
          db_path: ~/.llama/runtime/kvstore.db
  telemetry:
    - provider_id: meta0
      provider_type: meta-reference
      config: {}

but when I try with this:

$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml 

I get this:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
    fire.Fire(main)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace       
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
    with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml'

Is there a better way to specify the right model_id?

(P.S. I do have the model downloaded.)

Error logs

see above

Expected behavior

see above

yanxi0830 commented 3 days ago

Change your docker run command to use --yaml_config /root/my-run.yaml, since the server reads the file from inside the Docker container. E.g.:

docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml

See guide here: https://github.com/meta-llama/llama-stack/tree/main/distributions/meta-reference-gpu
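
To confirm the mount worked, you can first list the file from inside the container (this assumes the image has ls available, which a Python-based image normally does):

$ docker run --rm -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --entrypoint ls llamastack/llamastack-local-gpu -l /root/my-run.yaml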

Travis-Barton commented 2 days ago

I get this error:


sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
    fire.Fire(main)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
    with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Program Files/Git/root/my-run.yaml'

I installed llama-stack with pip, so maybe it's missing some local file? I just have my .yaml file sitting in my dummy repo.
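
Could this be Git Bash rewriting the argument? MSYS path conversion turns /root/my-run.yaml into a Windows path under the Git install directory, which would explain the C:/Program Files/Git/ prefix in the error. If that's the culprit, a workaround sketch (MSYS_NO_PATHCONV is a Git-for-Windows setting) would be:

$ MSYS_NO_PATHCONV=1 docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml

or, alternatively, doubling the leading slash so Git Bash leaves the argument alone (Linux treats //root/my-run.yaml the same as /root/my-run.yaml):

$ docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config //root/my-run.yaml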