ducktapeonmydesk opened 3 days ago
The reason is that `llama_stack.apis.inference.client` calls the Llama3.1-8B-Instruct model by default. You need to run

`python -m llama_stack.apis.inference.client localhost 5000 Llama3.2-3B-Instruct`

since your server is serving Llama3.2-3B-Instruct
(https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/inference/client.py#L130).
FYI, `llama_stack.apis.inference.client` is intended for development purposes only and is not officially supported (it may be removed in the future). We recommend checking out the examples in our llama-stack-apps repo: https://github.com/meta-llama/llama-stack-apps/tree/main/examples
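The fix above works because the client takes the model name as an optional third positional argument and falls back to a hard-coded default when it is omitted. A hypothetical sketch of that CLI pattern (argument names and defaults assumed for illustration, not the actual llama-stack client code):

```python
# Hypothetical sketch of a CLI that defaults the model name when the
# third positional argument is omitted -- the pattern that causes the
# mismatch in this issue. Not the actual llama-stack client.
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("host")
    parser.add_argument("port", type=int)
    # Optional positional: falls back to the 3.1-8B default if omitted.
    parser.add_argument("model", nargs="?", default="Llama3.1-8B-Instruct")
    return parser.parse_args(argv)


# Omitting the model means the client requests the default:
args = parse_args(["localhost", "5000"])
print(args.model)  # Llama3.1-8B-Instruct

# Passing it explicitly matches what the server actually serves:
args = parse_args(["localhost", "5000", "Llama3.2-3B-Instruct"])
print(args.model)  # Llama3.2-3B-Instruct
```

Under this sketch, the two invocations in this thread differ only in whether the default kicks in, which is exactly the mismatch the server rejects.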
System Info
WSL2
🐛 Describe the bug
Running Llama3.2-3B-Instruct, I get the error

ValueError: Llama3.1-8B-Instruct not registered. Make sure there is an Inference provider serving this model.

when trying to call `python -m llama_stack.apis.inference.client localhost 5000` from the client.

Error logs
Listening on ['::', '0.0.0.0']:5000
INFO:     Started server process [14346]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
INFO:     127.0.0.1:60312 - "POST /inference/chat_completion HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/home/NAME/llama-stack/llama_stack/distribution/server/server.py", line 209, in sse_generator
    async for item in await event_gen:
  File "/home/NAME/llama-stack/llama_stack/distribution/routers/routers.py", line 99, in chat_completion
    provider = self.routing_table.get_provider_impl(model)
  File "/home/NAME/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 131, in get_provider_impl
    raise ValueError(
ValueError: Llama3.1-8B-Instruct not registered. Make sure there is an Inference provider serving this model.
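Per the traceback, the ValueError comes from `get_provider_impl` in `routing_tables.py`: the server keeps a routing table mapping each registered model name to an inference provider and raises when the requested model has no entry. A minimal sketch of that lookup pattern (class and method names mirrored from the traceback; the implementation itself is illustrative, not the actual llama-stack code):

```python
# Minimal sketch of a model -> provider routing table that raises
# ValueError for unregistered models, mirroring the error in the
# traceback above. Illustrative only, not the llama-stack source.

class RoutingTable:
    def __init__(self):
        self._providers = {}

    def register(self, model, provider):
        self._providers[model] = provider

    def get_provider_impl(self, model):
        if model not in self._providers:
            raise ValueError(
                f"{model} not registered. Make sure there is an "
                "Inference provider serving this model."
            )
        return self._providers[model]


table = RoutingTable()
table.register("Llama3.2-3B-Instruct", "meta-reference")

# The server serves Llama3.2-3B-Instruct, but the client requests
# its default, Llama3.1-8B-Instruct -> lookup fails.
try:
    table.get_provider_impl("Llama3.1-8B-Instruct")
except ValueError as e:
    print(e)
```

So the server is behaving as designed; the request simply names a model it was never configured to serve.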
Expected behavior
A two-line poem.