meta-llama / llama-stack

Model components of the Llama Stack APIs
MIT License

ValueError: `Llama3.1-8B-Instruct` not registered. Make sure there is an Inference provider serving this model. #345

Open ducktapeonmydesk opened 3 days ago

ducktapeonmydesk commented 3 days ago

System Info

WSL2

🐛 Describe the bug

I'm running `Llama3.2-3B-Instruct` and get the error `ValueError: Llama3.1-8B-Instruct not registered. Make sure there is an Inference provider serving this model.` when calling `python -m llama_stack.apis.inference.client localhost 5000` from the client.

Error logs

```
Listening on ['::', '0.0.0.0']:5000
INFO:     Started server process [14346]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
INFO:     127.0.0.1:60312 - "POST /inference/chat_completion HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/home/NAME/llama-stack/llama_stack/distribution/server/server.py", line 209, in sse_generator
    async for item in await event_gen:
  File "/home/NAME/llama-stack/llama_stack/distribution/routers/routers.py", line 99, in chat_completion
    provider = self.routing_table.get_provider_impl(model)
  File "/home/NAME/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 131, in get_provider_impl
    raise ValueError(
ValueError: `Llama3.1-8B-Instruct` not registered. Make sure there is an Inference provider serving this model.
```
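For context, the traceback shows the router failing a model lookup in its routing table: the request names a model no provider registered. A minimal sketch of that lookup pattern, purely illustrative (the class body and dict name are not the actual llama-stack internals, only the error path from the traceback above):

```python
# Illustrative sketch only -- not the actual llama-stack source.
# The router maps model identifiers to inference providers; asking for
# a model no provider registered fails the lookup with this ValueError.

class RoutingTable:
    def __init__(self) -> None:
        # populated at startup from the distribution's run config
        self.model_to_provider: dict[str, object] = {}

    def register(self, model: str, provider: object) -> None:
        self.model_to_provider[model] = provider

    def get_provider_impl(self, model: str):
        if model not in self.model_to_provider:
            raise ValueError(
                f"`{model}` not registered. Make sure there is an "
                "Inference provider serving this model."
            )
        return self.model_to_provider[model]


table = RoutingTable()
table.register("Llama3.2-3B-Instruct", object())  # server only serves the 3B model
table.get_provider_impl("Llama3.1-8B-Instruct")   # raises the ValueError above
```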

Expected behavior

A two-line poem.

yanxi0830 commented 3 days ago

This is because `llama_stack.apis.inference.client` calls the `Llama3.1-8B-Instruct` model by default. You need to run:

```
python -m llama_stack.apis.inference.client localhost 5000 Llama3.2-3B-Instruct
```

since your server is serving `Llama3.2-3B-Instruct` (https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/inference/client.py#L130).
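To see why the third positional argument matters, here is a rough sketch of the dev client's entry point; the default value comes from the linked line, but the function body is simplified and illustrative:

```python
# Sketch of the dev client's CLI -- the default model matches the
# linked client.py line; the body is simplified, not the real client.
import fire


def main(host: str, port: int, model: str = "Llama3.1-8B-Instruct"):
    # When the third positional argument is omitted, the client asks
    # the server for Llama3.1-8B-Instruct, which no provider serves
    # here -- hence the ValueError in the logs above.
    print(f"sending chat_completion for {model} to {host}:{port}")


if __name__ == "__main__":
    fire.Fire(main)  # positional args: host, port, [model]
```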

FYI, `llama_stack.apis.inference.client` is intended for dev purposes only and is not officially supported (it may be removed in the future). We recommend checking out the examples in our llama-stack-apps repo: https://github.com/meta-llama/llama-stack-apps/tree/main/examples
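If you just want to hit the server directly in the meantime, the logs above show the endpoint is `POST /inference/chat_completion`. A minimal sketch with `requests`; the payload field names are assumptions based on the chat-completion request schema and may need adjusting for your llama-stack version:

```python
# Minimal sketch: call the server's chat_completion endpoint directly.
# Payload field names are assumptions and may differ across versions.
import requests

resp = requests.post(
    "http://localhost:5000/inference/chat_completion",
    json={
        "model": "Llama3.2-3B-Instruct",  # must match a registered model
        "messages": [{"role": "user", "content": "write me a two-line poem"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```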