Needs to go in after https://github.com/meta-llama/llama-stack/pull/477.
Test plan:
1) `pip install -e .` on llama-stack main
2) `pip install -e .` in llama-stack-client-python
3) run an app with the direct client:
```python
import asyncio

from llama_stack_client.lib.direct.direct import LlamaStackDirectClient


async def main():
    client = await LlamaStackDirectClient.from_template("ollama")
    response = await client.models.list()
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```
```
INFERENCE_MODEL=llama3.2:1b-instruct-fp16 python app.py
Resolved 12 providers
 inner-inference => ollama
 inner-memory => faiss
 models => __routing_table__
 inference => __autorouted__
 inner-safety => llama-guard
 shields => __routing_table__
 safety => __autorouted__
 memory_banks => __routing_table__
 memory => __autorouted__
 agents => meta-reference
 telemetry => meta-reference
 inspect => __builtin__
checking connectivity to Ollama at `http://localhost:11434`...
`llama3.2:1b-instruct-fp16` already registered with `ollama`
Models: llama3.2:1b-instruct-fp16 served by ollama
[Model(identifier='llama3.2:1b-instruct-fp16', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
```
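Beyond `models.list()`, the same in-process client can be pointed at other APIs. The sketch below is not part of the test plan above; it assumes the direct client mirrors the HTTP client's `inference.chat_completion(...)` surface, and the `model_id`/`messages` parameter names are assumptions that may differ by client version:

```python
import asyncio

from llama_stack_client.lib.direct.direct import LlamaStackDirectClient


async def main():
    # Instantiate the in-process client from the same "ollama" template.
    client = await LlamaStackDirectClient.from_template("ollama")

    # Assumed call shape: mirrors the HTTP client's inference API.
    response = await client.inference.chat_completion(
        model_id="llama3.2:1b-instruct-fp16",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```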