run-llama / llama_deploy

Deploy your agentic workflows to production
https://docs.llamaindex.ai/en/stable/module_guides/llama_deploy/
MIT License
1.83k stars 186 forks

Setting up LlamaDeploy for multi-agent deployment on k8s #357

Open hz6yc3 opened 3 days ago

hz6yc3 commented 3 days ago

There is no documentation that provides guidance on how to set up LlamaDeploy (control plane, message queue, and service deployment) on Kubernetes. The example provided in the code is a little confusing, and our company badly needs some guidance on setting up LlamaDeploy for enterprise deployment. Any relevant documentation or sample configuration that someone can share would be really helpful.

hz6yc3 commented 2 days ago

@masci thanks a lot for looking into my question above. We are somewhat blocked, and there is some urgency in completing the PoC for agentic workflows using LlamaIndex. We would greatly appreciate it if you could provide some guidance on the request above.

logan-markewich commented 2 days ago

@hz6yc3 while it might not be totally clear from the docs/examples, it's fairly straightforward. You'd need to use the lower-level API: https://docs.llamaindex.ai/en/stable/module_guides/llama_deploy/30_manual_orchestration/

Basically, you can set up a Docker image that deploys the core

Then build another Docker image on top of that which deploys a workflow service (or several, depending on how you want to manage scaling)
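To make the two-image split concrete, here is a rough sketch. The entrypoint scripts `core.py` and `workflow.py` are hypothetical names for scripts that call llama_deploy's `deploy_core()` and `deploy_workflow()` as shown in the manual orchestration guide linked above; the base image and `pip install llama-deploy` package name are assumptions, so check them against the example repo:

```dockerfile
# --- Core image (control plane + message queue) ---
# Assumes core.py is your script calling deploy_core() per the
# manual orchestration guide; adjust the package/base image as needed.
FROM python:3.11-slim
WORKDIR /app
RUN pip install llama-deploy
COPY core.py .
CMD ["python", "core.py"]

# --- Workflow service image (one per workflow, or several workflows in one) ---
# Assumes workflow.py calls deploy_workflow() and points at the
# control plane's address via configuration/env vars.
# FROM python:3.11-slim
# WORKDIR /app
# RUN pip install llama-deploy
# COPY workflow.py .
# CMD ["python", "workflow.py"]
```

Splitting the core and the workflow services into separate images is what lets you scale workflow replicas independently of the control plane later on.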

Once you have it running in Docker, it's fairly transferable to launch those Docker images in a k8s cluster

This example walks through all of this, including k8s:
https://github.com/run-llama/llama_deploy/tree/main/examples/message-queue-integrations
https://github.com/run-llama/llama_deploy/tree/main/examples/message-queue-integrations/rabbitmq/kubernetes
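For orientation, the k8s side boils down to a Deployment plus a Service per component, so workflow services can reach the control plane by a stable DNS name. A minimal sketch, where the resource names, image name, and port 8000 are all assumptions rather than values taken from the example repo:

```yaml
# Hypothetical manifest for the core image; the linked example repo
# has the authoritative versions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: control-plane
spec:
  replicas: 1
  selector:
    matchLabels:
      app: control-plane
  template:
    metadata:
      labels:
        app: control-plane
    spec:
      containers:
        - name: control-plane
          image: my-registry/llama-deploy-core:latest  # assumed image name
          ports:
            - containerPort: 8000  # assumed control plane port
---
apiVersion: v1
kind: Service
metadata:
  name: control-plane
spec:
  selector:
    app: control-plane
  ports:
    - port: 8000
      targetPort: 8000
```

Workflow service Deployments then point at `control-plane:8000` (or the namespaced form, e.g. `control-plane.<namespace>.svc`) in their configuration.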

We are working on updates to make this easier though, using a simpler top-level YAML file rather than writing code for all the deployments. But in lieu of that, the above is the best approach.

hz6yc3 commented 2 days ago

@logan-markewich thanks a lot! Let me read through the documents. We weren't sure about the guidance for centrally deploying the core components: based on the architecture in the documentation, it seemed like we would have to deploy the core components (control plane, message queue) separately for each deployment. The way we deploy applications in our company, every application lives in its own namespace on the cluster, so we weren't sure how to set up the deployment pattern using llama_deploy.