Llama Deploy (formerly `llama-agents`) is an async-first framework for deploying, scaling, and productionizing agentic multi-service systems based on workflows from `llama_index`.
With Llama Deploy, you can build any number of workflows in `llama_index` and then run them as services, accessible through an HTTP API by a user interface or by other services that are part of your system.
The goal of Llama Deploy is to easily transition something that you built in a notebook to something running on the cloud with the minimum amount of changes to the original code, possibly zero. To make this transition a pleasant one, you can interact with Llama Deploy in two ways:

- Using the SDK from a Python application or script.
- Using the `llamactl` CLI from a shell.

Both the SDK and the CLI are part of the Llama Deploy Python package. To install, just run:
```sh
pip install llama_deploy
```
> [!TIP]
> For a comprehensive guide to Llama Deploy's architecture and detailed descriptions of its components, visit our official documentation.
Llama Deploy bridges the gap between development and production: it lets you deploy `llama_index` workflows with minimal changes to your code.

> [!NOTE]
> This project was initially released under the name `llama-agents`, but the introduction of Workflows in `llama_index` turned out to be the most intuitive way for our users to develop agentic applications. We then decided to add new agentic features in `llama_index` directly, and to focus Llama Deploy on closing the gap between local development and remote execution of agents as services.
The fastest way to start using Llama Deploy is to play with a practical example. This repository contains a few applications you can use as a reference:

We recommend starting from the Quick start example and moving to Use a deployment from a web-based user interface immediately after. Each folder contains a README file that will guide you through the process.