run-llama / llama_deploy

Deploy your agentic workflows to production
https://docs.llamaindex.ai/en/stable/module_guides/llama_deploy/
MIT License

QueryEngineTool returns "empty response" once workflow is deployed on llama-deploy #250

Closed: tituslhy closed this issue 1 month ago

tituslhy commented 1 month ago

I've tested my code at the following levels (the first three are sketched just below):

  1. query_engine.query() worked.
  2. I then wrapped this query engine into an agent. FunctionCallingAgent.chat() worked.
  3. I then wrapped the FunctionCallingAgent into a Workflow step. Workflow.run(query="...") worked.
  4. I then deployed the workflow on llama_deploy. This is when I started getting empty responses from my query engine for RAG-related tasks.
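
Roughly, the first three checks look like this (construction of query_engine, agent, and MultiAgentSystem is elided here; see the reproduction in the comments below):

question = "What is the definition of a black swan event?"

print(query_engine.query(question))        # level 1: query engine directly
print(agent.chat(question))                # level 2: FunctionCallingAgent
workflow = MultiAgentSystem(timeout=600.0, verbose=True)
print(await workflow.run(query=question))  # level 3: workflow, from an async context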

deploy_core.py

from llama_deploy import (
    ControlPlaneConfig,
    deploy_core,
)
from llama_deploy.message_queues.simple import SimpleMessageQueueConfig

control_plane_config = ControlPlaneConfig(
    host="0.0.0.0",
    port=8288,
)
message_queue_config = SimpleMessageQueueConfig(
    host="0.0.0.0",
    port=8277,
)

if __name__ == "__main__":
    import asyncio

    asyncio.run(
        deploy_core(
            control_plane_config=control_plane_config,
            message_queue_config=message_queue_config,
            disable_control_plane=False,
            disable_message_queue=False,
        )
    )

deploy_workflow.py

from llama_deploy import (
    ControlPlaneConfig,
    WorkflowServiceConfig,
    deploy_workflow,
)

import os
import sys

from dotenv import find_dotenv, load_dotenv

_ = load_dotenv(find_dotenv())
__curdir__ = os.getcwd()

# Make `src` importable whether this script is launched from the
# workflow folder or from the repo root.
if "workflow" in __curdir__:
    sys.path.append(os.path.join(__curdir__, "../../src"))
else:  # in the main root folder
    sys.path.append("./src")

from multi_agent_system import MultiAgentSystem

control_plane_config = ControlPlaneConfig(
    host="0.0.0.0",
    port=8288,
)
workflow_config = WorkflowServiceConfig(
    host="0.0.0.0",
    port=8299,
    service_name="Assistant",
)

async def main():
    await deploy_workflow(
        workflow=MultiAgentSystem(timeout=600.0, verbose=True),
        workflow_config=workflow_config,
        control_plane_config=control_plane_config,
    )

if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Testing code in a Jupyter notebook:

from llama_deploy import LlamaDeployClient, ControlPlaneConfig

control_plane_config = ControlPlaneConfig(
    host="0.0.0.0",
    port=8288,
)
client = LlamaDeployClient(control_plane_config)
session = client.create_session()
query = "What is the definition of a black swan event?"
result = session.run(
    "Assistant",
    query=query,
    history=[
        {
            "role": "user",
            "content": query,
        }
    ],
)

The output I got in my terminal was:

=== Calling Function ===
Calling function: Definitions_Tool with args: {"input": "black swan event"}
=== Function Output ===
Empty Response

Not too sure why this is happening. I'm using ChromaDB as my vector database.

logan-markewich commented 1 month ago

@tituslhy I might need more details to replicate this. What is MultiAgentSystem? Is there some minimal workflow that reproduces this?

logan-markewich commented 1 month ago

Empty response from a query engine really only happens when zero results are retrieved. Perhaps it's not loading your existing chroma collection properly?
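
For example, a quick check along these lines (a sketch reusing the relative db_path and collection name from the reproduction below) would show whether the deployed process actually sees the populated collection; with a relative path, the answer can depend on the working directory the service was launched from:

import chromadb

# Same relative path and collection name as in the reproduction below.
# Run this from the working directory the deployed service uses: a relative
# path can resolve to a different (empty) database than the one the
# notebook tests were hitting.
chroma_client = chromadb.PersistentClient(path="./path/to/db")
collection = chroma_client.get_or_create_collection(name="DefinitionsTool")
print(collection.count())  # 0 here would match the Empty Response symptom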

tituslhy commented 1 month ago

Oh sorry. MultiAgentSystem is just the name of the workflow. Here's a minimal reproducible example:

import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.chroma import ChromaVectorStore

llm = OpenAI()
embed_model = OpenAIEmbedding()

# Load the existing Chroma collection and build a query engine over it.
db_path = "./path/to/db"
chroma_client = chromadb.PersistentClient(path=db_path)
chroma_collection = chroma_client.get_or_create_collection(name="DefinitionsTool")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
query_engine = index.as_query_engine(similarity_top_k=4, llm=llm)

tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="Definitions_Tool",
            description="""Use this tool to answer questions related to
            definitions such as what is "yield".""",
        ),
    )
]
agent_worker = FunctionCallingAgentWorker.from_tools(tools=tools, llm=llm)
agent = agent_worker.as_agent()

# RAGEvent is referenced by the steps below; minimal definition with the
# field they use.
class RAGEvent(Event):
    query: str

class MultiAgentSystem(Workflow):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.rag_agent = agent

    @step
    async def process_query(self, ev: StartEvent) -> RAGEvent:
        ...  # some code here to route to RAGEvent

    @step
    async def get_definitions(self, ev: RAGEvent) -> StopEvent:
        response = await self.rag_agent.achat(ev.query)
        return StopEvent(result=str(response))
This all works: whether I query the query engine directly, chat with the agent, or run the task via the workflow, I get the correct results. It's only once I deploy the workflow on llama-deploy that my RAG tool starts returning empty responses for the exact same question (actually, for any question).

tituslhy commented 1 month ago

I think this might be a ChromaDB problem. I changed the vector database to Qdrant and it immediately worked fine. Closing the issue.
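
For reference, a minimal sketch of the Qdrant swap (the local storage path and collection name below are illustrative, not taken from the thread; QdrantVectorStore comes from the llama-index-vector-stores-qdrant package):

import qdrant_client
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Local (embedded) Qdrant storage; path and collection name are assumptions.
q_client = qdrant_client.QdrantClient(path="./qdrant_db")
vector_store = QdrantVectorStore(client=q_client, collection_name="DefinitionsTool")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=OpenAIEmbedding())
query_engine = index.as_query_engine(similarity_top_k=4, llm=OpenAI())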