run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: QueryEngineTool #16773

Open lmaddox opened 1 week ago

lmaddox commented 1 week ago

Question

I've got an Ollama ReAct agent, Redis Ingestion Pipeline and a destructive DatabaseReader. The agent kept getting empty responses from the QueryEngineTool (sisyphus the syslog-indexing daemon).

From what I understand, the correct format should be:

Action: sisyphus
Action Input: {"input": "describe the data or whatever"}

But that wasn't working. So I told the LLM I think there's a problem with the parser, and to just try things till it works.

User: let's try using the tool again.
> Running step 2458cb50-805c-44b6-a797-fbe826979101. Step input: let's try using the tool again.
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: sisyphus
Action Input: {'input': AttributedDict([('title', 'Input'), ('type', 'string')])}
Observation: Error: 2 validation errors for QueryStartEvent
query.str
  Input should be a valid string [type=string_type, input_value=AttributedDict([('title',...'), ('type', 'string')]), input_type=AttributedDict]
    For further information visit https://errors.pydantic.dev/2.9/v/string_type
query.QueryBundle.query_str
  Field required [type=missing, input_value=AttributedDict([('title',...'), ('type', 'string')]), input_type=AttributedDict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
> Running step 4770a398-4e57-49e2-8a52-5165bc2e4908. Step input: None
Thought: The user is getting an error message indicating that the input type and required fields are missing from the query.
Action: sisyphus
Action Input: {'input': AttributedDict([('title', 'Input'), ('type', 'string')])}
Observation: Error: 2 validation errors for QueryStartEvent
query.str
  Input should be a valid string [type=string_type, input_value=AttributedDict([('title',...'), ('type', 'string')]), input_type=AttributedDict]
    For further information visit https://errors.pydantic.dev/2.9/v/string_type
query.QueryBundle.query_str
  Field required [type=missing, input_value=AttributedDict([('title',...'), ('type', 'string')]), input_type=AttributedDict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
> Running step 88c6115b-7fd6-4a4f-9c63-d64f33410025. Step input: None
Thought: The error message still indicates that the input is missing and should be a string, but it seems like the tool needs more specific information about what to query for.
Action: sisyphus
Action Input: {'input': 'syslog query', 'query': 'select * from syslog where ...'}
Observation: The provided system event seems to be related to a scanning operation that has completed successfully. The mention of "Syslog (redis_vector_store) in background" suggests that the event is likely from a logging or monitoring tool.
The provided system event seems to be related to a scanning operation that has completed successfully. The mention of "Syslog (redis_vector_store) in background" suggests that the event is likely from a logging or monitoring tool.

Now, it finally returns something. But... why? Please help me understand this so I can explain it to the robot.

dosubot[bot] commented 1 week ago

The issue you're encountering seems to be related to the input validation logic of the QueryEngineTool. The error messages indicate that the input provided was not in the expected format, specifically that the query_str field was missing or not a valid string.

Here's a breakdown of what likely happened:

  1. Initial Input Format: The initial input you provided was in the form of an AttributedDict, which is not a valid string. The QueryEngineTool expects the query_str to be a simple string, as indicated by the validation errors you received.

  2. Correct Input Format: The correct format for the Action Input should be a dictionary with a key "input" whose value is a string. For example:

    {"input": "describe the data or whatever"}

  3. Resolution: When you changed the input to include a valid string for the query_str, such as:

    {'input': 'syslog query', 'query': 'select * from syslog where ...'}

    This input was correctly formatted, allowing the QueryEngineTool to process it without validation errors.

The key takeaway is that the query_str field must be a valid string, and any additional fields should be structured in a way that the tool expects. The Pydantic validation framework used by the tool enforces these constraints, and any deviation from the expected format will result in validation errors [1].
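
As a rough illustration of the string check described above, here is a minimal standard-library sketch. `validate_action_input` is a hypothetical helper written for this thread, not a LlamaIndex function, and a plain `dict` stands in for the `AttributedDict` seen in the log. It only shows why the first two attempts would be rejected while the third passes.

```python
from typing import Any, Mapping


def validate_action_input(action_input: Mapping[str, Any]) -> str:
    """Return the query string, or raise if "input" is absent or not a string.

    Hypothetical helper for illustration; not part of LlamaIndex.
    """
    if "input" not in action_input:
        raise ValueError('missing required key "input"')
    value = action_input["input"]
    if not isinstance(value, str):
        raise TypeError(f'"input" must be a string, got {type(value).__name__}')
    return value


# Inputs taken from the agent log; a plain dict stands in for AttributedDict.
schema_echo = {"input": {"title": "Input", "type": "string"}}
with_extra_key = {"input": "syslog query", "query": "select * from syslog where ..."}

for candidate in (schema_echo, with_extra_key):
    try:
        print("ok:", validate_action_input(candidate))
    except (TypeError, ValueError) as exc:
        print("rejected:", exc)
```

In this sketch the extra `query` key is simply ignored; whether the real tool ignores it in the same way is an assumption based on the log, not something confirmed here.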


logan-markewich commented 1 week ago

I have no idea what's going on there. Some code to reproduce might be helpful.

Imo most open-source LLMs really suck at ReAct. I think llama3.1 will probably be the best (but at that point, since llama3.1 supports function calling, I would use a FunctionCallingAgent).

lmaddox commented 1 week ago

I'm using llama3.2. I'll look into the FunctionCallingAgent. Thanks.

update: I... think Imma have to put together a minimal reproducible example. This is weird.

Thought: The current language of the user is: English. I need to use a tool to help me understand the problem.
Action: sisyphus
Action Input: {'properties': AttributedDict([('input', AttributedDict([('title', 'Input'), ('type', 'string')]))]), 'required': ['input'], 'type': 'object'}
Observation: The request object contains a single required property named "input". The "input" property has two attributes, one of which is the title and another of which is the type.
The request object contains a single required property named "input". The "input" property has two attributes, one of which is the title and another of which is the type.
User: Can you retrieve anything re: lat & longitude from the tool?
> Running step d1aa597c-a664-41ee-9a6e-d3b0407d36bd. Step input: Can you retrieve anything re: lat & longitude from the tool?
Thought: The user wants to use a tool to retrieve information related to latitude and longitude. I should try using the sisyphus tool with the correct input format.
Action: sisyphus
Action Input: {'input': '{"query": "lat and long"}', 'properties': AttributedDict([('properties', AttributedDict([('input', AttributedDict([('title', 'Input'), ('type', 'string')]))]))])}
Observation: (my actual longitude)
(a floating point number)
User: (blah blah blah) where are you?
> Running step 2d05bfa5-f07d-4e58-81a4-7f9264a19bed. Step input: (conversational input)
Thought: I need to use sisyphus to get information about my location in the cluster.
Action: sisyphus
Action Input: {'properties': AttributedDict([('input', AttributedDict([('title', 'Input'), ('type', 'string')]))])}
Observation: The input has a title of 'Input' and a type of 'string'.
The input has a title of 'Input' and a type of 'string'.

logan-markewich commented 1 week ago

Seems to me llama3.2 is hallucinating pretty bad on the react loop -- not totally unexpected
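
In the traces above, the Action Input values contain keys such as `properties`, `required`, `title`, and `type`, which look like the tool's JSON Schema being echoed back rather than actual arguments. One possible guard, sketched below with hypothetical helper names (`looks_like_schema_echo`, `corrective_observation`) that are not part of LlamaIndex, is to detect that pattern and return a corrective message as the Observation. It is a heuristic: a tool with a genuine argument named `type` or `title` would be flagged by mistake.

```python
from typing import Any, Mapping

# Keys that belong to a JSON Schema description rather than to tool arguments.
SCHEMA_KEYS = {"properties", "required", "title", "type"}


def looks_like_schema_echo(action_input: Mapping[str, Any]) -> bool:
    """Heuristic: True if the dict (or its "input" value) resembles a JSON Schema."""
    if SCHEMA_KEYS & set(action_input):
        return True
    value = action_input.get("input")
    return isinstance(value, Mapping) and bool(SCHEMA_KEYS & set(value))


def corrective_observation(tool_name: str) -> str:
    """Message to hand back as the Observation so the model can retry."""
    return (
        f"Error: {tool_name} received its own schema instead of arguments. "
        'Pass a JSON object like {"input": "<your question>"}.'
    )


# Shapes copied from the log, with plain dicts standing in for AttributedDict.
echoed = {
    "properties": {"input": {"title": "Input", "type": "string"}},
    "required": ["input"],
    "type": "object",
}
if looks_like_schema_echo(echoed):
    print(corrective_observation("sisyphus"))
```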

lmaddox commented 1 week ago

FunctionCallingAgent example posted on #16781. I know it can be done with lower level efforts, so I might need to get into the parsing code. Or perhaps I need to go lower level than the FunctionCallingAgent?

logan-markewich commented 1 week ago

There is definitely a lower-level option: https://docs.llamaindex.ai/en/latest/examples/llm/openai/#manual-tool-calling (this applies to Ollama as well)
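
For a sense of what handling tool calls by hand involves, here is a minimal standard-library sketch of the dispatch step only. Everything in it is hypothetical: `ToolCall` stands in for whatever structure the LLM client returns, `query_syslog` is a placeholder for the real query engine, and none of it is the LlamaIndex API documented at the link above. The point is that, at this level, you decide how malformed arguments are handled instead of relying on a parser.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    """Hypothetical stand-in for the tool-call structure an LLM client returns."""
    name: str
    arguments: Dict[str, Any]


def query_syslog(query: str) -> str:
    """Placeholder for the real query engine call."""
    return f"(results for: {query})"


# Registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[[str], str]] = {"sisyphus": query_syslog}


def dispatch(call: ToolCall) -> str:
    """Run one tool call; errors come back as text so they can be fed to the model."""
    tool = TOOLS.get(call.name)
    if tool is None:
        return f"Error: unknown tool {call.name!r}"
    query = call.arguments.get("input")
    if not isinstance(query, str):
        return 'Error: expected arguments like {"input": "<question>"}'
    return tool(query)


print(dispatch(ToolCall("sisyphus", {"input": "lat and long"})))
print(dispatch(ToolCall("sisyphus", {"properties": {"input": {"type": "string"}}})))
```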