run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

Role 'tool' must be a response to a preceding message with 'tool_calls' #13715

Open anantgupta129 opened 1 month ago

anantgupta129 commented 1 month ago

Bug Description

I am getting an openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid parameter: messages with role 'tool' must be a response to a preceeding message with 'tool_calls'"}} error when using the JSON chat store with a persistent path. When I checked the stored JSON, the message with tool_calls is saved before the role 'tool' message, and the chats are also being saved correctly by chat_store.

I am hitting this error in long chats, but only when I load the chat store again for a specific key. When I tested separately in a while loop, it works fine without any error.

Version

0.10.38

Steps to Reproduce

Here is the API code:


def stream_generator(generator, chat_store: SimpleChatStore):
    yield from (json.dumps({"type": "content_block", "text": text}) for text in generator)
    chat_store.persist(persist_path=CHAT_PERSIST_PATH)

@app.post("/chat")
async def chat(body: ChatRequest = Body()):
    try:
        if Path(CHAT_PERSIST_PATH).exists():
            chat_store = SimpleChatStore.from_persist_path(CHAT_PERSIST_PATH)
        else:
            chat_store = SimpleChatStore()

        memory = ChatMemoryBuffer.from_defaults(
            chat_store=chat_store,
            chat_store_key=body.chatId,
        )
        tool_spec = DataBaseToolSpec().to_tool_list()
        agent = OpenAIAgent.from_tools(
            tool_spec, llm=llm, verbose=True, system_prompt=system_prompt, memory=memory
        )
        response = agent.stream_chat(body.query)
        return StreamingResponse(
            stream_generator(response.response_gen, chat_store), media_type="application/x-ndjson"
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e)) from e

I need to load the chat store on every request because I want to save separate chats for multiple users.

Relevant Logs/Tracebacks

File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\llama_index\core\chat_engine\types.py", line 258, in response_gen
    |     raise self.exception
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\threading.py", line 1073, in _bootstrap_inner        
    |     self.run()
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\threading.py", line 1010, in run
    |     self._target(*self._args, **self._kwargs)
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 274, in wrapper
    |     result = func(*args, **kwargs)
    |              ^^^^^^^^^^^^^^^^^^^^^
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\llama_index\core\chat_engine\types.py", line 163, in write_response_to_history
    |     for chat in self.chat_stream:
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\llama_index\core\llms\callbacks.py", line 154, in wrapped_gen
    |     for x in f_return_val:
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\llama_index\llms\openai\base.py", line 454, in gen
    |     for response in client.chat.completions.create(
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\openai\_utils\_utils.py", line 277, in wrapper
    |     return func(*args, **kwargs)
    |            ^^^^^^^^^^^^^^^^^^^^^
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\openai\resources\chat\completions.py", line 590, in create
    |     return self._post(
    |            ^^^^^^^^^^^
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\openai\_base_client.py", line 1240, in post
    |     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
    |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\openai\_base_client.py", line 921, in request
    |     return self._request(
    |            ^^^^^^^^^^^^^^
    |   File "C:\Users\anant\miniconda3\envs\super\Lib\site-packages\openai\_base_client.py", line 1020, in _request
    |     raise self._make_status_error_from_response(err.response) from None
    | openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid parameter: messages with role 'tool' must be a response to a preceeding message with 'tool_calls'.", 'type': 'invalid_request_error', 'param': 'messages.[1].role', 'code': None}}
anantgupta129 commented 1 month ago

This is the code I ran separately; when the chat store is loaded only once, this error does not occur:

    chat_store = SimpleChatStore.from_persist_path(CHAT_PERSIST_PATH)
    memory = ChatMemoryBuffer.from_defaults(
        chat_store=chat_store,
        chat_store_key="temp_chat",
    )
    llm = OpenAI(model="gpt-4o")
    tool_spec = DataBaseToolSpec().to_tool_list()
    agent = OpenAIAgent.from_tools(
        tool_spec, llm=llm, 
        verbose=True, 
        system_prompt=system_prompt, 
        memory=memory
    )

    # press q to quit the chat
    print("[INFO] Press q to quit the chat.")
    while True:
        ask = input("Ask: ")
        if ask.lower() == "q":
            print("[INFO] Quitting...")
            exit()

        response = agent.stream_chat(ask)
        print("\n[x Agent]: ")
        for text in response.response_gen:
            print(text, end="", flush=True)
        print("\n" + "=" * 25 + "\n")
dosubot[bot] commented 1 month ago

Hey there, @anantgupta129! I'm here to help you out with any bugs, questions, or contributions you might have. Let's work together to solve this issue.

The openai.BadRequestError you're encountering is due to the OpenAI API's requirement that a message with the role tool must directly follow an assistant message that contains tool_calls. This error typically arises when the sequence of messages is not maintained correctly, especially when loading the chat store from a persistent path.

Here are some steps to resolve this issue:

  1. Ensure Correct Sequence of Messages: Verify that the sequence of messages in your chat store is correct. A message with the role tool should always follow an assistant message that contains tool_calls.

  2. Persist and Load Chat Store Correctly: When persisting and loading the chat store, ensure that the state is consistent and no messages are lost or reordered.

  3. Use File Locking and Atomic Writes: Implement file locking and atomic writes to avoid race conditions and ensure data consistency when multiple requests are accessing the chat store concurrently.

Here is an updated version of your API code with these considerations:

import json
import threading
from pathlib import Path

from fastapi import Body, FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.agent.openai import OpenAIAgent

# llm, system_prompt, ChatRequest, and DataBaseToolSpec are assumed to be defined
# elsewhere, as in the original snippet.

app = FastAPI()
CHAT_PERSIST_PATH = "chat_store.json"
_lock = threading.Lock()

def stream_generator(generator, chat_store: SimpleChatStore):
    yield from (json.dumps({"type": "content_block", "text": text}) for text in generator)
    with _lock:
        chat_store.persist(persist_path=CHAT_PERSIST_PATH)

@app.post("/chat")
async def chat(body: ChatRequest = Body()):
    try:
        with _lock:
            if Path(CHAT_PERSIST_PATH).exists():
                chat_store = SimpleChatStore.from_persist_path(CHAT_PERSIST_PATH)
            else:
                chat_store = SimpleChatStore()

        memory = ChatMemoryBuffer.from_defaults(
            chat_store=chat_store,
            chat_store_key=body.chatId,
        )
        tool_spec = DataBaseToolSpec().to_tool_list()
        agent = OpenAIAgent.from_tools(
            tool_spec, llm=llm, verbose=True, system_prompt=system_prompt, memory=memory
        )
        response = agent.stream_chat(body.query)
        return StreamingResponse(
            stream_generator(response.response_gen, chat_store), media_type="application/x-ndjson"
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e)) from e

Key Changes:

  1. Thread Locking: Added a threading lock (_lock) to ensure that only one thread can access the chat store at a time.
  2. Atomic Writes: The lock only guards in-process access; to avoid partial writes, the persist step itself should also be atomic, as sketched below.
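
A minimal sketch of such an atomic persist, assuming chat_store.json() for serialization (the persist_atomically helper name is hypothetical, not part of llama_index):

import os
import tempfile

def persist_atomically(chat_store: SimpleChatStore, persist_path: str) -> None:
    # Serialize first, so a failure here never touches the existing file.
    data = chat_store.json()
    dir_name = os.path.dirname(os.path.abspath(persist_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp_path, persist_path)  # atomic rename on POSIX and Windows
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

This would replace the chat_store.persist(...) call inside stream_generator.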

By ensuring the correct sequence of messages and using file locking and atomic writes, you can avoid the openai.BadRequestError and maintain a consistent state across multiple API requests [1].


anantgupta129 commented 1 month ago

@dosu still getting same error

dosubot[bot] commented 1 month ago

To resolve the openai.BadRequestError with error code 400, ensure that any message with the role tool is directly preceded by an assistant message that contains tool_calls. This error typically occurs when the sequence of messages does not follow the expected order.

Here is a more detailed approach to ensure the correct sequence when loading the chat store:

  1. Retrieve the messages for the specific key.
  2. Check the sequence of messages to ensure that any message with the role tool is preceded by an assistant message whose additional_kwargs contain tool_calls.
  3. If the sequence is incorrect, adjust it accordingly.

Here is an example of how you might implement this check and adjustment:

from typing import List

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.storage.chat_store import SimpleChatStore

def validate_and_correct_messages(messages: List[ChatMessage]) -> List[ChatMessage]:
    corrected_messages = []
    for i, message in enumerate(messages):
        if message.role == MessageRole.TOOL:
            # A tool message must directly follow an assistant message carrying tool_calls
            # (in llama_index these live in additional_kwargs["tool_calls"]).
            prev = messages[i - 1] if i > 0 else None
            if (
                prev is None
                or prev.role != MessageRole.ASSISTANT
                or not prev.additional_kwargs.get("tool_calls")
            ):
                raise ValueError(
                    "Message with role 'tool' must follow an assistant message with 'tool_calls'"
                )
        corrected_messages.append(message)
    return corrected_messages

# Example usage with SimpleChatStore
chat_store = SimpleChatStore.from_persist_path("chat_store.json")
key = "your_specific_key"
messages = chat_store.get_messages(key)

try:
    validated_messages = validate_and_correct_messages(messages)
    chat_store.set_messages(key, validated_messages)
except ValueError as e:
    print(f"Error in message sequence: {e}")

Additionally, ensure that when you are using the OpenAI-based function, the tool_calls are correctly handled. The following snippet from the llama_index library shows how tool_calls are parsed and validated:

def _parse_tool_calls(
    tool_calls: List[OpenAIToolCall],
    output_cls: Type[Model],
    allow_multiple: bool = False,
    verbose: bool = False,
) -> Union[Model, List[Model]]:
    outputs = []
    for tool_call in tool_calls:
        function_call = tool_call.function
        assert function_call is not None
        assert function_call.name is not None
        assert function_call.arguments is not None
        if verbose:
            name = function_call.name
            arguments_str = function_call.arguments
            print(f"Function call: {name} with args: {arguments_str}")

        if isinstance(function_call.arguments, dict):
            output = output_cls.parse_obj(function_call.arguments)
        else:
            output = output_cls.parse_raw(function_call.arguments)

        outputs.append(output)

    if allow_multiple:
        return outputs
    else:
        if len(outputs) > 1:
            _logger.warning(
                "Multiple outputs found, returning first one. "
                "If you want to return all outputs, set output_multiple=True."
            )

        return outputs[0]

By ensuring the correct sequence of messages and proper handling of tool_calls, you can resolve the openai.BadRequestError with error code 400 [1][2][3].


logan-markewich commented 1 month ago

hmm, if you run memory.get() on the loaded memory, what do you see?

I could have sworn I made a fix recently so that the memory buffer would include tool/assistant message pairs
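
For reference, a quick way to inspect what the buffer returns after reloading (a minimal sketch, assuming the persist path and chat key used in the snippets above):

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore.from_persist_path("chat_store.json")
memory = ChatMemoryBuffer.from_defaults(chat_store=chat_store, chat_store_key="temp_chat")

for msg in memory.get():
    # Tool calls live in additional_kwargs on the assistant messages.
    print(msg.role, msg.additional_kwargs.get("tool_calls"), str(msg.content)[:80])

If the first printed message has role tool with no preceding assistant tool_calls, the buffer is truncating in the middle of a pair.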

anantgupta129 commented 1 month ago

@logan-markewich if I do chat_store.json() or agent.chat_history, the complete message history is there, including tool calls. I even checked the stored JSON manually, and it is correct. However, I am facing this error when the chat becomes longer. As I mentioned, the error only occurs when I load the chat history on each request, not when running in a while loop.

bllchmbrs commented 1 month ago

I am receiving this error as well @logan-markewich. I am using the Redis chat store and the OpenAI agent. It looks like someone on Reddit is getting it too.

This happens when I have > 5 messages and several tool calls.

This seems to relate to:

Any ideas or update here?

Edit: One idea I am having actually, could this relate to the token_limit in the chat memory buffer? Is there any way to check if this is the issue?
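
One way to check the token_limit hypothesis (a sketch, assuming a SimpleChatStore persisted as in the snippets above; the key name is a placeholder): compare what is stored against what the buffer actually returns, and retry with a much larger limit.

from llama_index.core.llms import MessageRole
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore.from_persist_path("chat_store.json")
key = "temp_chat"

stored = chat_store.get_messages(key)  # everything persisted for this key
memory = ChatMemoryBuffer.from_defaults(
    chat_store=chat_store,
    chat_store_key=key,
    token_limit=30000,  # much larger than the default, to rule out truncation
)
buffered = memory.get()  # what the agent will actually send to the LLM

print(f"stored: {len(stored)} messages, buffered: {len(buffered)} messages")
if buffered and buffered[0].role == MessageRole.TOOL:
    print("Buffer starts with an orphaned tool message -> truncation is the likely culprit")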

anantgupta129 commented 1 month ago

@bllchmbrs In my view this is not related to token length or limit: when I tested without loading the chat store again on each call (the while-loop code above), it worked fine.

And it's me on Reddit as well.

bllchmbrs commented 1 month ago

I see, so your experience is:

  1. Not persisting the chat store = no problem (even with token limits)
  2. Persisting the chat store = causes the error.

is that right?

anantgupta129 commented 1 month ago

@bllchmbrs no, in both snippets I am using the chat store; the only difference is:

  1. when consuming via the API, we load the key again on every request.
  2. but in the second snippet (no problem), the chat store is not loaded again; I'm querying in a while loop.

You can also experiment with this.

bllchmbrs commented 1 month ago

I see, it's the reloading of the chat memory based on the key.

We have the same issue. I have to do a recovery now (clear the history, summarize it) to work around this.
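
For reference, a recovery along those lines could look roughly like this (a sketch, not the exact workaround used here; it assumes an OpenAI llm instance and collapses the broken history into a single summary message):

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.llms.openai import OpenAI

def recover_history(chat_store: SimpleChatStore, key: str, llm: OpenAI) -> None:
    # Summarize the existing transcript, then replace it with one assistant message.
    messages = chat_store.get_messages(key)
    transcript = "\n".join(f"{m.role}: {m.content}" for m in messages if m.content)
    summary = llm.complete(f"Summarize this conversation briefly:\n{transcript}").text
    chat_store.delete_messages(key)  # drop the history that trips the 400 error
    chat_store.add_message(
        key,
        ChatMessage(role=MessageRole.ASSISTANT, content=f"Conversation so far: {summary}"),
    )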

logan-markewich commented 1 month ago

@bllchmbrs @anantgupta129 do both of you have the latest version of the chat store? Like I mentioned above, I'm like 86% sure this is fixed

pip show llama-index-storage-chat-store-redis will show the current version

You can update with pip install -U llama-index-storage-chat-store-redis

Probably should also update core

pip install -U llama-index-core

bllchmbrs commented 1 month ago

I am on the latest version of both core (0.10.40) and chat-store-redis (0.1.3).

I think it's something else.

sabir012 commented 3 weeks ago

Hi team, I am also getting this error while using multiple tools under the same agent. It appears when I share the same memory buffer with multiple tools

agc129 commented 3 weeks ago

still no fix @logan-markewich

bllchmbrs commented 3 weeks ago

I tried to get a repro in a Jupyter notebook (even with multiple notebooks pointed at the same store). I couldn't get it.

I am using FastAPI and run_in_threadpool for requests. Could this have something to do with multi-threading?
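
A minimal way to test that hypothesis might be to hit the same key from several threads, with no agent or lock involved, and see whether the assistant/tool pairs interleave (a sketch; the tool_calls payload below is a stand-in, not a real OpenAI tool call object):

import threading
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()
KEY = "shared_key"

def worker(n: int) -> None:
    # Each worker appends an assistant message with tool_calls, then the tool result.
    chat_store.add_message(
        KEY,
        ChatMessage(
            role=MessageRole.ASSISTANT,
            content=None,
            additional_kwargs={"tool_calls": [{"id": f"call_{n}"}]},
        ),
    )
    chat_store.add_message(KEY, ChatMessage(role=MessageRole.TOOL, content=f"result {n}"))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# If the pairs interleave (assistant, assistant, tool, tool, ...), a reloaded buffer can
# easily start at a tool message with no matching tool_calls in front of it.
for msg in chat_store.get_messages(KEY):
    print(msg.role, msg.additional_kwargs.get("tool_calls"))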

anantgupta129 commented 3 weeks ago

@bllchmbrs I doubt this is related to FastAPI because I tested without it as well. Could you share the code?