
[Bug]: BedrockConverse doesn't work with streaming and tool use combined #15664

Open · omrihar opened this issue 1 month ago

omrihar commented 1 month ago

Bug Description

I'm trying to stream the response from Anthropic Claude Sonnet 3.5 using BedrockConverse and I get an error: KeyError: 'toolUse'. It seems like something is not parsing the tool-use block correctly. The error happens when I use `llm.stream_chat_with_tools`. It doesn't happen, for the same setup, when I use `chat_with_tools`.

Version

llama-index-core==0.10.67, llama-index-llms-bedrock-converse==0.1.6

Steps to Reproduce

from llama_index.core.tools import FunctionTool
from llama_index.llms.bedrock_converse import BedrockConverse

llm = BedrockConverse(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    region_name="eu-central-1",
)

def add(x: int, y: int) -> int:
    "Add two numbers."
    return x + y

add_tool = FunctionTool.from_defaults(add)

resp = llm.stream_chat_with_tools(tools=[add_tool], user_msg="How much is 2 + 3")
for token in resp:
    if token is not None:
        print(token.delta, sep="", flush=False, end="")

Relevant Logs/Tracebacks

KeyError                                  Traceback (most recent call last)
Cell In[113], line 16
     13 add_tool = FunctionTool.from_defaults(add)
     15 resp = llm.stream_chat_with_tools(tools=[add_tool], user_msg="How much is 2 + 3")
---> 16 for token in resp:
     17     if token is not None:
     18         print(token.delta, sep="", flush=False, end="")

File ~/.pyenv/versions/training-assistant/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py:185, in llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen()
    183 last_response = None
    184 try:
--> 185     for x in f_return_val:
    186         dispatcher.event(
    187             LLMChatInProgressEvent(
    188                 messages=messages,
   (...)
    191             )
    192         )
    193         yield cast(ChatResponse, x)

File ~/.pyenv/versions/training-assistant/lib/python3.11/site-packages/llama_index/llms/bedrock_converse/base.py:362, in BedrockConverse.stream_chat.<locals>.gen()
    348     yield ChatResponse(
    349         message=ChatMessage(
    350             role=role,
   (...)
    359         raw=response,
    360     )
    361 elif content_block_start := chunk.get("contentBlockStart"):
--> 362     tool_use = content_block_start["toolUse"]
    363     content = join_two_dicts(content, tool_use)
    364     (
    365         _,
    366         tool_calls,
    367         tool_call_ids,
    368         status,
    369     ) = self._get_content_and_tool_calls(content=content)

KeyError: 'toolUse'
dosubot[bot] commented 1 month ago

Hey there, @omrihar! I'm here to help you with any bugs, questions, or even becoming a contributor. Let's squash those bugs together! 🤖

The KeyError: 'toolUse' occurs when using llm.stream_chat_with_tools because the stream_chat method in BedrockConverse expects the toolUse key to be present in the response chunks. This key is used to handle tool usage information during streaming.

In the stream_chat method, the code attempts to access content_block_start["toolUse"]:

elif content_block_start := chunk.get("contentBlockStart"):
    tool_use = content_block_start["toolUse"]
    content = join_two_dicts(content, tool_use)
    (
        _,
        tool_calls,
        tool_call_ids,
        status,
    ) = self._get_content_and_tool_calls(content=content)

If the toolUse key is missing in the response, it will raise a KeyError.
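
As a sketch of one possible guard (hypothetical, not the shipped fix), the direct indexing could be replaced with a tolerant lookup. Note that in the AWS ConverseStream event shape the tool-use block may be nested under a start key rather than sitting at the top level:

elif content_block_start := chunk.get("contentBlockStart"):
    # Hypothetical guard (not the actual upstream code): the ConverseStream
    # event may nest the block under a "start" key, so try both shapes
    # instead of indexing "toolUse" directly.
    tool_use = content_block_start.get("toolUse") or content_block_start.get(
        "start", {}
    ).get("toolUse")
    if tool_use:
        content = join_two_dicts(content, tool_use)
        (
            _,
            tool_calls,
            tool_call_ids,
            status,
        ) = self._get_content_and_tool_calls(content=content)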

By contrast, the chat_with_tools method does not rely on streaming: it processes the entire response at once, so it never needs the toolUse key to be present in intermediate chunks.
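
For comparison, the non-streaming call that works in the reporter's setup looks like the sketch below; get_tool_calls_from_response is the FunctionCallingLLM helper for extracting parsed tool calls (same llm and add_tool as in the reproduction above):

# Non-streaming: the complete response (including any toolUse block)
# arrives in one piece, so no per-chunk parsing is involved.
resp = llm.chat_with_tools(tools=[add_tool], user_msg="How much is 2 + 3")
tool_calls = llm.get_tool_calls_from_response(resp, error_on_no_tool_call=False)
print(tool_calls)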

To fix this issue, ensure that the contentBlockStart chunks returned by Anthropic Claude Sonnet 3.5 actually include the toolUse key when llm.stream_chat_with_tools is used. For comparison, the stream_chat method in the Anthropic class from the llama-index integration shows how the ToolUseBlock is handled:

for r in response:
    if isinstance(r, ContentBlockDeltaEvent):
        if isinstance(r.delta, TextDelta):
            content_delta = r.delta.text
            content += content_delta
        else:
            if not isinstance(cur_tool_call, ToolUseBlock):
                raise ValueError("Tool call not started")
            content_delta = r.delta.partial_json
            cur_tool_json += content_delta
            try:
                argument_dict = parse_partial_json(cur_tool_json)
                cur_tool_call.input = argument_dict
            except ValueError:
                pass
    elif isinstance(r, ContentBlockStartEvent):
        if isinstance(r.content_block, ToolUseBlock):
            cur_tool_call = r.content_block
            cur_tool_json = ""
    elif isinstance(r, ContentBlockStopEvent):
        if isinstance(cur_tool_call, ToolUseBlock):
            cur_tool_calls.append(cur_tool_call)

This code ensures that ToolUseBlock is properly handled during streaming, which should prevent the KeyError [1][2].

To continue talking to Dosu, mention @dosu.

logan-markewich commented 1 month ago

I don't have access to bedrock to test, but I'd appreciate a PR :)

omrihar commented 1 month ago

@logan-markewich I don't really know how to start resolving the issue, but given some direction I can try to give it a go :) For now I don't even understand what the syntax should be for using streaming together with tool calls (do you call the tool parser on the response object from the streaming? on the last token?). As far as I can tell, there is no documentation at all about using stream_chat_with_tools...
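
My best guess, pieced together from the non-streaming helper, is the sketch below (exhaust the generator, then hand the final accumulated ChatResponse to get_tool_calls_from_response), but I haven't verified that this is the intended pattern:

resp_gen = llm.stream_chat_with_tools(tools=[add_tool], user_msg="How much is 2 + 3")
final_resp = None
for chunk in resp_gen:
    print(chunk.delta or "", end="", flush=True)
    final_resp = chunk  # keep the last ChatResponse; it should carry the accumulated state

# Guess: parse tool calls off the last accumulated response, as in the
# non-streaming case (unverified for stream_chat_with_tools).
if final_resp is not None:
    tool_calls = llm.get_tool_calls_from_response(final_resp, error_on_no_tool_call=False)
    print(tool_calls)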