langchain-ai / langchain-aws

Build LangChain Applications on AWS
MIT License
95 stars 70 forks source link

Does ChatBedrock(claude-3) support returning images in ToolMessage? #75

Open maciejmajek opened 3 months ago

maciejmajek commented 3 months ago

Hi, I am working on a project that is highly multimodal dependent. Some of the implemented tools return images which are then fed into the chat model. Here is the error I am getting when the image is added into the content using method described (with small changes described below) here

Exception has occurred: ValueError
Error raised by bedrock service: An error occurred (ValidationException) when calling the InvokeModel operation: messages.2.content.0.tool_result.content.1.image.source: Field required
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: messages.2.content.0.tool_result.content.1.image.source: Field required

Here, the last message is the one causing problems

[HumanMultimodalMessage(content=[{'type': 'text', 'text': 'Hello. Can you please descr...? Remember to use the available tools.'}]), AIMessage(content='', additional_kwargs={'usage': {'prompt_tokens': 347, 'completion_...: 'toolu_bdrk_01Du8uhp4iKzvJNJc9BYyjgA'}]), ToolMessage(content=[{'type': 'text', 'text': 'Here is the image test.png'}, {'type':..._id='toolu_bdrk_01Du8uhp4iKzvJNJc9BYyjgA')]
0: HumanMultimodalMessage(content=[{'type': 'text', 'text': 'Hello. Can you please describe the contents of test.png image? Remember to use the available tools.'}])
1: AIMessage(content='', additional_kwargs={'usage': {'prompt_tokens': 347, 'completion_tokens': 56, 'total_tokens': 403}, 'stop_reason': 'tool_use', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'}, response_metadata={'usage': {'prompt_tokens': 347, 'completion_tokens': 56, 'total_tokens': 403}, 'stop_reason': 'tool_use', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'}, id='run-04b9ef27-1d6c-4ed8-9eb1-dc47cd3787ad-0', tool_calls=[{'name': 'GetImageTool', 'args': {'name': 'test.png'}, 'id': 'toolu_bdrk_01Du8uhp4iKzvJNJc9BYyjgA'}])
2: ToolMessage(content=[{'type': 'text', 'text': 'Here is the image test.png'}, {'type': 'image', 'image': {'source': 'data:image/png;base64,............')

I've changed the standard image dictionary from

{
    'type':'image_url',
    'image_url':'....'
}

to

{
    'type':'image',
    'source':'....'
}

as the previous errors hinted

Exception has occurred: ValueError
Error raised by bedrock service: An error occurred (ValidationException) when calling the InvokeModel operation: messages.2.content.0.tool_result.content.1: Input tag 'image_url' found using 'type' does not match any of the expected tags: 'text', 'image'

I've run into similar problem with ChatOpenAI: OpenAI (or at least langchain_openai) explicitely does not support images returned in ToolMessage, so I am currently splitting the tool output into two messages (ToolMessage with a text content, and HumanMessage with image) with success.

The same workaround works with bedrock

[HumanMultimodalMessage(content=[{'type': 'text', 'text': 'Hello. Can you please descr...? Remember to use the available tools.'}]), AIMessage(content='', additional_kwargs={'usage': {'prompt_tokens': 347, 'completion_...: 'toolu_bdrk_01SLKQ2dznQy2Ac93WTZ8AFn'}]), ToolMultimodalMessage(content=[{'type': 'text', 'text': 'Here is the image test.png '..._id='toolu_bdrk_01SLKQ2dznQy2Ac93WTZ8AFn'), HumanMultimodalMessage(content=[{'type': 'text', 'text': 'Image returned by a tool ca...+QyuUwuk8sGqJ8AZeonWEsAAAAASUVORK5CYII=']), AIMessage(content='The image appears to show a simple black background with no other ...n-b580351e-a370-4f27-bfbc-7bfbbd5becc7-0')]
special variables:
function variables:
0: HumanMultimodalMessage(content=[{'type': 'text', 'text': 'Hello. Can you please describe the contents of test.png image? Remember to use the available tools.'}])
1: AIMessage(content='', additional_kwargs={'usage': {'prompt_tokens': 347, 'completion_tokens': 56, 'total_tokens': 403}, 'stop_reason': 'tool_use', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'}, response_metadata={'usage': {'prompt_tokens': 347, 'completion_tokens': 56, 'total_tokens': 403}, 'stop_reason': 'tool_use', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'}, id='run-433616ec-c1b6-4dee-b5d6-476bf8aa1072-0', tool_calls=[{'name': 'GetImageTool', 'args': {'name': 'test.png'}, 'id': 'toolu_bdrk_01SLKQ2dznQy2Ac93WTZ8AFn'}])
2: ToolMultimodalMessage(content=[{'type': 'text', 'text': 'Here is the image test.png '}], tool_call_id='toolu_bdrk_01SLKQ2dznQy2Ac93WTZ8AFn')
3: HumanMultimodalMessage(content=[{'type': 'text', 'text': 'Image returned by a tool call toolu_bdrk_01SLKQ2dznQy2Ac93WTZ8AFn'}, {'type': 'image_url', 'image_url': {'url': 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAIAAAD/gAIDAAAAxUlEQVR4Ae3BAQEAAACCIP1/ugsOCOQyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8sGqJ8AZeonWEsAAAAASUVORK5CYII='}}], images=['iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAIAAAD/gAIDAAAAxUlEQVR4Ae3BAQEAAACCIP1/ugsOCOQyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8vkMrlMLpPL5DK5TC6Ty+QyuUwuk8sGqJ8AZeonWEsAAAAASUVORK5CYII='])
4: AIMessage(content='The image appears to show a simple black background with no other visible elements. It is a completely black square or rectangular image.', additional_kwargs={'usage': {'prompt_tokens': 479, 'completion_tokens': 28, 'total_tokens': 507}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'}, response_metadata={'usage': {'prompt_tokens': 479, 'completion_tokens': 28, 'total_tokens': 507}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'}, id='run-b580351e-a370-4f27-bfbc-7bfbbd5becc7-0')
3coins commented 2 months ago

@maciejmajek Can you log the actual payload sent to Bedrock and include here. It would also help to include a small sample of the code to reproduce. Briefly looking at the sample link you provided, Bedrock doesn't seem to support image_url, rather image block with these attributes as required, you seem to be missing the format attribute in the image block.

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ImageBlock.html