langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.04k stars 14.65k forks

Base64 images get truncated using AgentExecutor with create_openai_tools_agent #21967

Open Victorvexcel opened 3 months ago

Victorvexcel commented 3 months ago

Example Code

import base64
from io import BytesIO
import requests

from PIL import Image
from langchain.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder

@tool
def download_image(url: str,
                   local_save_path: str = "/home/victor/Desktop/image_llm.jpeg") -> str:
    """Download an image from the given url and return it as a base64 data URI."""
    try:
        # Send an HTTP request to the URL
        response = requests.get(url, stream=True)
        # Check if the request was successful
        response.raise_for_status()

        img_content = response.content
        image_stream = BytesIO(img_content)
        pil_image = Image.open(image_stream)
        pil_image.save(local_save_path)
        buffered = BytesIO()
        pil_image.save(buffered, format='JPEG', quality=85)
        base64_image = base64.b64encode(buffered.getvalue()).decode()
        src = f"data:image/jpeg;base64,{base64_image}"
        print(len(src))
        return src

    except requests.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"An error occurred: {err}")

tools = [download_image]

llm = ChatOpenAI(temperature=0,
                 model='gpt-4-turbo',
                 api_key='YOUR_API_KEY')

template_messages = [SystemMessage(content="You are a helpful assistant"),
                     MessagesPlaceholder(variable_name='chat_history', optional=True),
                     HumanMessagePromptTemplate.from_template("{user_input}"),
                     MessagesPlaceholder(variable_name='agent_scratchpad')]
prompt = ChatPromptTemplate.from_messages(template_messages)

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, 
                               tools=tools,
                               verbose=True,
                               max_iterations=3)
def ask_agent(user_input,
              chat_history,
              agent_executor):

    agent_response = agent_executor.invoke({"user_input": user_input, "chat_history": chat_history})
    print(len(agent_response["output"]))
    return agent_response["output"]

if __name__ == '__main__':

    user_input = "Please show the following image: https://upload.wikimedia.org/wikipedia/commons/1/1e/Demonstrations_in_Victoria6.jpg"
    chat_history = []
    agent_response = ask_agent(user_input,
                               chat_history,
                               agent_executor)

Error Message and Stack Trace (if applicable)

No response

Description

I am building a LangChain chat agent powered by OpenAI models. The agent runs in the backend of a web app whose frontend lets the user interact with it.

The goal of this agent is to call tools when the user's message requires it. Some of the tools download images and send them to the frontend so the user can view them. The images are base64-encoded so that they are displayed correctly to the user.
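The encoding step described above can be sketched with the standard library alone (the fake JPEG bytes below are just an illustration, not real image data):

```python
import base64

# Minimal illustration of the base64 step: raw image bytes -> base64 text
# -> a data URI that a frontend can place directly into an <img src=...>.
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16   # fake JPEG header + padding
b64 = base64.b64encode(jpeg_bytes).decode()
data_uri = f"data:image/jpeg;base64,{b64}"
print(data_uri[:30])
```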

The problem I am facing is that the base64 image gets truncated when the agent finishes the chain and returns the answer. For example, the base64 string produced by download_image has a length of 54443, while the answer returned by the agent has a length of 5762, so the image is being truncated by the agent. I am not completely sure why this happens, but it may be related to the maximum number of tokens the agent can handle.
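A back-of-the-envelope check supports the token hypothesis (this is my own rough estimate, not a measurement: I am assuming base64 text encodes at very roughly 2 to 4 characters per model token):

```python
# Rough estimate only: if base64 text costs ~2-4 characters per token,
# a 54443-character data URI is on the order of 13k-27k tokens -- far more
# than the default completion limit of most chat models, which would
# explain why the regenerated answer comes back much shorter.
data_uri_len = 54443          # length reported by download_image
low, high = 2, 4              # assumed chars-per-token bounds
print(f"~{data_uri_len // high} to {data_uri_len // low} tokens")
```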

Some alternatives that I have tried, but that failed to make this work:

I guess I could get into lower-level details and try to override some default configuration of the agent, but first I wanted to ask for help with this problem.

System Info

langchain==0.1.20
langchain-community==0.0.38
langchain-core==0.1.52
langchain-openai==0.1.6
langchain-text-splitters==0.0.1

platform: Ubuntu 20.04.6 LTS (focal)

Python 3.8.10

christian-bromann commented 3 months ago

I am having the same issue using the following code:

const tools = [new TavilySearchResults({ maxResults: 1 })]

const prompt = await pull<ChatPromptTemplate>(
  'hwchase17/openai-tools-agent',
)

const llm = new ChatOpenAI({
  model: 'gpt-4o',
  temperature: 0,
  apiKey: process.env.OPENAPI_KEY,
})

const agent = createToolCallingAgent({
  llm,
  tools,
  // streamRunnable: true,
  prompt,
})

const agentExecutor = new AgentExecutor({
  agent,
  tools,
  // returnIntermediateSteps: true,
  // verbose: true,
})

const input = `...`

const result = await agentExecutor.stream(
  { input },
  { maxConcurrency: 100 }
)

The input length in my example is 43046 characters, and the output gets truncated after around 20500 characters (sometimes more, sometimes less, e.g. 20389, 21373 or 20443). Any feedback would be appreciated.