langchain-ai / langchain-google


ChatAnthropicVertex + AgentExecutor => not consistent output with other models #368

Open nick-youngblut opened 4 months ago

nick-youngblut commented 4 months ago

I realize that I should have posted my issue in this repo instead of the main langchain repo.

Essentially, running ChatAnthropicVertex (i.e., the Claude models) with langchain's AgentExecutor generates differently formatted output than the other langchain Chat* classes (e.g., ChatVertexAI, ChatGroq, or ChatOpenAI). This leads to downstream errors when the ChatAnthropicVertex output is fed into other functions that expect the format the other chat models produce.

Due to this formatting difference, I effectively cannot use ChatAnthropicVertex for any langchain applications.
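For concreteness, the difference looks roughly like this (a hypothetical sketch of the two shapes; concrete outputs are shown in the comment below):

# Rough shape of AgentExecutor output with ChatVertexAI / ChatGroq / ChatOpenAI:
# "output" is a plain string.
gemini_style = {"input": "hi", "output": "Hello! How can I help you today?"}

# Rough shape with ChatAnthropicVertex: "output" is a list of content blocks.
claude_style = {
    "input": "hi",
    "output": [{"text": "Hello! How can I help you today?", "type": "text", "index": 0}],
}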

I'm using langchain-google-vertexai==1.0.6. My whole langchain dep list:

langchain==0.2.7
langchain-community==0.2.7
langchain-core==0.2.12
langchain-google-vertexai==1.0.6
langchain-groq==0.1.6
langchain-openai==0.1.14
langchain-text-splitters==0.2.2
langchain-weaviate==0.0.2
langchainhub==0.1.20
felipe-notilus commented 1 month ago

I am encountering the same problem and thought I would detail it here instead of opening a new thread. I can create another if needed.

In particular, what I find is that when a Claude model (I am using Sonnet 3.5, claude-3-5-sonnet@20240620) is wrapped in the AgentExecutor class, the output causes problems when handling the conversation memory.

I initialize the models like this:

from langchain_google_vertexai import ChatVertexAI
from langchain_google_vertexai.model_garden import ChatAnthropicVertex

# Important: plain VertexAI does not have a bind_tools method, so ChatVertexAI is required
model_pro = ChatVertexAI(model_name="gemini-1.5-pro-002", temperature=0.0, location="europe-west1")

model_claude = ChatAnthropicVertex(model_name="claude-3-5-sonnet@20240620", project=project, location="europe-west1")

At first sight everything is OK: the outputs are consistent, and both are instances of AIMessage:

model_pro.invoke(input="hi there")

output:

AIMessage(content='Hi there! How can I help you today?\n', response_metadata={'is_blocked': False, 'safety_ratings': [], 'usage_metadata': {'prompt_token_count': 2, 'candidates_token_count': 11, 'total_token_count': 13}, 'finish_reason': 'STOP'}, id='run-52b77124-fd7a-42aa-a088-1d9a3ec25aee-0', usage_metadata={'input_tokens': 2, 'output_tokens': 11, 'total_tokens': 13})

model_claude.invoke(input="hi there")

output:

AIMessage(content='Hello! How can I assist you today? Feel free to ask me any questions or let me know if you need help with anything.', response_metadata={'id': 'msg_vrtx_01JFjAYNUVxPiG6zPxC6f5dm', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 9, 'output_tokens': 30}}, id='run-a4d4090d-0013-49b0-91ee-22d612453575-0', usage_metadata={'input_tokens': 9, 'output_tokens': 30, 'total_tokens': 39})
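A quick sanity check (a sketch reusing the models defined above) confirms that both return the same message type:

from langchain_core.messages import AIMessage

assert isinstance(model_pro.invoke(input="hi there"), AIMessage)
assert isinstance(model_claude.invoke(input="hi there"), AIMessage)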

However, when wrapping them in the AgentExecutor class, there is a difference in the outputs:

from langchain.agents import create_tool_calling_agent
from langchain.agents import AgentExecutor

# prompt is the ChatPromptTemplate defined further below
master_agent_gpro = create_tool_calling_agent(model_pro, [], prompt)
master_agent_claude = create_tool_calling_agent(model_claude, [], prompt)

master_agent_executor_gpro = AgentExecutor(agent=master_agent_gpro, tools=[], verbose=True)
master_agent_executor_gpro.invoke({"input": "how are you"})

output:

{'input': 'how are you',
 'output': "I'm doing well, thank you for asking! How are you today?\n"}

And the same with the Claude agent:

master_agent_executor_claude = AgentExecutor(agent=master_agent_claude, tools=[], verbose=True)
master_agent_executor_claude.invoke({"input": "how are you"})

output:

{'input': 'how are you',
 'output': [{'text': "As an AI language model, I don't have feelings or personal experiences, but I'm functioning well and ready to assist you with any questions or information you need. How can I help you today?",
   'type': 'text',
   'index': 0}]}
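A minimal way to see the divergence (a sketch reusing the executors defined above):

gemini_out = master_agent_executor_gpro.invoke({"input": "how are you"})["output"]
claude_out = master_agent_executor_claude.invoke({"input": "how are you"})["output"]

print(type(gemini_out))   # <class 'str'>
print(type(claude_out))   # <class 'list'> -- a list of content-block dicts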

Given this, it is unsurprising that a conversation with memory works well when using the model alone, but not when using the AgentExecutor instance of the model:

from langchain.memory import ConversationBufferWindowMemory, ConversationSummaryBufferMemory
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

chain = prompt | model_claude

# only used by the commented-out alternative history below
demo_summary_buffer_history = ConversationSummaryBufferMemory(llm=model_claude, max_token_limit=40, return_messages=True)

store = {}  # memory is maintained outside the chain

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    # new session: start with an empty history
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
        return store[session_id]

    # existing session: keep only the last k=4 exchanges
    memory = ConversationBufferWindowMemory(
        chat_memory=store[session_id],
        k=4,
        return_messages=True,
    )
    assert len(memory.memory_variables) == 1
    key = memory.memory_variables[0]
    messages = memory.load_memory_variables({})[key]
    store[session_id] = InMemoryChatMessageHistory(messages=messages)
    return store[session_id]

chain_with_message_history = RunnableWithMessageHistory(
    chain,
    # lambda session_id: demo_summary_buffer_history.chat_memory,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

Calling chain_with_message_history:

chain_with_message_history.invoke({"input": "hi there"}, {"configurable": {"session_id": "unused"}})

output:

AIMessage(content="Hello! How can I assist you today? Feel free to ask any questions or let me know if there's anything you'd like help with.", response_metadata={'id': 'msg_vrtx_0151pxvP5RHDUo24gqWHo3XK', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 25, 'output_tokens': 32}}, id='run-731fcb10-e650-4ac8-94b7-c4fd5c3b9966-0', usage_metadata={'input_tokens': 25, 'output_tokens': 32, 'total_tokens': 57})

and again a second time:

chain_with_message_history.invoke({"input": "how are you"}, {"configurable": {"session_id": "unused"}})

output:

AIMessage(content="As an AI language model, I don't have feelings, but I'm functioning well and ready to assist you with any questions or tasks you might have. How can I help you today?", response_metadata={'id': 'msg_vrtx_0134Lagkyxna8RU4YLSLfrno', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 63, 'output_tokens': 41}}, id='run-866b5e11-d16d-40a0-89ed-99eb97ba98b7-0', usage_metadata={'input_tokens': 63, 'output_tokens': 41, 'total_tokens': 104})

We encounter no issues and even the store content is ok:

store["unused"]
InMemoryChatMessageHistory(messages=[HumanMessage(content='hi there'), AIMessage(content="Hello! How can I assist you today? Feel free to ask any questions or let me know if there's anything you'd like help with.", response_metadata={'id': 'msg_vrtx_0151pxvP5RHDUo24gqWHo3XK', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 25, 'output_tokens': 32}}, id='run-731fcb10-e650-4ac8-94b7-c4fd5c3b9966-0', usage_metadata={'input_tokens': 25, 'output_tokens': 32, 'total_tokens': 57}), HumanMessage(content='how are you'), AIMessage(content="As an AI language model, I don't have feelings, but I'm functioning well and ready to assist you with any questions or tasks you might have. How can I help you today?", response_metadata={'id': 'msg_vrtx_0134Lagkyxna8RU4YLSLfrno', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 63, 'output_tokens': 41}}, id='run-866b5e11-d16d-40a0-89ed-99eb97ba98b7-0', usage_metadata={'input_tokens': 63, 'output_tokens': 41, 'total_tokens': 104})])
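Note (my own check, a sketch): every message stored here has plain-string content, which is presumably why the history round-trips cleanly:

assert all(isinstance(m.content, str) for m in store["unused"].messages)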

However, if we do the same with an AgentExecutor instance:

store = {}  # reset the memory store

# get_session_history is defined exactly as above

chain_with_message_history = RunnableWithMessageHistory(
    master_agent_executor_claude,
    # lambda session_id: demo_summary_buffer_history.chat_memory,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
chain_with_message_history.invoke({"input": "hi there"}, {"configurable": {"session_id": "unused"}})

output:

{'input': 'hi there',
 'chat_history': [],
 'output': [{'text': "Hello! How can I assist you today? I'm here to help with any questions you might have or tasks you need help with. Feel free to ask about any topic, and I'll do my best to provide you with helpful information or guidance.",
   'type': 'text',
   'index': 0}]}

but calling it a second time (which uses the stored memory):

chain_with_message_history.invoke({"input": "how are you"}, {"configurable": {"session_id": "unused"}})

output:

...
ValidationError: 1 validation error for InMemoryChatMessageHistory
messages -> 1
  BaseMessage.__init__() missing 1 required positional argument: 'content' (type=type_error)
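My working theory (an assumption on my part, not verified against the library source): RunnableWithMessageHistory stores the executor's output value as the content of an AIMessage, and when that content is a list of block dicts rather than a plain string, the stored history fails validation on the next read. A hedged workaround is to flatten the output to text before it reaches the history; flatten_claude_output below is a hypothetical helper of mine, not part of any library:

from langchain_core.runnables import RunnableLambda

def flatten_claude_output(result: dict) -> dict:
    # Collapse Anthropic-style content blocks into a single string so that
    # the AIMessage stored in the history has plain-string content.
    out = result.get("output")
    if isinstance(out, list):
        result["output"] = "".join(
            block.get("text", "") for block in out if block.get("type") == "text"
        )
    return result

flattened_executor = master_agent_executor_claude | RunnableLambda(flatten_claude_output)

Wrapping flattened_executor in RunnableWithMessageHistory instead of the raw executor should then store string content, though I have not verified this against every case.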

Maybe there is something else that should be done that I am not seeing, but I have no such issue using the other Vertex AI models (Gemini Pro or Flash).

Running on Python 3.10 with the following langchain dependencies:

langchain                     0.2.14
langchain-community           0.2.12
langchain-core                0.2.35
langchain-google-vertexai     1.0.10