langchain-ai / langchain-cohere

MIT License

support tool messages in ChatCohere #15

Closed ccurme closed 5 months ago

ccurme commented 5 months ago

Langchain today released a new `tool_calls` attribute on `AIMessage`, which was supported by `ChatCohere` on release (see https://github.com/langchain-ai/langchain-cohere/pull/7).

Langchain also released a new function for building tool-calling agents that is intended to work with any chat model supporting the new tool-calling feature. However, it does not yet work with `ChatCohere`: users who attempt to use it will receive `ApiError: status_code: 400, body: {'message': 'invalid request: all elements in history must have a message.'}`. A typical flow of messages in this agent framework might look like this:

from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
from langchain_cohere import ChatCohere

llm = ChatCohere(temperature=0)

messages = [
    HumanMessage(content="what is the value of magic_function(3)?"),
    AIMessage(
        content="",
        tool_calls=[
            {
                "name": "magic_function",
                "args": {"input": 3},
                "id": "d86e6098-21e1-44c7-8431-40cfc6d35590",
            }
        ],
    ),
    ToolMessage(
        name="magic_function",
        content="5",
        tool_call_id="d86e6098-21e1-44c7-8431-40cfc6d35590",
    ),
]

response = llm.invoke(messages)
assert isinstance(response, AIMessage)

This currently raises the `ApiError` above. Here we add support for processing tool-calling AI messages and `ToolMessage`s (analogous to Cohere's `tool_result` field). This enables the new tool-calling agent constructor for Cohere:

from langchain.agents import AgentExecutor, create_tool_calling_agent, tool
from langchain_cohere import ChatCohere
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

model = ChatCohere()

@tool
def magic_function(input: int) -> int:
    """Applies a magic function to an input."""
    return input + 2

@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)

tools = [magic_function, get_word_length]

agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(
    {
        "input": (
            "what is the value of magic_function(3)? "
            "also, what is the length of the word 'pumpernickel'?"
        )
    }
)
> Entering new AgentExecutor chain...

Invoking: `magic_function` with `{'input': 3}`

5
Invoking: `get_word_length` with `{'word': 'pumpernickel'}`

12
The value of `magic_function(3)` is 5 and the length of the word 'pumpernickel' is 12.

> Finished chain.
{'input': "what is the value of magic_function(3)? also, what is the length of the word 'pumpernickel'?",
 'output': "The value of `magic_function(3)` is 5 and the length of the word 'pumpernickel' is 12."}
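To make the mapping concrete, here is a rough sketch of how a tool call and the `ToolMessage` answering it could be paired into Cohere-style `tool_result` entries. The `to_tool_results` helper and the exact dict shapes are illustrative assumptions, not the actual `ChatCohere` internals:

```python
def to_tool_results(ai_tool_calls, tool_messages):
    """Pair each tool output with the call it answers and emit
    Cohere-style tool_result dicts (shapes are illustrative only)."""
    # Index the AI message's tool calls by id so results can be matched up.
    calls_by_id = {call["id"]: call for call in ai_tool_calls}
    results = []
    for msg in tool_messages:
        call = calls_by_id[msg["tool_call_id"]]
        results.append(
            {
                "call": {"name": call["name"], "parameters": call["args"]},
                "outputs": [{"answer": msg["content"]}],
            }
        )
    return results
```

The key step is matching on `tool_call_id`, which is exactly what the 400 error suggests was missing: every element sent in the chat history needs a well-formed message, so tool results must be attached to the calls that produced them rather than sent as bare messages.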

Requests:

Thank you!

ccurme commented 5 months ago

> If so, one big concern is around being able to show citations for our model output. These are well loved by our customers - there's an example in the react agent, but we would be returning them from the API call. Is it possible to return a parsed version of these from the agent finish?

We should be pretty flexible when it comes to customizing agents. I took a look at `react_multi_hop.agent`:

generate_agent_steps = (
    multi_hop_prompt(tools=tools, prompt=prompt)
    | llm.bind(stop=["\nObservation:"], raw_prompting=True)
    | CohereToolsReactAgentOutputParser()
)

agent = (
    RunnablePassthrough.assign(
        # agent_scratchpad isn't used in this chain, but added here for
        # interoperability with other chains that may require it.
        agent_scratchpad=lambda _: [],
    )
    | RunnableParallel(
        chain_input=RunnablePassthrough(), agent_steps=generate_agent_steps
    )
    | _AddCitations()
)

This wraps a react agent with a custom output parser and extracts the citations at the end. We can do something similar with the tools agent. See below for an example, where I'm using the tools from the react agent's integration test. I made some simplifications here:

  1. The multi-hop react agent is doing some formatting of citations in its grounded-answer field; I skip that and just return the source documents.
  2. The tool-calling agent out of the box struggles with multi-hop questions (i.e., calling tools in a sequence) more than the multi-hop react agent (which looks built for this kind of thing). So here I ask an easier question than in the multi-hop agent's integration test.
    
from typing import Any, Dict, List, Mapping, Union

from langchain_cohere.react_multi_hop.prompt import convert_to_documents
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

def add_source_documents(
    input: Dict[str, Any]
) -> Union[List[AgentAction], AgentFinish]:
    agent_steps = input.get("agent_steps", [])
    if not agent_steps:
        # The input wasn't as expected.
        return []

    if not isinstance(agent_steps, AgentFinish):
        # We're not on the AgentFinish step.
        return agent_steps
    agent_finish = agent_steps

    # Build a list of documents from the intermediate_steps used in this chain.
    intermediate_steps = input.get("chain_input", {}).get("intermediate_steps", [])
    documents: List[Mapping] = []
    for _, observation in intermediate_steps:
        documents.extend(convert_to_documents(observation))
    agent_finish.return_values["documents"] = documents

    return agent_finish

prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)

wrapped_agent = (
    RunnableParallel(chain_input=RunnablePassthrough(), agent_steps=agent)
    | add_source_documents
)

agent_executor = AgentExecutor(agent=wrapped_agent, tools=tools, verbose=True)
agent_executor.invoke(
    {"input": "What company today was founded as 'Sound of Music'?"}
)

> Entering new AgentExecutor chain...

Invoking: `custom_internet_search` with `{'query': 'company founded as Sound of Music'}`

[{'URL': 'https://www.cnbc.com/2015/05/26/19-famous-companies-that-originally-had-different-names.html', 'title': '19 famous companies that originally had different names', 'text': 'Sound of Music made more money during this "best buy" four-day sale than it did in a typical month – thus, the store was renamed to Best Buy in 1983...'}, {'URL': 'https://en.wikipedia.org/wiki/The_Sound_of_Music_(film)', 'title': 'The Sound of Music (film) - Wikipedia', 'text': 'In 1966, American Express created the first Sound of Music guided tour in Salzburg...'}]
The company founded as 'Sound of Music' is now known as Best Buy.

> Finished chain.
{'input': "What company today was founded as 'Sound of Music'?",
 'output': "The company founded as 'Sound of Music' is now known as Best Buy.",
 'documents': [{'URL': 'https://www.cnbc.com/2015/05/26/19-famous-companies-that-originally-had-different-names.html', 'title': '19 famous companies that originally had different names', 'text': 'Sound of Music made more money during this "best buy" four-day sale than it did in a typical month – thus, the store was renamed to Best Buy in 1983...'}, {'URL': 'https://en.wikipedia.org/wiki/The_Sound_of_Music_(film)', 'title': 'The Sound of Music (film) - Wikipedia', 'text': 'In 1966, American Express created the first Sound of Music guided tour in Salzburg...'}]}
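As a usage note, once `documents` is on the agent output, rendering them as readable citations is a small formatting step. A minimal sketch, assuming the document dicts carry the `title` and `URL` keys shown in the output above (`format_citations` is a hypothetical helper, not part of the package):

```python
def format_citations(documents):
    """Render source documents as numbered citation lines,
    assuming each document dict has 'title' and 'URL' keys."""
    return [
        f"[{i}] {doc['title']} ({doc['URL']})"
        for i, doc in enumerate(documents, start=1)
    ]
```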


If the citations are coming from the API, we should be able to plumb them through, but depending on the specifics it may not make sense to use the `create_tool_calling_agent` constructor. The constructor itself is fairly [lightweight](https://github.com/langchain-ai/langchain/blob/450c458f8f07f1a1493a13a7b29f17b84820f90d/libs/langchain/langchain/agents/tool_calling_agent/base.py#L14) and we don't intend to discourage custom constructors; we want Langchain to be flexible enough that it's easy to build what you need.

Ultimately we'd like to enable users to use the built-in constructor (and the human -> AI -> tool call -> tool message pattern more generally) while keeping it easy to support your specific product features.

harry-cohere commented 5 months ago

Thanks for the detailed explanation @ccurme - that sounds agreeable. It would be great to one day have first-class support for citations, or more flexibility for product features to shine through a single constructor, but I understand the approach today.

Once you're happy to merge this I plan on exposing a citation parser like you described - perhaps the two could be released together in 0.1.4?

I also merged main and resolved the conflict - hope you don't mind!