pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI

LangchainProcessor receives system message as HumanMessage instead of SystemMessage #341

Open agilebean opened 3 months ago

agilebean commented 3 months ago

Context

Using LangchainProcessor as the LLM wrapper, and InMemoryChatMessageHistory (which extends BaseChatMessageHistory) as the message store.
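For reference, a minimal sketch of that setup (the wiring and the simplified get_session_history signature are illustrative assumptions; chain stands for any prompt | chat_model runnable):

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from pipecat.processors.frameworks.langchain import LangchainProcessor

message_store: dict[str, InMemoryChatMessageHistory] = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    # one in-memory history per session; RunnableWithMessageHistory
    # appends each turn's messages to it automatically
    return message_store.setdefault(session_id, InMemoryChatMessageHistory())

chain_with_history = RunnableWithMessageHistory(
    chain,  # assumption: any prompt | chat_model runnable
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
processor = LangchainProcessor(chain_with_history)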

Current behavior

system_message = [
    {
        "role": "system",
        "content": "Introduce yourself as a friend"
    }
]
await task.queue_frame(LLMMessagesFrame(system_message))

is received in the chat history as:

[HumanMessage(content="Introduce yourself as a friend")]

Expected behavior:

An LLMMessagesFrame with the system role should be converted to:

[SystemMessage(content="Introduce yourself as a friend")]
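For what it's worth, recent langchain_core versions ship a helper that performs exactly this role-to-class mapping (availability depends on your version, so treat this as a sketch):

from langchain_core.messages import convert_to_messages

# dicts with a "role" key are mapped onto the matching message class
messages = convert_to_messages(
    [{"role": "system", "content": "Introduce yourself as a friend"}]
)
print(messages)  # [SystemMessage(content='Introduce yourself as a friend')]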

SystemMessage in Langchain docs:

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(
        content="You are a helpful assistant! Your name is Bob."
    ),
    HumanMessage(
        content="What is your name?"
    )
]
# Define a chat model and invoke it with the messages
print(model.invoke(messages))
TomTom101 commented 3 months ago

The LLMMessagesFrame doesn't know or care what's inside its messages list of dicts. How are you using the LangchainProcessor? What does the chain look like?
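For context, an LLMMessagesFrame just wraps a list of role/content dicts, as in this sketch based on the usage above:

from pipecat.frames.frames import LLMMessagesFrame

frame = LLMMessagesFrame([
    {"role": "system", "content": "Introduce yourself as a friend"},
])
print(frame.messages)  # [{'role': 'system', 'content': '...'}]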

agilebean commented 3 months ago

Thanks for asking. My chain is quite standard, in the sense that I haven't yet tried extending the config fields. It looks as follows.

My code

exclude_metadata = filter_messages(include_types=[HumanMessage, AIMessage, SystemMessage])
prompted_chat_model = constants.PROMPT_TEMPLATE | exclude_metadata | chat_model
...
chain_runnable = RunnableWithMessageHistory(
    prompted_chat_model,
    lambda session_id: get_session_history(database_label, session_id, message_store),
    history_messages_key="chat_history",
    input_messages_key="input",
    **memory_kwargs,
)

I feed the system message as:

                system_message = [
                    {
                        "role": "system",
                        "content": SYSTEM_PROMPT
                    }
                ]
                await task.queue_frame(LLMMessagesFrame(system_message))

Suspected root cause

So IMHO the root cause why the above message is stored as a HumanMessage is the LangchainProcessor code here:

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMMessagesFrame):
            # Messages are accumulated by the `LLMUserResponseAggregator` in a list of messages.
            # The last one by the human is the one we want to send to the LLM.
            logger.debug(f"Got transcription frame {frame}")
            text: str = frame.messages[-1]["content"]

            await self._ainvoke(text.strip())
        else:
            await self.push_frame(frame, direction)

As can be seen, it retrieves only the "content", not the "role", to send to the LLM. Since just the raw text is sent without a role, Langchain presumably defaults to HumanMessage, which would explain the result.
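A minimal sketch of that default behavior (illustrative only): when the chain's template has its sole variable in a "human" slot, the bare content string can only land there, whatever role it originally had:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([("human", "{input}")])

# only the content string survives, so the original "system" role is lost
print(prompt.invoke({"input": "Introduce yourself as a friend"}).to_messages())
# -> [HumanMessage(content='Introduce yourself as a friend')]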

Langchain methods of storing SystemMessage

My understanding is that Langchain stores a message as SystemMessage if its role is specified as "system". I found three places that support this:

  1. I searched the latest Langchain docs and found this chapter showing a message passed with the system role:

    prompt2 = ChatPromptTemplate.from_messages(
        [
            ("system", "really good ai"),
            ("human", "{input}"),
            ("ai", "{ai_output}"),
            ("human", "{input2}"),
        ]
    )
    fake_llm = RunnableLambda(lambda prompt: "i am good ai")
    chain = prompt1.assign(ai_output=fake_llm) | prompt2 | fake_llm
  2. In the older langchain v0.1 docs there is this hint:

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful AI bot. Your name is Bob."},
            {"role": "user", "content": "Hello, how are you doing?"},
            {"role": "assistant", "content": "I'm doing well, thanks!"},
            {"role": "user", "content": "What is your name?"},
        ],
    )
  3. This GitHub issue on storing SystemMessage shows another method.

Proposed solution

In line with the above considerations, I think the invoke method should receive a dictionary containing the role. Something along the following lines:

    async def process_frame(self, frame: Frame, direction: FrameDirection):
...
            text: str = frame.messages[-1]["content"]
            role: str = frame.messages[-1]["role"]
            message_dict = {"role": role, "input": text.strip()}

            await self._ainvoke(message_dict)
TomTom101 commented 3 months ago

Understood! The LangchainProcessor takes a single input and populates the ChatPromptTemplate you supplied with your chain. That is why roles do not make a lot of sense here. You might define an input key transcript and use it in the prompt with your chain like this:

messages = [("system": "Here's what the user said: {transcript}")]

The input key defaults to input, so more likely it would be:

messages = [("system", "Be nice"), ("human", "{input}")]

So I'm not quite sure yet how to offer the possibility of pushing a list of messages and basically ignoring the chain altogether.

agilebean commented 3 months ago

> Understood! The LangchainProcessor takes a single input and populates the ChatPromptTemplate you supplied with your chain.

Yes, I understood from the code that it takes the last message and transmits only the "content" property in this line:

    text: str = frame.messages[-1]["content"]

> That is why roles do not make a lot of sense here.

Given the current code you are right, but this is very limiting. The whole motivation for using LangchainProcessor is to use a RunnableWithMessageHistory, which has the advantage of managing the message history automatically and allowing the system prompt to be changed without redefining the chain.

> You might define an input key transcript and use it in the prompt with your chain like this:
>
> messages = [("system", "Here's what the user said: {transcript}")]

Syntactically, this is exactly the idea. The only difference is that you are inserting the user's transcript. If we want to change the system prompt instead, it would be semantically different, defined as:

messages = [("system", "{system_prompt}")]

> The input key defaults to input, so more likely it would be:
>
> messages = [("system", "Be nice"), ("human", "{input}")]

Yes, this is unfortunately the case with the current code, as it only retrieves the content with:

            text: str = frame.messages[-1]["content"]

However, if it also retrieved the role attribute, the sent message could be understood as a system message (and thus mapped to Langchain's SystemMessage class), which would be a tremendous benefit. That's why I suggested that process_frame should be extended with:

            role: str = frame.messages[-1]["role"]
            message_dict = {"role": role, "input": text.strip()}

> So I'm not quite sure yet how to offer the possibility of pushing a list of messages and basically ignoring the chain altogether.

Can you specify what you mean by "ignore the chain altogether"?

I suppose you don't mean ignoring the message history. Or did you mean pushing a list of messages without resending the previous chat history? That makes sense, of course. And wasn't that Langchain's whole motivation for deprecating ConversationChain in favor of RunnableWithMessageHistory?

As far as I understood from an engineer at Langchain who urged me to use RunnableWithMessageHistory, one of its many advantages is that it self-manages the message history via the get_session_history input. If you print the history, you can see that all previous messages are still contained, although you only sent one message.
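For illustration (reusing the get_session_history helper from my chain above):

history = get_session_history(database_label, session_id, message_store)
print(history.messages)
# [SystemMessage(...), HumanMessage(...), AIMessage(...), ...]
# every previous turn is retained, although each invoke only passed the latest input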

Conclusion: I would appreciate it if you could retrieve and send the role attribute :)

agilebean commented 3 months ago

Update on the proposed solution

I just tried my proposed solution but got this warning:

WARNING: Error in RootListenersTracer.on_chain_end callback: ValueError('Expected str, BaseMessage, List[BaseMessage], or Tuple[BaseMessage]. Got {\'role\': \'system\', \'input\': \'Role:\\nYou are an experienced

So it appears that if an LLMMessagesFrame contains a message with role system, you cannot invoke it. I found that the root cause is in the LangchainProcessor code:

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMMessagesFrame):
            # Messages are accumulated by the `LLMUserResponseAggregator` in a list of messages.
            # The last one by the human is the one we want to send to the LLM.
            logger.debug(f"Got transcription frame {frame}")

            text: str = frame.messages[-1]["content"]
            await self._ainvoke(text.strip())

    async def _ainvoke(self, text: str):
        logger.debug(f"Invoking chain with {text}")
...
        try:
            async for token in self._chain.astream(
                {self._transcript_key: text},
                config={"configurable": {"session_id": self._participant_id}},
            ):

So, in conclusion, the current code assumes only one way to feed Langchain's astream(): as text passed in an LLMMessagesFrame.

Expected Behavior

The goal is to allow other inputs to astream(), which are basically the reason why someone would use Langchain in the first place. One important example from the Langchain API documentation is as follows:

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're an assistant who's good at {ability}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}"),
])

chain = prompt | ChatAnthropic(model="claude-2")

with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history=get_session_history,
    input_messages_key="question",
    history_messages_key="history",
...
    ],
)

with_message_history.invoke(
    {"ability": "math", "question": "What does cosine mean?"},
...
)

This example demonstrates one of many use cases where the input is not plain text but a dictionary of input keys (question) and placeholders (ability) contained in the ChatPromptTemplate.

I would appreciate it if pipecat could relax the current restrictions of LangchainProcessor to allow the full functionality of Langchain.

agilebean commented 1 month ago

Can somebody please make this fix? It is really important for enabling the use of Langchain's RunnableWithMessageHistory with a system prompt. Please, pipecat contributors, be aware that the current code does not submit a system prompt with every user prompt; this is crucial for many use cases!

Exact location for the fix in langchain.py:

async for token in self._chain.astream(
    {self._transcript_key: text},
    config=configurable
):

Fix:

async for token in self._chain.astream(
    {self._transcript_key: text, "system": self._system_prompt},
    config=configurable,
):

# set the system prompt in LangchainProcessor's init:
self._system_prompt: str | None = system_prompt

I have tested this extensively and it works well. (You can test the fix by setting the system prompt to: "Start every sentence with beep-beep.")
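For the fix to take effect, the chain's prompt needs a matching placeholder for the "system" input key. A sketch under that assumption (placeholder names are illustrative):

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "{system}"),  # filled from self._system_prompt on every turn
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),    # filled from the transcript
])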

Important Notes

I would be grateful if the pipecat contributors could acknowledge the many different use cases for which Langchain is chosen. In this case, transmitting the system prompt with every message and, more importantly, changing the system prompt dynamically are the two crucial use cases for choosing Langchain.