langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations
https://docs.smith.langchain.com/
MIT License

Issue: Traces don't factor in mutation of chat history for agents #706

Open segefjord opened 5 months ago

segefjord commented 5 months ago

Issue you'd like to raise.

Why we use LangSmith, and why this issue matters

Traces are the primary reason we wanted to jump onto the LangChain platform. We are currently planning our own hosting solution, as we have specific needs and don't feel that LangServe is anywhere near the maturity we need. However, the tracing and logging in LangSmith is incredibly valuable, and it's really nice to browse things in a pretty UI! We LOVE this about LangSmith. Prompt management is very cool too. But traces are the biggest selling point for us, and we get a ton of value from them.

Issue

When working with agents we have to pass the most recent user message as the "input" variable. However, if you need an agent with chat history, you have to do something funky: first add the chat history, excluding the current human message you want to send, which goes in the "input" variable instead. Then, after the agent has been invoked with the old chat history, you can append the human message.

This results in weird traces that make it seem as if the human message was added twice for the LLM call. However, this is likely just because the chat history was mutated AFTER invoking the agent. In the trace the message shows up twice because it's added once from the "input" variable as a HumanMessage, and then again when the chat history object is mutated AFTER invoking. So the trace does not seem to consider the input variables at the time of invocation, but rather only the state of the variables at the end of the chain (by which point I added the human message a long time ago, since I'm also adding system messages for each tool call that happens after the HumanMessage, and finally the output of the agent as an AIMessage...)

Adding the AIMessage at the end has an additional weird effect where the last message of the prompt appears to be the same as the output of the LLM call, but in reality it's the same cause again: the chat history was mutated AFTER the chain was invoked.
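A minimal sketch of the pattern described above (the message text is hypothetical; ChatMessageHistory stands in for whatever history object you use, and agent_executor is an AgentExecutor like the one shown later in this thread):

from langchain_community.chat_message_histories import ChatMessageHistory

history = ChatMessageHistory()  # earlier turns only, NOT the new user message
result = agent_executor.invoke(
    {"chat_history": history.messages, "input": "What is 3 * 7?"}
)
# Mutations AFTER the invocation. If the tracer snapshots state at the end
# of the run rather than at invocation time, the new HumanMessage shows up
# both as "input" and inside "chat_history", and the final AIMessage appears
# to duplicate the LLM output.
history.add_user_message("What is 3 * 7?")
history.add_ai_message(result["output"])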

Suggestion:

Simple solution

The approach of treating the "input" variable as special, as something different from chat history, is weird. It would be much more natural to just omit the "input" variable and expect a ChatHistory. If you only need a single HumanMessage in that chat history, you add just that one message and get the same behaviour as with the "input" variable. The only difference is that you would have to explicitly define the chat history and the HumanMessage object. But it would also be much easier for people to see how to pass a full-blown ChatHistory when they need one.
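Under that proposal, the call site might look something like this (a sketch of the suggested interface, not current AgentExecutor behavior):

from langchain_core.messages import HumanMessage

# No special "input" variable - the newest HumanMessage is simply the last
# entry in the chat history the agent receives.
messages = [*history.messages, HumanMessage(content="What is 3 * 7?")]
agent_executor.invoke({"chat_history": messages})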

TL;DR

The AgentExecutor should expect to work with a ChatHistory instead of a simple "input" string. This would make working with agents better for everyone.

Details

The details are hard to solve, because I think it's nice to also have the SystemMessages that are mutated into the conversation, e.g. so we can see which tools were called and have that appear in the trace, even though those messages were added AFTER invocation of the chain. So maybe the current behaviour is fine, just slightly confusing. It would be less confusing if the HumanMessage didn't appear twice because of the "input" handling discussed above.

Maybe a really advanced approach, but one which would improve the quality of traces a lot: if traces displayed the mutations explicitly, we could see "ah, after the initial invocation of the chain, these messages were added at specific stages of the run". This would make traces even more valuable and intuitive, but it's more of an enhancement. Would love this very much! Then I would not have had to think so hard about what was going on with the weird traces. It took me a while, and a lot of second-guessing, to check the mutation and make sure this was only a problem with the traces, not a problem with my business logic, and that these effects would not appear in the actual prompts sent.

segefjord commented 5 months ago

Public trace link that showcases the issue with duplication of the HumanMessage (although it never makes it into the prompt actually sent to OpenAI!): https://smith.langchain.com/public/c91dccb8-0a29-4f6d-98be-bf080626abde/r

Public trace link showcasing a run where tools are used: https://smith.langchain.com/public/c54be51a-d130-42e9-9d7d-fda90fce33a0/r

segefjord commented 5 months ago

(Another quality-of-life improvement would be rendering the messages with markdown in the trace. As you can see, I'm attempting to format my system messages about tool usage with inline code backticks; you can see that in the second trace link I posted.)

hinthornw commented 5 months ago

Hmmm, it looks like you are, in fact, sending duplicate messages to the API there - hard to tell.

segefjord commented 5 months ago

@hinthornw Oh? 😮 How can you be sure?

My code 🤔

# Imports assumed by this snippet (added here for completeness):
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.messages import SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
from langchain_openai import ChatOpenAI


async def stream_model(sid, message):
    prompt = ChatPromptTemplate.from_messages(
        [
            SystemMessage(
                "You are a helpful assistant."
            ),
            MessagesPlaceholder(variable_name="chat_history"),
            HumanMessagePromptTemplate.from_template("{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ]
    )
    # `global_state` holds per-session chat histories (defined elsewhere).
    history = global_state.session[sid]["ChatHistory"]

    gpt3 = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.0)  # unused below
    gpt4 = ChatOpenAI(model="gpt-4-turbo", temperature=0.0)  # unused below
    gpt4o = ChatOpenAI(model="gpt-4o", temperature=0.0)

    tools = [multiply, add, exponentiate]  # tool functions defined elsewhere
    agent = create_tool_calling_agent(gpt4o, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True).with_config(
        {"run_name": "Agent"}
    )

    chain = agent_executor
    return chain.astream_events(
        {"chat_history": history.messages, "input": message},
        version="v1",
    )

Then I consume the stream events like this:

import json  # added for completeness; `conversation` is the module containing stream_model


async def StreamModel(sid, data):
    kwargs = json.loads(data)
    stream_events = await conversation.stream_model(sid, kwargs.get("message"))
    async for event in stream_events:
        if event["event"] == "on_chain_start":
            if event["name"] == "Agent":
                chain_inputs = event["data"].get("input")
                # As soon as the chain has started, we add the HumanMessage as the first thing!
                conversation.add_human_message(sid, chain_inputs["input"])
        elif event["event"] == "on_chain_end":
            if event["name"] == "Agent":
                final_answer = event["data"].get("output")["output"]
                conversation.add_ai_message(sid, final_answer)
        elif event["event"] == "on_chat_model_stream":
            content = event["data"]["chunk"].content
            if content:
                # Empty content in the context of OpenAI means the model is
                # asking for a tool to be invoked, so we only print
                # non-empty content.
                print(content, end="|")
        elif event["event"] == "on_tool_start":
            pass
        elif event["event"] == "on_tool_end":
            conversation.add_system_message(
                sid,
                f"Used tool: `{event['name']}` with output: `{event['data'].get('output')}`",
            )

Really curious what is actually going on here.

hinthornw commented 5 months ago

I can be sure because that's what was sent there - LangSmith helped you find a bug, which is part of its purpose! :)

Hard to tell what's going on without more context in your code, but your global state is what I'd investigate if I were you.

segefjord commented 5 months ago

No, but the entire reason I'm posting this here is that I think it's a bug with LangSmith. So in this case LangSmith didn't help me catch a bug; it made me more confused, as there actually isn't a bug in my own code. The problem seems to be that I'm mutating the conversation object after invoking the agent, so at the time of agent invocation the conversation object looks different than after the run has finished. LangSmith only seems to use the state of the conversation object after the run has finished.

I don't understand why this was closed.

hwchase17 commented 5 months ago

Taking a look at the second trace provided in particular (https://smith.langchain.com/public/c54be51a-d130-42e9-9d7d-fda90fce33a0/r), it definitely looks like something funky is going on.

I think there are maybe two things, both caused by the same underlying issue - the mutation of the history array (which I am assuming is happening inside conversation.add_human_message, etc.)

First - I think you likely are sending multiple messages to OpenAI. But I also think LangSmith is logging the mutated (after-the-fact) array. I think this mutation is happening, because the inputs to the agent as a whole show all five messages... do we not make a copy of the inputs?
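If the history array is indeed being mutated in place, one defensive workaround on the caller's side (a sketch, assuming history.messages is the list that conversation.add_human_message and friends append to) is to hand the agent a snapshot, so later appends can't alias the traced inputs:

    # Snapshot the message list at invocation time; later appends to
    # history.messages then can't reach the in-flight run or its trace.
    return chain.astream_events(
        {"chat_history": list(history.messages), "input": message},
        version="v1",
    )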

hinthornw commented 5 months ago

The problem seems to be that I'm mutating the conversation object after invoking the agent, so at the time of agent invocation the conversation object looks different than after the run has finished.

Ah, interesting - got it.

We did stop deep-copying runs before placing them on the queue, because in some situations (e.g., if you have a Playwright browser reference in a function input) copying crashes the program. I'll see if we can strike a better compromise here, since it's obviously undesirable for downstream mutations to impact upstream tracing. Thanks for your patience, nemzyx.
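A sketch of the kind of compromise described here (a hypothetical helper, not the actual SDK change): deep-copy run inputs before placing them on the queue, and fall back to the original reference for objects that refuse to be copied:

import copy

def safe_deepcopy(value):
    # Deep-copy serializable inputs so downstream mutations can't reach
    # the trace queue; fall back to the original reference for handles
    # that can't be copied (e.g. a Playwright browser), instead of crashing.
    try:
        return copy.deepcopy(value)
    except Exception:
        return value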

hinthornw commented 5 months ago

(Going to re-open this in case that PR didn't fix it.) Thanks again for pushing on this issue. Yesterday I went in and added "deeper" pre-queueing copying, which in my testing resolved the issue where we were logging the post-mutation values - if it's still an issue after those changes, I'll investigate further.