hzeus opened this issue 2 weeks ago (status: Open)
Thanks for flagging. I am able to reproduce the lag.
Looks like a disproportionate amount of time is being spent on just parsing the JSON values, which is something we can improve (I think by not doing redundant work, though I haven't looked at it deeply yet). Thanks for bringing it to our attention.
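To make the suspicion concrete, here is a toy sketch of the pattern (deliberately simplified, not LangChain's actual code): if every streamed chunk triggers a parse of the entire accumulated tool-call argument string, total parsing work grows quadratically with the number of chunks.

import json
import time

# ~4,000 chunks of an incrementally streamed JSON argument, like the ballad repro below.
chunks = ['{"text": "'] + ['line of the ballad\\n'] * 4000 + ['"}']

start = time.perf_counter()
accumulated = ""
for chunk in chunks:
    accumulated += chunk
    try:
        json.loads(accumulated)  # re-parse everything received so far
    except json.JSONDecodeError:
        pass  # still partial; the real code attempts a repair-and-parse instead
print(f"{len(chunks)} chunks, {time.perf_counter() - start:.2f}s spent parsing")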
Statistical profile from a previous run; the full cProfile output is attached as well.
ncalls       tottime  percall  cumtime  percall  filename:lineno(function)
7            0.001    0.000    39.557   5.651    python3.11/site-packages/langchain_core/language_models/chat_models.py:771(_agenerate_with_cache)
1            0.011    0.011    39.537   39.537   python3.11/site-packages/langchain_core/language_models/chat_models.py:79(generate_from_stream)
4013         0.014    0.000    39.525   0.010    langchain_core/outputs/chat_generation.py:83(__add__)
4013         0.023    0.000    39.413   0.010    langchain_core/messages/ai.py:232(__add__)
12050/8035   0.022    0.000    39.381   0.005    pydantic/v1/main.py:332(__init__)
12050/8035   0.139    0.000    39.360   0.005    pydantic/v1/main.py:1030(validate_model)
4016         0.008    0.000    39.300   0.010    langchain_core/messages/ai.py:78(__init__)
4016         0.007    0.000    39.292   0.010    langchain_core/messages/base.py:57(__init__)
4015         0.014    0.000    38.904   0.010    langchain_core/messages/ai.py:178(init_tool_calls)
4014         11.357   0.003    38.890   0.010    langchain_core/utils/json.py:44(parse_partial_json)
2158964      2.544    0.000    27.532   0.000    python3.11/json/__init__.py:299(loads)
2158964      1.206    0.000    21.431   0.000    python3.11/json/decoder.py:332(decode)
2158964      16.069   0.000    19.615   0.000    /U
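Reading the excerpt: the ~4,014 chunk merges (the __add__ and init_tool_calls rows) trigger 2,158,964 json.loads calls, i.e. roughly 500 parse attempts per merged chunk, suggesting the accumulated tool-call argument string is re-parsed (with retries) on every chunk rather than once at the end; that is how a single streamed response adds up to ~39 seconds of CPU time.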
Repro script:
import asyncio
import cProfile
import io
import pstats
from datetime import datetime

from langchain.tools import tool
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END, MessageGraph
from pyinstrument import Profiler


@tool
def file_saver(text: str) -> str:
    """Persist the given string to disk"""
    # No-op stub: the tool's behavior is irrelevant to the repro.
    pass


model = ChatOpenAI(
    model="gpt-4o",
    streaming=True,
    temperature=0.1,
)
tools = [file_saver]
model = model.bind_tools(tools)


def get_agent_executor():
    def should_continue(messages):
        print(f"{datetime.now()}: Starting should_continue")
        return "end"

    async def call_model(messages):
        response = await model.ainvoke(messages)
        return response

    workflow = MessageGraph()
    workflow.add_node("agent", call_model)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges(
        "agent",
        should_continue,
        {
            "end": END,
        },
    )
    return workflow.compile()


agent_executor = get_agent_executor()

messages = [
    HumanMessage(
        content="Think of a ballad with 300 verses and save it"
        " to a file. Do not print it to me first. At the end of the poem, sign it as 's.b. Anonymous'."
        " You must write every line, do not skip."
    )
]


async def run():
    saw_cme = False
    py_profiler = None
    cprofiler = None
    evs = None
    async for event in agent_executor.astream_events(messages, version="v1"):
        kind = event["event"]
        if saw_cme:
            continue
        if kind == "on_chat_model_stream":
            # Restart both profilers whenever a chunk's tool-call fragment contains
            # "Anonymous" (the signature near the end of the ballad), so the captured
            # window covers the tail of the stream and the final message assembly.
            if tc := event["data"]["chunk"].additional_kwargs.get("tool_calls"):
                if "Anonymous" in str(tc[0]["function"]):
                    if py_profiler:
                        py_profiler.stop()
                    py_profiler = Profiler(interval=0.0001, async_mode="disabled")
                    py_profiler.start()
                    if cprofiler:
                        cprofiler.disable()
                    cprofiler = cProfile.Profile()
                    cprofiler.enable()
        if kind == "on_chat_model_end":
            saw_cme = True
            if py_profiler:
                py_profiler.stop()
                py_profiler.write_html("profile.html", show_all=True)
                cprofiler.disable()
                s = io.StringIO()
                ps = pstats.Stats(cprofiler, stream=s).sort_stats(
                    pstats.SortKey.CUMULATIVE
                )
                ps.print_stats()
                with open("profile.txt", "w") as f:
                    f.write(s.getvalue())
            else:
                print("No profiling data")
            # evs still holds the previous event (the last streamed chunk), so the
            # two prints below show the gap before on_chat_model_end arrived.
            print(evs)
        evs = f"{datetime.now()}: Received event: {kind}"
        if kind == "on_chat_model_end":
            print(evs)


asyncio.run(run())
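For completeness, the hot function can also be timed in isolation (import path taken from the profile above; the payload is a synthetic stand-in for a partially streamed tool-call argument, so absolute numbers will differ from the repro):

import time

from langchain_core.utils.json import parse_partial_json

# Synthetic, unterminated JSON string: a partial tool-call argument mid-stream.
payload = '{"text": "' + 'line of the ballad\\n' * 5000  # no closing quote/brace

start = time.perf_counter()
parse_partial_json(payload)
print(f"one call on {len(payload):,} chars: {time.perf_counter() - start:.3f}s")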
Going to transfer this to the langchain repo, however, since the underlying issue is there. Will flag it to the team.
Description
Hi!
When receiving an LLM answer that leads to a tool call with a large amount of data in one parameter, we noticed that our program was blocked, even though we are using the async version. My guess is that the final message is built after the last chunk has been streamed, and that this takes some time on the CPU? Also, is there a different approach we could use?
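(For illustration, a minimal way to see the blocking from the outside, replacing the final asyncio.run(run()) in the script above with a heartbeat running alongside it; if the heartbeat timestamps stall between the last on_chat_model_stream and on_chat_model_end, the event loop is being starved by CPU-bound work:)

import asyncio
from datetime import datetime

async def heartbeat():
    # Should tick every 0.5s; long gaps mean the event loop is blocked.
    while True:
        print(f"{datetime.now()}: heartbeat")
        await asyncio.sleep(0.5)

async def main():
    hb = asyncio.create_task(heartbeat())
    await run()  # the streaming loop from the repro script
    hb.cancel()

asyncio.run(main())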
Thank you very much!
System Info