langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

End event contains wrong data when streaming structured output #7114

Open Stadly opened 2 weeks ago

Stadly commented 2 weeks ago

Example Code

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  modelName: "gpt-4o",
  streaming: true,
  streamUsage: true,
})
  .withStructuredOutput(
    {
      title: "Joke",
      description: "Joke to tell user.",
      type: "object",
      properties: {
        setup: {
          type: "string",
          description: "The setup for the joke",
        },
        punchline: {
          type: "string",
          description: "The joke's punchline",
        },
      },
      required: ["setup", "punchline"],
      additionalProperties: false,
    },
    {
      strict: true,
      method: "jsonSchema",
    },
  )
  .withConfig({ runName: "joke" });

const eventStream = model.streamEvents(
  "Tell me a joke about cats",
  { version: "v2" },
  { includeNames: ["joke"] },
);
for await (const event of eventStream) {
  console.log(event);
}

Error Message and Stack Trace (if applicable)

{
  "event": "on_chain_end",
  "data": {
    "output": {
      "setup": "WhyWhy wasWhy was theWhy was the catWhy was the cat sittingWhy was the cat sitting onWhy was the cat sitting on theWhy was the cat sitting on the computerWhy was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?",
      "punchline": "BecauseBecause itBecause it wantedBecause it wanted toBecause it wanted to keepBecause it wanted to keep anBecause it wanted to keep an eyeBecause it wanted to keep an eye onBecause it wanted to keep an eye on theBecause it wanted to keep an eye on the mouseBecause it wanted to keep an eye on the mouse!"
    }
  },
  "run_id": "6b7b38ba-d191-4278-a407-ce2287683205",
  "name": "joke",
  "tags": [],
  "metadata": {}
}

Description

The output in the on_chain_end event is a concatenation of all the intermediary chunks, not just the final result. Since each streamed chunk appears to be cumulative, concatenating them duplicates every prefix. For example, if the streamed chunks are "Why", "Why was", and "Why was the", the end event contains "WhyWhy wasWhy was the" instead of "Why was the".
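The duplication pattern suggests (this is an inference from the output above, not confirmed against LangChain internals) that each streamed chunk already holds the full parsed-so-far string, and the end event is built by concatenating those cumulative chunks. A minimal TypeScript sketch of that failure mode:

```typescript
// Hypothetical reproduction of the suspected aggregation pattern (not
// LangChain's actual internals): chunks from withStructuredOutput are
// cumulative, each containing the full string parsed so far.
const chunks = ["Why", "Why was", "Why was the"];

// Suspected buggy aggregation: concatenating cumulative chunks
// duplicates every prefix.
const buggy = chunks.reduce((acc, chunk) => acc + chunk, "");
console.log(buggy); // "WhyWhy wasWhy was the"

// Expected aggregation: the last cumulative chunk already is the
// final value.
const expected = chunks[chunks.length - 1];
console.log(expected); // "Why was the"
```

This reproduces exactly the shape of the setup and punchline strings in the end event above.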

System Info

dosubot[bot] commented 2 weeks ago

The issue you're experiencing, where the on_chain_end event contains concatenated intermediary chunks instead of the final assembled output, appears to be related to how streaming events are aggregated in LangChain.js. The on_chain_end event is supposed to capture only the final output after all intermediary processing steps are completed [1][2].

The event-handling logic needs to accumulate the streamed chunks and assemble them into the final output before emitting the on_chain_end event. The observed behavior, where every prefix is duplicated, suggests that cumulative chunks are being concatenated as if they were incremental tokens.

If this is indeed a bug in the library, the fix belongs in the event-aggregation logic, so that only the final, correctly assembled output is included in the on_chain_end event.
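Until such a fix lands, one possible workaround (a sketch under the assumption, supported by the output above, that on_chain_stream chunks carry the full parsed-so-far object) is to ignore the on_chain_end payload and keep the last on_chain_stream chunk as the final result. Demonstrated here with a mocked event stream rather than a live OpenAI call; the StreamEvent shape is simplified for illustration:

```typescript
// Simplified event shape for this sketch; the real StreamEvent type
// from @langchain/core carries more fields (run_id, name, tags, ...).
interface StreamEvent {
  event: string;
  data: { chunk?: unknown; output?: unknown };
}

// Workaround: keep the final on_chain_stream chunk as the result and
// ignore the (currently mis-aggregated) on_chain_end output.
async function lastStreamChunk(
  events: AsyncIterable<StreamEvent>
): Promise<unknown> {
  let last: unknown;
  for await (const event of events) {
    if (event.event === "on_chain_stream") {
      last = event.data.chunk;
    }
  }
  return last;
}

// Mocked stream standing in for model.streamEvents(...): each chunk is
// cumulative, and the end event shows the duplicated concatenation.
async function* mockEvents(): AsyncGenerator<StreamEvent> {
  yield { event: "on_chain_stream", data: { chunk: { setup: "Why" } } };
  yield { event: "on_chain_stream", data: { chunk: { setup: "Why was the" } } };
  yield { event: "on_chain_end", data: { output: { setup: "WhyWhy was the" } } };
}

lastStreamChunk(mockEvents()).then((result) => {
  console.log(result); // { setup: "Why was the" }
});
```

With the real stream from the repro above, the same loop over `eventStream` would yield the correctly assembled object from the last on_chain_stream event.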
