langchain-ai / langchainjs

πŸ¦œπŸ”— Build context-aware reasoning applications πŸ¦œπŸ”—
https://js.langchain.com/docs/
MIT License
12.15k stars 2.04k forks source link

TokenUsage is not available when `streamRunnable = true` in AgentExecutor #6106

Open louislatreille opened 1 month ago

louislatreille commented 1 month ago

Checked other resources

Example Code

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { DynamicStructuredTool } from "@langchain/core/tools";
import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor, createToolCallingAgent } from "langchain/agents";
import { z } from "zod";

const calculatorSchema = z.object({
  operation: z.enum(["add", "subtract", "multiply", "divide"]).describe("The type of operation to execute."),
  number1: z.number().describe("The first number to operate on."),
  number2: z.number().describe("The second number to operate on."),
});

const calculatorTool = new DynamicStructuredTool({
  name: "calculator",
  description: "Can perform mathematical operations.",
  schema: calculatorSchema,
  func: async ({ operation, number1, number2 }) => {
    console.log(`Calculator called with ${number1} ${operation} ${number2}`);

    if (operation === "add") {
      return `${number1 + number2}`;
    } else if (operation === "subtract") {
      return `${number1 - number2}`;
    } else if (operation === "multiply") {
      return `${number1 * number2}`;
    } else if (operation === "divide") {
      return `${number1 / number2}`;
    } else {
      throw new Error("Invalid operation.");
    }
  },
});

export const calculatorAndMaterialsAgent = async () => {
  const model = new ChatOpenAI({
    model: "gpt-4o",
    apiKey: "", // API KEY HERE
    temperature: 1,
  });
  const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a helpful assistant."],
    ["placeholder", "{chat_history}"],
    ["human", "{input}"],
    ["placeholder", "{agent_scratchpad}"],
  ]);

  const tools = [calculatorTool];
  const agent = createToolCallingAgent({ llm: model, tools, prompt });
  const agentExecutor = new AgentExecutor({ agent, tools });

  await agentExecutor.invoke(
    { input: "Hi! My name is John!", chat_history: [] },
    {
      configurable: { sessionId: "<foo>" },
      callbacks: [
        {
          handleLLMEnd: (output) => {
            /*
              When streamRunnable = false: We get something like { completionTokens: 12, promptTokens: 99, totalTokens: 111 }
              When streamRunnable = true (default value): We get undefined
            */
            console.log(output.llmOutput?.tokenUsage);
          },
        },
      ],
    }
  );

  await agentExecutor.invoke(
    { input: "What's my name?", chat_history: [] },
    {
      configurable: { sessionId: "<foo>" },
      callbacks: [
        {
          handleLLMEnd: (output) => {
            console.log(output.llmOutput?.tokenUsage);
          },
        },
      ],
    }
  );

  await agentExecutor.invoke(
    { input: "What is 35 * 74?", chat_history: [] },
    {
      configurable: { sessionId: "<foo>" },
      callbacks: [
        {
          handleLLMEnd: (output) => {
            console.log(output.llmOutput?.tokenUsage);
          },
        },
      ],
    }
  );
};

calculatorAndMaterialsAgent();

Error Message and Stack Trace (if applicable)

No response

Description

I'm using the agent executor to have LLM call tools, and when the streamRunnable parameter is set to true (which is also the default value, which made things even more confusing), the tokenUsage field is not present after having invoked the agent.

System Info

langchain@0.2.10 | MIT | deps: 16 | versions: 282 Node.js v20.2.0

dosubot[bot] commented 1 month ago

Hey there, @louislatreille! I'm here to help you with any bugs, questions, or contributions you might have for the repository. Let's tackle this together!

To ensure the tokenUsage field is available in AgentExecutor when streamRunnable is set to true, you need to make sure that the model configuration includes the necessary settings to track token usage. Specifically, you should set the streamUsage option to true.

Here is the relevant part of the code that shows the llmOutput with estimatedTokenUsage:

{
  "ops": [
    {
      "op": "add",
      "path": "/logs/ChatOpenAI/final_output",
      "value": {
        "generations": [
          [
            {
              "text": "",
              "generationInfo": {
                "prompt": 0,
                "completion": 0
              },
              "message": {
                "lc": 1,
                "type": "constructor",
                "id": [
                  "langchain_core",
                  "messages",
                  "AIMessageChunk"
                ],
                "kwargs": {
                  "content": "",
                  "additional_kwargs": {
                    "function_call": {
                      "name": "search",
                      "arguments": "{\n  \"input\": \"current weather in New York\"\n}"
                    }
                  }
                }
              }
            }
          ]
        ],
        "llmOutput": {
          "estimatedTokenUsage": {
            "promptTokens": 123,
            "completionTokens": 17,
            "totalTokens": 140
          }
        }
      }
    }
  ]
}

Additionally, the AgentRunnableSequence class in LangChain.js has a streamRunnable property that controls whether the underlying LLM is invoked in a streaming fashion. When streamRunnable is set to true, the individual LLM tokens will be available in streamLog.

Here is the relevant part of the AgentRunnableSequence class:

export class AgentRunnableSequence<
  RunInput = any,
  RunOutput = any
> extends RunnableSequence<RunInput, RunOutput> {
  streamRunnable?: boolean;

  singleAction: boolean;

  static fromRunnables<RunInput = any, RunOutput = any>(
    [first, ...runnables]: [
      RunnableLike<RunInput>,
      ...RunnableLike[],
      RunnableLike<any, RunOutput>
    ],
    config: { singleAction: boolean; streamRunnable?: boolean; name?: string }
  ): AgentRunnableSequence<RunInput, Exclude<RunOutput, Error>> {
    const sequence = RunnableSequence.from(
      [first, ...runnables],
      config.name
    ) as AgentRunnableSequence<RunInput, Exclude<RunOutput, Error>>;
    sequence.singleAction = config.singleAction;
    sequence.streamRunnable = config.streamRunnable;
    return sequence;
  }

  static isAgentRunnableSequence(x: Runnable): x is AgentRunnableSequence {
    return typeof (x as AgentRunnableSequence).singleAction === "boolean";
  }
}

To ensure the tokenUsage field is available, you need to set streamRunnable to true when initializing the AgentRunnableSequence and ensure that the model configuration includes the streamUsage option set to true. The ChatMistralAICallOptions interface includes a streamUsage property that controls whether usage data, like token counts, are included in the streamed response chunks.

Here is the relevant part of the ChatMistralAICallOptions interface:

interface MistralAICallOptions
  extends Omit<BaseLanguageModelCallOptions, "stop"> {
  response_format?: {
    type: "text" | "json_object";
  };
  tools: StructuredToolInterface[] | MistralAIToolInput[] | MistralAITool[];
  tool_choice?: MistralAIToolChoice;
  /**
   * Whether or not to include token usage in the stream.
   * @default {true}
   */
  streamUsage?: boolean;
}

export interface ChatMistralAICallOptions extends MistralAICallOptions {}

Additionally, the ChatGroqInput interface includes a streaming property that controls whether responses are streamed:

export interface ChatGroqInput extends BaseChatModelParams {
  /**
   * Whether or not to stream responses.
   */
  streaming?: boolean;
}

The _generate method in the ChatGroq class handles the token usage tracking when streaming is enabled:

override async _generate(
    messages: BaseMessage[],
    options: this["ParsedCallOptions"],
    runManager?: CallbackManagerForLLMRun
  ): Promise<ChatResult> {
    if (this.streaming) {
      const tokenUsage: TokenUsage = {};
      const stream = this._streamResponseChunks(messages, options, runManager);
      const finalChunks: Record<number, ChatGenerationChunk> = {};
      for await (const chunk of stream) {
        const index =
          (chunk.generationInfo as NewTokenIndices)?.completion ?? 0;
        if (finalChunks[index] === undefined) {
          finalChunks[index] = chunk;
        } else {
          finalChunks[index] = finalChunks[index].concat(chunk);
        }
      }
      const generations = Object.entries(finalChunks)
        .sort(([aKey], [bKey]) => parseInt(aKey, 10) - parseInt(bKey, 10))
        .map(([_, value]) => value);

      return { generations, llmOutput: { estimatedTokenUsage: tokenUsage } };
    } else {
      return this._generateNonStreaming(messages, options, runManager);
    }
  }

Therefore, to ensure the tokenUsage field is available, you need to set streamRunnable to true and ensure that streamUsage is set to true in the model configuration. The existing configuration and implementation in the provided example already handle this [1][2][3][4][5].

To continue talking to Dosu, mention @dosu.

louislatreille commented 1 month ago

@dosu I copied the example you gave me, and I don't get any llmOutput field. I think this might be a bug.

louislatreille commented 1 month ago
{
  "ops": [
    {
      "op": "add",
      "path": "/logs/ChatOpenAI/final_output",
      "value": {
        "generations": [
          [
            {
              "text": "I'm sorry, I can't assist with that. I can only provide information about materials.",
              "generationInfo": {
                "prompt": 0,
                "completion": 0,
                "finish_reason": "stop"
              },
              "message": {
                "lc": 1,
                "type": "constructor",
                "id": [
                  "langchain_core",
                  "messages",
                  "AIMessageChunk"
                ],
                "kwargs": {
                  "content": "I'm sorry, I can't assist with that. I can only provide information about materials.",
                  "additional_kwargs": {},
                  "response_metadata": {
                    "prompt": 0,
                    "completion": 0,
                    "finish_reason": "stop"
                  },
                  "tool_call_chunks": [],
                  "tool_calls": [],
                  "invalid_tool_calls": []
                }
              }
            }
          ]
        ]
      }
    },
  ]
}
Ganeshsivakumar commented 3 weeks ago

hey @louislatreille i tried out your example code with streamRunnable = true and found that output.generations contains token usage details but yeah output.llmOutput?.tokenUsage is undefined when streamRunnable = true