Stadly opened this issue 3 weeks ago
To get the estimated token usage when using `stream` or `streamEvents` in LangChain.js with OpenAI models, you can use the `handleLLMEnd` callback to capture the token usage information. Here is an example of how you can do this:
import { AzureChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

test("Test ChatOpenAI token usage reporting for streaming calls", async () => {
  // Remember the original value so the finally block can restore it.
  const originalBackground = process.env.LANGCHAIN_CALLBACKS_BACKGROUND;
  process.env.LANGCHAIN_CALLBACKS_BACKGROUND = "false";
  try {
    let streamingTokenUsed = -1;
    let nonStreamingTokenUsed = -1;
    const systemPrompt = "You are a helpful assistant";
    const question = "What is the color of the night sky?";
    const streamingModel = new AzureChatOpenAI({
      modelName: "gpt-3.5-turbo",
      streaming: true,
      maxRetries: 10,
      maxConcurrency: 10,
      temperature: 0,
      topP: 0,
      callbacks: [
        {
          // Streaming calls report an estimate under `estimatedTokenUsage`.
          handleLLMEnd: async (output) => {
            streamingTokenUsed =
              output.llmOutput?.estimatedTokenUsage?.totalTokens ?? -1;
          },
          handleLLMError: async (_err) => {
            // Handle error
          },
        },
      ],
    });
    const nonStreamingModel = new AzureChatOpenAI({
      modelName: "gpt-3.5-turbo",
      streaming: false,
      maxRetries: 10,
      maxConcurrency: 10,
      temperature: 0,
      topP: 0,
      callbacks: [
        {
          // Non-streaming calls report exact counts under `tokenUsage`.
          handleLLMEnd: async (output) => {
            nonStreamingTokenUsed =
              output.llmOutput?.tokenUsage?.totalTokens ?? -1;
          },
          handleLLMError: async (_err) => {
            // Handle error
          },
        },
      ],
    });
    const [nonStreamingResult, streamingResult] = await Promise.all([
      nonStreamingModel.generate([
        [new SystemMessage(systemPrompt), new HumanMessage(question)],
      ]),
      streamingModel.generate([
        [new SystemMessage(systemPrompt), new HumanMessage(question)],
      ]),
    ]);
    expect(streamingTokenUsed).toBeGreaterThan(-1);
    // Compare counts only when both calls produced identical text.
    if (
      nonStreamingResult.generations[0][0].text ===
      streamingResult.generations[0][0].text
    ) {
      expect(streamingTokenUsed).toEqual(nonStreamingTokenUsed);
    }
  } finally {
    process.env.LANGCHAIN_CALLBACKS_BACKGROUND = originalBackground;
  }
});
In this example, the `handleLLMEnd` callback captures `estimatedTokenUsage` for streaming calls and `tokenUsage` for non-streaming calls, giving you the token usage in both scenarios. The `tokenUsage` object includes `promptTokens`, `completionTokens`, and `totalTokens`, which are populated from the OpenAI API response [1][2][3][4].
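Since the two fields live under different keys on `llmOutput`, a single callback can probe both defensively with optional chaining. A minimal sketch — the `extractTotalTokens` helper and the mock `llmOutput` objects below are illustrative, not part of LangChain.js:

```typescript
// Shape of the usage block LangChain.js reports in handleLLMEnd.
interface TokenUsage {
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
}

// Hypothetical helper: prefer the exact count, fall back to the estimate,
// and return -1 when neither is present.
function extractTotalTokens(llmOutput?: {
  tokenUsage?: TokenUsage;
  estimatedTokenUsage?: TokenUsage;
}): number {
  return (
    llmOutput?.tokenUsage?.totalTokens ??
    llmOutput?.estimatedTokenUsage?.totalTokens ??
    -1
  );
}

// Mocked payloads mimicking what handleLLMEnd receives.
const nonStreamingOutput = {
  tokenUsage: { promptTokens: 20, completionTokens: 15, totalTokens: 35 },
};
const streamingOutput = {
  estimatedTokenUsage: { promptTokens: 21, completionTokens: 15, totalTokens: 36 },
};

console.log(extractTotalTokens(nonStreamingOutput)); // 35
console.log(extractTotalTokens(streamingOutput)); // 36
console.log(extractTotalTokens(undefined)); // -1
```

This lets the same handler serve both model configurations instead of hard-coding one key per model.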
Error Message and Stack Trace (if applicable)
No response
Description
I'm trying to get the token usage when using `streamEvents`. I understand that token usage is not reported when streaming with OpenAI. LangChain therefore estimates the token usage when calling `invoke` on a streaming model. I would expect the same estimation to be done when calling `stream` or `streamEvents`, making the estimated token usage available in the result.
Related issues: https://github.com/langchain-ai/langchainjs/issues/2623 and https://github.com/langchain-ai/langchainjs/issues/6106
System Info