langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

get_openai_callback does not return the tokens consumed when we stream the output from the LLM #16182

Closed · shrimad-mishra-cognoai closed this issue 6 months ago

shrimad-mishra-cognoai commented 9 months ago

Checked other resources

Example Code

with get_openai_callback() as cb:
    for chunk in llm.stream(message):
        print(chunk)
    print(cb)

Description

I am trying to get the token consumption details for the call, but it returns:

Tokens Used: 0
    Prompt Tokens: 0
    Completion Tokens: 0
Successful Requests: 0
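
For contrast, the same callback does report usage on a non-streaming call. A minimal sketch, where the model name, prompt, and the langchain_openai import path are placeholders rather than my exact setup:

from langchain.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI  # assumed package; adjust to your install

llm = ChatOpenAI(model="gpt-3.5-turbo")  # placeholder model

with get_openai_callback() as cb:
    llm.invoke("Hello, world")   # non-streaming call
print(cb)  # Tokens Used / Prompt Tokens / Completion Tokens are populated here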

System Info

langchain: latest
openai: latest

Related Components

dosubot[bot] commented 9 months ago

🤖

Hello @shrimad-mishra-cognoai! 👋 I'm Dosu, a friendly bot here to assist you in solving bugs, answering questions, and making you a contributing champ while we wait for a human maintainer. Happy to have you here! Let's get started, shall we? 🚀

Based on the information provided, it seems like you're encountering an issue where the get_openai_callback() function is not returning the expected token usage when streaming output from the language model. This is a known issue that has been discussed in the LangChain repository before.

One of the solutions suggested in a similar issue (#3114) involves creating a custom asynchronous cost-calculating callback handler that uses the tiktoken library to count the tokens consumed. Here's the suggested code:

from typing import Any, Dict, List

import tiktoken

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult

# ... rest of the code (including the TokenCostProcess class) ...

class CostCalcAsyncHandler(AsyncCallbackHandler):
    model: str = ""
    socketprint = None
    websocketaction: str = "appendtext"
    token_cost_process: TokenCostProcess

    def __init__(self, model, token_cost_process):
        self.model = model
        self.token_cost_process = token_cost_process

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        if self.token_cost_process is None:
            return

        # Count prompt tokens up front using the model's tiktoken encoding.
        encoding = tiktoken.encoding_for_model(self.model)
        for prompt in prompts:
            self.token_cost_process.sum_prompt_tokens(len(encoding.encode(prompt)))

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Each streamed chunk counts as one completion token.
        print(token)
        self.token_cost_process.sum_completion_tokens(1)

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        self.token_cost_process.sum_successful_requests(1)
This solution provides a workaround for the issue where get_openai_callback() does not return the correct token usage for a chat chain when streaming is enabled.
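
For reference, here is a rough sketch of how such a handler could be wired into a streaming call. The TokenCostProcess class below is only a hypothetical stand-in for the one elided above (it implements just the three methods the handler calls), and the model name, prompt, and langchain_openai import are assumptions rather than part of the original suggestion:

import asyncio

from langchain_openai import ChatOpenAI  # assumed package; adjust to your install


class TokenCostProcess:
    # Hypothetical minimal accumulator; the real class is elided above.
    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.successful_requests = 0

    def sum_prompt_tokens(self, n: int) -> None:
        self.prompt_tokens += n

    def sum_completion_tokens(self, n: int) -> None:
        self.completion_tokens += n

    def sum_successful_requests(self, n: int) -> None:
        self.successful_requests += n


async def main():
    cost_process = TokenCostProcess()
    handler = CostCalcAsyncHandler("gpt-3.5-turbo", cost_process)  # placeholder model name
    llm = ChatOpenAI(model="gpt-3.5-turbo", streaming=True)

    # Pass the handler for this call; tokens are tallied as chunks stream in.
    async for chunk in llm.astream("Hello, world", config={"callbacks": [handler]}):
        pass

    print(cost_process.prompt_tokens, cost_process.completion_tokens, cost_process.successful_requests)


asyncio.run(main())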

Another similar issue (#2359) suggests that the problem may be related to a pull request and directs to look at the discussion there for a possible solution.

Please try the suggested solution and let me know if it resolves your issue. If not, could you provide more details about your use case and the exact steps you're taking when you encounter this issue? This will help me understand your situation better and provide a more accurate solution.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.