langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

OpenAICallbackHandler not counting token usage for Agents #18130

Closed jakubbober closed 7 months ago

jakubbober commented 7 months ago


Example Code

from langchain import hub
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain.tools import Tool
from langchain_community.callbacks import OpenAICallbackHandler
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(**current_settings.azureopenai_llm.dict(), temperature=0, callbacks=[OpenAICallbackHandler()])

def get_context_from_vector_store(query):
    results = VectorStoreManager(collection_name=collection_name).store.similarity_search_with_score(query, k=k)
    return results

add_db_context = Tool(
        name="add_context_documents_from_vector_store",
        func=get_context_from_vector_store,
        description="Useful when you need to answer questions about the contents of the files in the vector store. Use it if you are uncertain about your answer or you don't have any hard data to support your answer.",
        return_direct=False
    )

tools = [add_db_context]

agent = create_openai_tools_agent(llm=llm, tools=tools, prompt=hub.pull("hwchase17/openai-tools-agent"))
agent_executor = AgentExecutor(agent=agent, tools=tools, callbacks=[OpenAICallbackHandler()])
agent_executor.invoke({"input": "test"})

print(agent_executor.callbacks)

Error Message and Stack Trace (if applicable)

No response

Description

I am writing a simple RAG application with LangChain Tools, function calling and LangChain Agents, and I want to monitor token usage for the agent using LangChain callbacks. The OpenAICallbackHandler properly tracks token usage when the chat model is called directly, but it doesn't record any usage statistics for agents. It should be possible, since the AzureChatOpenAI instance is passed to the agent. I tried defining the callback both on the agent and on the chat model, but after invoking the agent no usage statistics are saved in either callback.

I think this may be because OpenAICallbackHandler implements the on_llm_end method but not the on_chain_end method, which seems to be the method that the agent callbacks interact with (source). I wanted to define a custom callback handler that extends OpenAICallbackHandler and maps on_llm_end onto on_chain_end, but this is not straightforward, or perhaps not doable at all: the LLMResult instance used in on_llm_end seems to get lost in the interaction between the agent and the chat model, so the "token_usage" property is not accessible.

Can token usage somehow be monitored when working with Langchain Agents?
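For reference, the documented way to aggregate OpenAI usage is the get_openai_callback context manager, which wraps this same handler. A minimal sketch of the wiring I mean, reusing the agent_executor from the example above:

from langchain_community.callbacks import get_openai_callback

# the context manager registers an OpenAICallbackHandler for everything
# executed inside the block, including the LLM calls the agent makes
with get_openai_callback() as cb:
    agent_executor.invoke({"input": "test"})

print(cb)  # tokens, requests and cost accumulated by the handler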

System Info

langchain==0.1.9
langchain-openai==0.0.7
langchainhub==0.1.14
pydantic==1.10.13

Dannkol commented 7 months ago

Hello, I had the same problem with AgentExecutor. I wanted to track token usage, and in the end I had to create my own callback that extends BaseCallbackHandler, modeled on OpenAICallbackHandler. For the moment it works fine for me.

You need to install tiktoken and import the function get_openai_token_cost_for_model; I found more information about callbacks in the documentation.
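The num_tokens_from_string helper used below is the small tiktoken function from the OpenAI Cookbook; roughly this (a sketch that resolves the encoding from the model name):

import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    """Count tokens in a text string with the encoding that matches model_name (adapted from the OpenAI Cookbook)."""
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))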

from typing import Any, Dict, List

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from langchain_community.callbacks.openai_info import get_openai_token_cost_for_model
# num_tokens_from_string is the tiktoken helper from the OpenAI Cookbook (see above)

class MyTokenTrackingHandler(BaseCallbackHandler):
    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0
    total_cost: float = 0.0
    model_name: str = ""
    base_model_name: str = ""
    price_per_1k_tokens: float = 0.0
    price_per_1k_tokens_completion: float = 0.0

    def __init__(self) -> None:
        super().__init__()

    def __repr__(self) -> str:
        return (
            f"Tokens Used: {self.total_tokens}\n"
            f"\tPrompt Tokens: {self.prompt_tokens}\n"
            f"\tCompletion Tokens: {self.completion_tokens}\n"
            f"Successful Requests: {self.successful_requests}\n"
            f"Total Cost (USD): ${self.total_cost}"
        )

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        """Actions at LLM start."""
        self.model_name = serialized['kwargs']['model_name']
        self.base_model_name = "gpt-4" if "gpt-4" in self.model_name else self.model_name.rpartition("-")[0]
        self.prompt_tokens = num_tokens_from_string(prompts, self.base_model_name)
        self.price_per_1k_tokens = get_openai_token_cost_for_model(self.model_name, self.prompt_tokens)
        # Additional logic at start, if needed

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Actions at LLM end."""
        # Here we assume that `response` has a field or method that gives us the token usage.
        # Adjust this part depending on how your API or data structure provides that information.
        if response.generations:
            for generation in response.generations[0]:  # assuming there is always at least one generation
                aimessage = generation.message  # access the AIMessage
                self.completion_tokens = num_tokens_from_string(aimessage.content, self.base_model_name)
                self.price_per_1k_tokens_completion = get_openai_token_cost_for_model(self.model_name, self.completion_tokens, is_completion=True)
        self.total_cost += self.price_per_1k_tokens + self.price_per_1k_tokens_completion
        self.total_tokens += self.prompt_tokens + self.completion_tokens
        self.successful_requests += 1

jakubbober commented 7 months ago

Thanks @Dannkol! Is the num_tokens_from_string function also custom made by you?

jakubbober commented 7 months ago

> Thanks @Dannkol! Is the num_tokens_from_string function also custom made by you?

Okay I see, it's from the OpenAI Cookbook :)

Dannkol commented 7 months ago

Yes @jakubbober, I used the OpenAI Cookbook as a guide for tiktoken. I use the model name (self.base_model_name) to pick the encoding. I also use the callback like this:

cb = MyTokenTrackingHandler()

llm = ChatOpenAI(model_name=model, temperature=0.9, callbacks=[cb])
# Tools
# Agent

agent = AgentExecutor(agent=agent_chain, tools=tools, verbose=debug, handle_parsing_errors=True, callbacks=[cb])

res = agent.invoke({"input": query, "chat_history": []})

# Example
callback_info = {
    'Tokens_Used': cb.total_tokens,
    'Prompt_Tokens': cb.prompt_tokens,
    'Completion_Tokens': cb.completion_tokens,
    'Successful_Requests': cb.successful_requests,
    'total_cost': cb.total_cost,
}

This is my first issue and I apologize if I make any mistakes ✨.

jakubbober commented 7 months ago

Thanks for the code. However, I'm still getting no usage when printing the callback after trying to run the agent:

Tokens Used: 0
    Prompt Tokens: 0
    Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0

Dannkol commented 7 months ago

@jakubbober When I was working on this I had to put print statements inside the callback methods to find errors manually, because when an error occurs in on_llm_start or on_llm_end the handler just keeps the default (zero) values. I hope that helps.
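For example, something like this to surface the exception instead of losing it (a sketch; the elided lines are whatever your handler already does):

    def on_llm_start(self, serialized, prompts, **kwargs):
        try:
            self.model_name = serialized['kwargs']['model_name']
            # ... rest of the handler logic ...
        except Exception as exc:
            # otherwise the error is swallowed and the counters stay at their defaults
            print("on_llm_start failed:", repr(exc))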

eyurtsev commented 7 months ago

duplicate of: https://github.com/langchain-ai/langchain/issues/16798

marosjev-cde commented 5 months ago

Hello everyone,

in @Dannkol's reply I have changed

self.model_name = serialized['kwargs']['model_name']

to

self.model_name = serialized['kwargs']['model']

otherwise, it would not start counting for me.

Also, the counting as-is is not right: the total comes out a bit higher than the sum of prompt and completion tokens, so in on_llm_start I've added this change:

self.prompt_tokens_increment = num_tokens_from_string(prompts[0], self.base_model_name)
self.prompt_tokens += self.prompt_tokens_increment

and in on_llm_end I've changed the total tokens to:

self.total_tokens += self.prompt_tokens_increment + self.completion_tokens

Now the numbers are adding up.

NOTE: Don't forget to initialize prompt_tokens_increment at the beginning of the class.
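Putting the pieces together, the changed parts look roughly like this (a sketch assembled from the snippets above; everything elided stays exactly as in @Dannkol's handler):

class MyTokenTrackingHandler(BaseCallbackHandler):
    # ... counters and cost fields as in the original handler ...
    prompt_tokens_increment: int = 0  # token count of the current prompt only

    def on_llm_start(self, serialized, prompts, **kwargs):
        self.model_name = serialized['kwargs']['model']  # 'model' instead of 'model_name'
        self.base_model_name = "gpt-4" if "gpt-4" in self.model_name else self.model_name.rpartition("-")[0]
        self.prompt_tokens_increment = num_tokens_from_string(prompts[0], self.base_model_name)
        self.prompt_tokens += self.prompt_tokens_increment
        # ... per-request cost tracking as in the original handler ...

    def on_llm_end(self, response, **kwargs):
        # ... completion_tokens and cost as in the original handler ...
        self.total_tokens += self.prompt_tokens_increment + self.completion_tokens
        self.successful_requests += 1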

Hope it helps.

MaazAr commented 4 months ago

Hello, this still does not work with create_pandas_dataframe_agent.

Does anyone have a solution for this?

CrasCris commented 4 months ago

I saw another guy make a custom callback for that:

import threading
from contextlib import contextmanager
from typing import Any, Generator

import tiktoken
from langchain_community.callbacks.manager import openai_callback_var
from langchain_community.callbacks.openai_info import (
    MODEL_COST_PER_1K_TOKENS,
    OpenAICallbackHandler,
    get_openai_token_cost_for_model,
    standardize_model_name,
)
from langchain_core.outputs import LLMResult
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI


class CostTrackerCallback(OpenAICallbackHandler):

    def __init__(self, model_name: str) -> None:
        super().__init__()
        self.model_name = model_name
        self._lock = threading.Lock()

    def on_llm_start(
        self,
        serialized: dict[str, Any],
        prompts: list[str],
        **kwargs: Any,
    ) -> None:
        encoding = tiktoken.get_encoding("cl100k_base")
        prompts_string = ''.join(prompts)
        self.prompt_tokens = len(encoding.encode(prompts_string))

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when the LLM ends running."""
        text_response = response.generations[0][0].text
        encoding = tiktoken.get_encoding("cl100k_base")
        self.completion_tokens = len(encoding.encode(text_response))
        model_name = standardize_model_name(self.model_name)
        if model_name in MODEL_COST_PER_1K_TOKENS:
            completion_cost = get_openai_token_cost_for_model(
                model_name, self.completion_tokens, is_completion=True
            )
            prompt_cost = get_openai_token_cost_for_model(model_name, self.prompt_tokens)
        else:
            completion_cost = 0
            prompt_cost = 0

        # update shared state behind lock
        with self._lock:
            self.total_cost += prompt_cost + completion_cost
            self.total_tokens = self.prompt_tokens + self.completion_tokens
            self.successful_requests += 1


# note: default_model is defined elsewhere in the author's code
@contextmanager
def custome_callback(model_name: str = default_model) -> Generator[CostTrackerCallback, None, None]:
    """Custom callback manager for the pandas agent."""
    cb = CostTrackerCallback(model_name)
    openai_callback_var.set(cb)
    yield cb
    openai_callback_var.set(None)

and then use it like the normal callback:
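Something along these lines, presumably (a sketch reusing the imports from the snippet above; the DataFrame, model name and question are placeholders, and newer langchain_experimental versions may also require allow_dangerous_code=True):

import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30]})  # placeholder data
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
pandas_agent = create_pandas_dataframe_agent(llm, df, verbose=True)

# the context manager sets openai_callback_var, so the handler is picked up
# for every OpenAI call made while the block is active
with custome_callback("gpt-3.5-turbo") as cb:
    pandas_agent.invoke({"input": "How many rows does the dataframe have?"})

print(cb.total_tokens, cb.total_cost)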