Closed · jakubbober closed this 7 months ago
Hello, I had the same problem with AgentExecutor. I wanted to track token usage, and in the end I had to create my own callback extending BaseCallbackHandler, based on OpenAICallbackHandler; for the moment it works fine for me.
You need to install tiktoken and import the function get_openai_token_cost_for_model. I found more information about callbacks in the documentation.
```python
from typing import Any, Dict, List

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from langchain_community.callbacks.openai_info import get_openai_token_cost_for_model

# num_tokens_from_string is the tiktoken helper from the OpenAI Cookbook (discussed below).


class MyTokenTrackingHandler(BaseCallbackHandler):
    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0
    total_cost: float = 0.0
    model_name: str = ""
    base_model_name: str = ""
    price_per_1k_tokens: float = 0.0
    price_per_1k_tokens_completion: float = 0.0

    def __init__(self) -> None:
        super().__init__()

    def __repr__(self) -> str:
        return (
            f"Tokens Used: {self.total_tokens}\n"
            f"\tPrompt Tokens: {self.prompt_tokens}\n"
            f"\tCompletion Tokens: {self.completion_tokens}\n"
            f"Successful Requests: {self.successful_requests}\n"
            f"Total Cost (USD): ${self.total_cost}"
        )

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        """Actions at the start of the LLM call."""
        self.model_name = serialized["kwargs"]["model_name"]
        self.base_model_name = "gpt-4" if "gpt-4" in self.model_name else self.model_name.rpartition("-")[0]
        # prompts is a list of strings; count the tokens of the first prompt
        self.prompt_tokens = num_tokens_from_string(prompts[0], self.base_model_name)
        self.price_per_1k_tokens = get_openai_token_cost_for_model(self.model_name, self.prompt_tokens)
        # Additional logic at start, if needed

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Actions when the LLM call finishes."""
        # We assume here that `response` exposes the token usage via its generations.
        # Adjust this part depending on how your API or data structure provides that information.
        if response.generations:
            for generation in response.generations[0]:  # assuming there is always at least one generation
                aimessage = generation.message  # access the AIMessage
                self.completion_tokens = num_tokens_from_string(aimessage.content, self.base_model_name)
                self.price_per_1k_tokens_completion = get_openai_token_cost_for_model(
                    self.model_name, self.completion_tokens, is_completion=True
                )
        self.total_cost += self.price_per_1k_tokens + self.price_per_1k_tokens_completion
        self.total_tokens += self.prompt_tokens + self.completion_tokens
        self.successful_requests += 1
```
Thanks @Dannkol! Is the `num_tokens_from_string` function also custom made by you?
Okay I see, it's from the OpenAI Cookbook :)
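For reference, the Cookbook helper looks roughly like this (a sketch, adapted here to take a model name the way the handler above calls it):

```python
import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    """Return the number of tokens in a text string for a given model."""
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))
```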
Yes @jakubbober, I followed the OpenAI Cookbook to learn how to use tiktoken; I use the model name (`self.base_model_name`) to pick the encoding. I also use the callback this way:
```python
cb = MyTokenTrackingHandler()
llm = ChatOpenAI(model_name=model, temperature=0.9, callbacks=[cb])

# Tools
# Agent
agent = AgentExecutor(agent=agent_chain, tools=tools, verbose=debug, handle_parsing_errors=True, callbacks=[cb])
res = agent.invoke({"input": query, "chat_history": []})

# Example
callback_info = {
    "Tokens_Used": cb.total_tokens,
    "Prompt_Tokens": cb.prompt_tokens,
    "Completion_Tokens": cb.completion_tokens,
    "Successful_Requests": cb.successful_requests,
    "total_cost": cb.total_cost,
}
```
This is my first issue, so I apologize if I've made any mistakes ✨.
Thanks for the code. However, I'm still getting no usage when printing the callback after trying to run the agent:
```
Tokens Used: 0
	Prompt Tokens: 0
	Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0
```
@jakubbober When I was building this, I had to add prints inside the callback methods to track down errors manually, because when an error occurs in `on_llm_start` or `on_llm_end` it is swallowed and the handler just keeps its default values. I hope this helps.
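A minimal sketch of that debugging trick, assuming the same handler as above (the try/except wrapper is illustrative, not part of the original code):

```python
def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
    try:
        self.model_name = serialized["kwargs"]["model_name"]
        self.base_model_name = "gpt-4" if "gpt-4" in self.model_name else self.model_name.rpartition("-")[0]
        self.prompt_tokens = num_tokens_from_string(prompts[0], self.base_model_name)
    except Exception as exc:
        # Without this, the error is swallowed and the counters keep their defaults.
        print(f"on_llm_start failed: {exc!r}")
        raise
```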
duplicate of: https://github.com/langchain-ai/langchain/issues/16798
Hello everyone,
in @Dannkol's reply I have changed

```python
self.model_name = serialized['kwargs']['model_name']
```

to

```python
self.model_name = serialized['kwargs']['model']
```

otherwise it would not start counting for me.

Also, the counting as-is is not right: the total comes out a bit higher than the sum of the prompt and completion tokens, because `prompt_tokens` is overwritten on every LLM call while the total keeps accumulating. So in `on_llm_start` I've added this change:

```python
self.prompt_tokens_increment = num_tokens_from_string(prompts[0], self.base_model_name)
self.prompt_tokens += self.prompt_tokens_increment
```

and in `on_llm_end` I've changed the total tokens to:

```python
self.total_tokens += self.prompt_tokens_increment + self.completion_tokens
```

Now the numbers add up.
NOTE: Don't forget to initialize `prompt_tokens_increment` at the beginning of the class.
Hope it helps.
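Putting those changes together, the affected parts of the handler look roughly like this (a sketch based on the edits described above):

```python
class MyTokenTrackingHandler(BaseCallbackHandler):
    # ... other fields as before ...
    prompt_tokens_increment: int = 0  # prompt tokens of the current request only

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        self.model_name = serialized["kwargs"]["model"]  # key is "model", not "model_name"
        self.base_model_name = "gpt-4" if "gpt-4" in self.model_name else self.model_name.rpartition("-")[0]
        self.prompt_tokens_increment = num_tokens_from_string(prompts[0], self.base_model_name)
        self.prompt_tokens += self.prompt_tokens_increment

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        # ... completion token counting as before ...
        self.total_tokens += self.prompt_tokens_increment + self.completion_tokens
```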
Hello, this still does not work with `create_pandas_dataframe_agent`.
Does anyone have a solution for this?
I saw someone else write a custom callback for that:

```python
import threading
from contextlib import contextmanager
from typing import Any, Generator

import tiktoken
from langchain_community.callbacks.manager import openai_callback_var
from langchain_community.callbacks.openai_info import (
    standardize_model_name,
    MODEL_COST_PER_1K_TOKENS,
    get_openai_token_cost_for_model,
    OpenAICallbackHandler,
)
from langchain_core.outputs import LLMResult
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI


class CostTrackerCallback(OpenAICallbackHandler):
    def __init__(self, model_name: str) -> None:
        super().__init__()
        self.model_name = model_name
        self._lock = threading.Lock()

    def on_llm_start(
        self,
        serialized: dict[str, Any],
        prompts: list[str],
        **kwargs: Any,
    ) -> None:
        encoding = tiktoken.get_encoding("cl100k_base")
        prompts_string = "".join(prompts)
        self.prompt_tokens = len(encoding.encode(prompts_string))

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when the LLM ends running."""
        text_response = response.generations[0][0].text
        encoding = tiktoken.get_encoding("cl100k_base")
        self.completion_tokens = len(encoding.encode(text_response))
        model_name = standardize_model_name(self.model_name)
        if model_name in MODEL_COST_PER_1K_TOKENS:
            completion_cost = get_openai_token_cost_for_model(
                model_name, self.completion_tokens, is_completion=True
            )
            prompt_cost = get_openai_token_cost_for_model(model_name, self.prompt_tokens)
        else:
            completion_cost = 0
            prompt_cost = 0
        # Update shared state behind the lock.
        with self._lock:
            self.total_cost += prompt_cost + completion_cost
            self.total_tokens = self.prompt_tokens + self.completion_tokens
            self.successful_requests += 1


@contextmanager
def custom_callback(model_name: str = default_model) -> Generator[CostTrackerCallback, None, None]:
    """Custom callback manager for the pandas agent."""
    # NOTE: default_model is defined elsewhere in the original snippet.
    cb = CostTrackerCallback(model_name)
    openai_callback_var.set(cb)
    yield cb
    openai_callback_var.set(None)
```
and then use it like the normal callback.
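For example, something like this (a sketch; the DataFrame, query, and model name are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"sales": [100, 250, 175]})
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_pandas_dataframe_agent(llm, df, agent_type=AgentType.OPENAI_FUNCTIONS)

with custom_callback("gpt-3.5-turbo") as cb:
    agent.invoke({"input": "What is the total of the sales column?"})

print(cb.total_tokens, cb.total_cost)
```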
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
No response
Description
I am writing a simple RAG application with LangChain Tools, function calling, and LangChain Agents. I want to monitor token usage for the agent, using the LangChain callbacks for that purpose. I see that the `OpenAICallbackHandler` properly monitors token usage for chat models directly, but it doesn't monitor the usage statistics for agents at all. It should be possible, since the `AzureChatOpenAI` is passed to the agent. I tried defining the callback both in the agent and in the chat model, but after invoking the agent the usage statistics are not saved in either of the callbacks.

I think this may be because the `OpenAICallbackHandler` implements the `on_llm_end` method, but not the `on_chain_end` method, which seems to be the method that the agent callbacks interact with (source). I wanted to define a custom callback handler that would extend `OpenAICallbackHandler` and map `on_llm_end` to `on_chain_end`, but this is not straightforward, or perhaps not doable at all (the `LLMResult` instance used in `on_llm_end` seems to be lost inside the interaction between the Agent and the chat model, which blocks access to the "token_usage" property).

Can token usage somehow be monitored when working with LangChain Agents?
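For illustration, the callback pattern described above looks roughly like this (a sketch, not my exact code; `agent_executor` stands for the agent defined in the example code):

```python
from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    agent_executor.invoke({"input": "some question"})
print(cb)  # every counter stays at 0 when invoked through the agent
```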
System Info
```
langchain==0.1.9
langchain-openai==0.0.7
langchainhub==0.1.14
pydantic==1.10.13
```