langchain-ai / langchain

πŸ¦œπŸ”— Build context-aware reasoning applications
https://python.langchain.com
MIT License

Callback for VertexAI to monitor cost and token consumption #7280

Closed · lionelchg closed this issue 6 months ago

lionelchg commented 1 year ago

Feature request

It would be nice to have a function similar to get_openai_callback() for VertexAI. That callback reports the input tokens, output tokens, and cost of using OpenAI models:

from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.schema import HumanMessage, SystemMessage

with get_openai_callback() as cb:
    llm = OpenAI(temperature=0)
    chat = ChatOpenAI(temperature=0)
    emb = OpenAIEmbeddings()

    output_llm = llm("As I was saying,")
    print(output_llm)

    # System message + Human Message
    messages = [
        SystemMessage(content="You are a helpful assistant that translates English to French."),
        HumanMessage(content="Translate this sentence from English to French. I love programming.")
    ]
    output_chat = chat(messages)
    print(output_chat)

    print(cb)
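Printing the callback at the end of the block produces a summary like this (the numbers below are illustrative, not real measurements):

Tokens Used: 79
    Prompt Tokens: 52
    Completion Tokens: 27
Successful Requests: 2
Total Cost (USD): $0.000158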

I would like to have:

# Note: get_vertexai_callback() is the requested API here, not an existing function.
from langchain.chat_models import ChatVertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.schema import HumanMessage, SystemMessage

with get_vertexai_callback() as cb:
    llm = VertexAI(temperature=0)
    chat = ChatVertexAI(temperature=0)
    emb = VertexAIEmbeddings()

    print(llm("As I was saying,"))

    # System message + Human Message
    messages = [
        SystemMessage(content="You are a helpful assistant that translates English to French."),
        HumanMessage(content="Translate this sentence from English to French. I love programming.")
    ]
    print(chat(messages))

    print(cb)

Motivation

I would like to monitor my usage of VertexAI models.

Your contribution

I have already read through the OpenAI version of the callback quite a bit, but if anyone has already thought about how to do it with Vertex, I would be really curious :). If someone else is also planning to do it, we could join efforts!

lionelchg commented 1 year ago

@lkuligin

dosubot[bot] commented 1 year ago

Answer generated by a πŸ€–

Answer

I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.

Related Issues

Open issues:

- langchain callback support PALM2 GCP (no discussion on the issue)


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

lkuligin commented 1 year ago

It looks like a duplicate of #5904.

The only concern I have is that it would require hard-coding the pricing (plus the exact number of tokens is unknown, since the Vertex SDK doesn't return it), so we can only have a rough estimation. If the pricing changes or becomes more complicated, the estimates returned by LangChain would be misleading. Maybe it's easier to add a callback that counts the [approximate] number of tokens instead.

@hwchase17 what are your thoughts on this?
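For what it's worth, a token-only variant along the lines lkuligin suggests could be quite small. The sketch below is hypothetical and uses a crude characters-per-token heuristic, since at the time the Vertex SDK did not return token counts:

from typing import Any, Dict, List

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult

class ApproxVertexTokenHandler(BaseCallbackHandler):
    """Hypothetical sketch: accumulate rough token counts, no pricing."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        # Crude heuristic: roughly 4 characters per token; the true ratio
        # varies by model and language.
        self.prompt_tokens += sum(len(p) // 4 for p in prompts)

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        for generations in response.generations:
            for generation in generations:
                self.completion_tokens += len(generation.text) // 4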

Capsar commented 11 months ago

How is this progressing?

I found the following about getting the token count and billable characters: https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count
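Per that page, the SDK exposes a count_tokens call for the PaLM models. Something along these lines should return both figures (exact signatures may differ across google-cloud-aiplatform versions):

from vertexai.preview.language_models import TextGenerationModel

model = TextGenerationModel.from_pretrained("text-bison")
# count_tokens hits the Vertex AI countTokens endpoint; the response carries
# both the token count and the billable character count used for pricing.
response = model.count_tokens(["Translate this sentence from English to French."])
print(response.total_tokens)
print(response.total_billable_characters)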

dosubot[bot] commented 8 months ago

Hi, @lionelchg,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you opened this issue to request a callback function for VertexAI to monitor cost and token consumption, similar to the existing function for OpenAI. There have been contributions and collaboration from the community, with lkuligin suggesting a callback that counts the approximate number of tokens and Capsar sharing information about getting the token count and billable characters from the VertexAI documentation. The issue has been resolved with these contributions, and the feature has been finalized.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

bpafoshizle commented 6 months ago

Would be great to have this

timsteuer commented 3 months ago

I just stumbled over this thread, looking for a solution to token counting in Vertex.

After some digging I found that there has been progress in LangChain: there is now a VertexAICallbackHandler in langchain_google_vertexai.callbacks. With the approach below, it can be used to count tokens in the same manner as the OpenAI callback.

However, a well-designed solution integrated into LangChain likely needs more thought.

Anyway, I hope this helps to revive this feature discussion, as it would indeed be very helpful to have such a callback.


# Import paths may vary slightly across LangChain versions.
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Any, Optional

from langchain.callbacks import get_openai_callback
from langchain_core.outputs import LLMResult
from langchain_core.tracers.context import register_configure_hook
from langchain_google_vertexai.callbacks import VertexAICallbackHandler

vertexai_callback_var: ContextVar[Optional[VertexAICallbackHandler]] = ContextVar(
    "vertexai_callback", default=None
)
register_configure_hook(vertexai_callback_var, True)

class HotfixedVertexAICallback(VertexAICallbackHandler):

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        # TODO: this is highly experimental!
        # - As of the current version we do not get llm_output['model_id'] with
        #   the LLMResult, so we do not know for sure that a Vertex model called us.
        # - We assume that if model_id is absent and 'usage_metadata' is present
        #   in the generation_info, we were called by a Vertex model.
        if not response.llm_output and self.__check_for_metadata(response):
            super().on_llm_end(response, **kwargs)

    @property
    def total_tokens(self) -> int:
        # Lock, because otherwise the sum can be inconsistent:
        # we read prompt_tokens, another thread writes completion_tokens,
        # then we read completion_tokens.
        with self._lock:
            return self.prompt_tokens + self.completion_tokens

    def __check_for_metadata(self, response: LLMResult) -> bool:
        for generations in response.generations:
            for generation in generations:
                if generation.generation_info:
                    return "usage_metadata" in generation.generation_info
        return False

@contextmanager
def get_vertexai_callback():
    cb = HotfixedVertexAICallback()
    vertexai_callback_var.set(cb)
    yield cb
    vertexai_callback_var.set(None)

# Usage: both callbacks can be combined in one `with` statement (Python 3.10+).
with (get_openai_callback() as openai_cb,
      get_vertexai_callback() as vertexai_cb):
    ...  # call your chains here
dheerajiiitv commented 2 months ago

I have created one, in case someone finds it helpful

"""Callback Handler that prints to std out."""
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Any, Dict, Generator, List, Optional

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult

# NOTE: despite the name, these are costs per 1,000 *characters* (Vertex AI
# bills the PaLM models by character), and hard-coded values can drift from
# the official pricing page.
MODEL_COST_PER_1K_TOKENS = {"chat-bison": 0.0010, "text-bison": 0.0005}

def get_vertexai_token_cost_for_model(model_name: str, num_characters: int) -> float:
    """Return the cost for `num_characters` of input or output.

    Characters are counted by UTF-8 code points and white space is excluded
    from the count. Source: https://cloud.google.com/vertex-ai/pricing
    """
    if model_name not in MODEL_COST_PER_1K_TOKENS:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid VertexAI model name. "
            "Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
        )
    return MODEL_COST_PER_1K_TOKENS[model_name] * num_characters / 1000
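# For example, a completion of 2,000 non-space characters from text-bison
# costs 0.0005 * 2000 / 1000 = $0.001 at the rates hard-coded above.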

def count_characters_without_space(text: str) -> int:
    """
    Returns the number of characters in the given text string,
    excluding spaces.

    Parameters:
    text (str): The input text string

    Returns:
    int: The number of characters in the input string, excluding spaces
    """
    return len(text) - text.count(" ")

class VertexAICallbackHandler(BaseCallbackHandler):
    """Callback Handler that tracks VertexAI info."""

    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0
    total_cost: float = 0.0
    current_prompt_token: int = 0
    model_name: str = ""

    def __repr__(self) -> str:
        return (
            f"Tokens Used: {self.total_tokens}\n"
            f"\tPrompt Tokens: {self.prompt_tokens}\n"
            f"\tCompletion Tokens: {self.completion_tokens}\n"
            f"Successful Requests: {self.successful_requests}\n"
            f"Total Cost (USD): ${self.total_cost}"
        )

    @property
    def always_verbose(self) -> bool:
        """Whether to call verbose callbacks even if verbose is False."""
        return True

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        """Count prompt characters and infer the model family."""
        # Reset the per-request counter so that repeated calls through the same
        # handler do not double-count earlier prompts in on_llm_end.
        self.current_prompt_token = 0
        for prompt in prompts:
            self.current_prompt_token += count_characters_without_space(prompt)
        if "ChatVertexAI" in serialized["id"]:
            self.model_name = "chat-bison"
        else:
            self.model_name = "text-bison"

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Collect token usage."""
        completion_tokens = 0
        self.successful_requests += 1
        for generations in response.generations:
            for generation in generations:
                completion_tokens += count_characters_without_space(generation.text)

        if self.model_name in MODEL_COST_PER_1K_TOKENS:
            completion_cost = get_vertexai_token_cost_for_model(
                self.model_name, completion_tokens
            )
            prompt_cost = get_vertexai_token_cost_for_model(
                self.model_name, self.current_prompt_token
            )
            self.total_cost += prompt_cost + completion_cost

        self.total_tokens += completion_tokens + self.current_prompt_token
        self.prompt_tokens += self.current_prompt_token
        self.completion_tokens += completion_tokens
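The snippet imports contextmanager and ContextVar but stops before wiring them into a get_openai_callback-style helper. A minimal, hypothetical completion could look like the sketch below; note the handler still has to reach the model, e.g. by passing callbacks=[cb] to the LLM or by registering the ContextVar with register_configure_hook as in the previous comment.

vertexai_callback_var: ContextVar[Optional[VertexAICallbackHandler]] = ContextVar(
    "vertexai_callback", default=None
)

@contextmanager
def get_vertexai_callback() -> Generator[VertexAICallbackHandler, None, None]:
    # Hypothetical sketch: yield a fresh handler and clear the ContextVar on exit.
    cb = VertexAICallbackHandler()
    vertexai_callback_var.set(cb)
    try:
        yield cb
    finally:
        vertexai_callback_var.set(None)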