truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

[BUG] Total token and total cost not captured for llamaindex with azure openai #1342

Open zhongshuai-cao opened 4 weeks ago

zhongshuai-cao commented 4 weeks ago

Bug Description The leaderboard shows 0 total tokens and 0 total cost when using LlamaIndex with Azure OpenAI.

To Reproduce After running some test questions and opening the leaderboard, I noticed that the total tokens and total cost are not being captured.

Expected behavior The leaderboard displays the token count and cost.

Relevant Logs/Tracebacks

Environment:

Additional context

[Screenshot: TruLens dashboard leaderboard showing 0 total tokens and 0 total cost]
dosubot[bot] commented 4 weeks ago

@zhongshuai-cao Hello there! 👋 I'm here to help you with any bugs, questions, or contributions while we wait for a human maintainer.

The leaderboard shows 0 total tokens and 0 total cost with LlamaIndex and Azure OpenAI because the AzureOpenAI class does not automatically track token usage and cost. To resolve this, you need to implement a mechanism to track these metrics manually.

Here is a possible solution:

  1. Modify the AzureOpenAI class to include token and cost tracking:
import os

# Assumes the LlamaIndex Azure OpenAI LLM class; adjust the import path to match
# your installed llama_index version.
from llama_index.llms.azure_openai import AzureOpenAI

class AzureOpenAIWithTracking(AzureOpenAI):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.total_tokens = 0
        self.total_cost = 0.0

    def query(self, *args, **kwargs):
        # Forward the call and accumulate usage; this assumes the response exposes
        # a dict-like 'usage' with 'total_tokens' -- adapt to whatever response
        # type your llama_index version actually returns.
        response = super().query(*args, **kwargs)
        self.total_tokens += response['usage']['total_tokens']
        self.total_cost += self.calculate_cost(response['usage']['total_tokens'])
        return response

    def calculate_cost(self, tokens):
        # Assuming a cost of $0.0001 per token for example purposes
        return tokens * 0.0001

# Use the modified class
llm = AzureOpenAIWithTracking(
    model="gpt-35-turbo",
    deployment_name="<your deployment>",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["OPENAI_API_VERSION"],
)
  2. Update the query engine to use the new class:
query_engine = index.as_query_engine(llm=llm)
  3. Track and display the tokens and cost after each query:
query = "What is most interesting about this essay?"
with tru_query_engine_recorder as recording:
    answer = query_engine.query(query)
    print(answer.get_formatted_sources())
    print("query was:", query)
    print("answer was:", answer)
    print("Total tokens used:", llm.total_tokens)
    print("Total cost:", llm.total_cost)

Additionally, ensure you have set up your environment and dependencies correctly as shown in the Azure OpenAI Llama Index Quickstart. This includes installing necessary packages and setting up API keys:

# Install dependencies
! pip install trulens-eval==0.21.0 llama_index==0.9.13 llama-index-llms-azure-openai llama-index-embeddings-azure-openai langchain==0.0.346 html2text==2020.1.16

# Set API keys
import os
os.environ["AZURE_OPENAI_API_KEY"] = "..." # azure
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your endpoint here>.openai.azure.com/" # azure
os.environ["OPENAI_API_VERSION"] = "2023-07-01-preview" # may need updating
os.environ["OPENAI_API_TYPE"] = "azure"

By implementing these changes and ensuring your setup is correct, you should be able to track and display the total tokens and cost on the leaderboard [1][2].

To continue talking to Dosu, mention @dosu.

zhongshuai-cao commented 4 weeks ago

@dosu would you provide an example of the implementation used for OpenAI to calculate the cost? I may have different models with different unit costs.

dosubot[bot] commented 4 weeks ago

To handle different models with different unit costs in the AzureOpenAIWithTracking class, you can use a dictionary to map model names to their respective unit costs. Here is an example implementation:

# Same AzureOpenAI import and os setup as in the previous example; the per-token
# rates below are placeholders, not real pricing -- substitute your own.
class AzureOpenAIWithTracking(AzureOpenAI):
    # Example model cost mapping (placeholder per-token rates)
    MODEL_COSTS = {
        'gpt-35-turbo': 0.02,  # placeholder cost per token for gpt-35-turbo
        'text-embedding-ada-002': 0.03,  # placeholder cost per token for text-embedding-ada-002
        # Add other models and their costs here
    }

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.total_tokens = 0
        self.total_cost = 0.0

    def query(self, *args, **kwargs):
        response = super().query(*args, **kwargs)
        self.total_tokens += response['usage']['total_tokens']
        self.total_cost += self.calculate_cost(response['usage']['total_tokens'])
        return response

    def calculate_cost(self, n_tokens):
        # Get the unit cost for the current model
        unit_cost = self.MODEL_COSTS.get(self.model, 0)
        # Calculate the total cost
        total_cost = n_tokens * unit_cost
        return total_cost

# Use the modified class
llm = AzureOpenAIWithTracking(
    model="gpt-35-turbo",
    deployment_name="<your deployment>",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["OPENAI_API_VERSION"],
)

In this implementation, MODEL_COSTS maps each model name to its per-token unit cost, and calculate_cost looks up the rate for the current model (self.model) and multiplies it by the token count; models missing from the mapping fall back to a unit cost of 0.
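
For reference, here is a minimal standalone sketch of the same lookup using per-1,000-token rates with separate prompt and completion prices, which is how OpenAI/Azure pricing is usually quoted. The PER_1K table and estimate_cost function are illustrative names and the rates are placeholders, not real prices:

# Placeholder per-1K-token rates: (prompt_rate, completion_rate) in dollars.
# Check your Azure OpenAI price sheet for the actual numbers.
PER_1K = {
    "gpt-35-turbo": (0.0005, 0.0015),
    "text-embedding-ada-002": (0.0001, 0.0),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate dollar cost from token counts using the placeholder rate table."""
    prompt_rate, completion_rate = PER_1K.get(model, (0.0, 0.0))
    return (prompt_tokens / 1000) * prompt_rate + (completion_tokens / 1000) * completion_rate

# Example: 1,200 prompt tokens and 300 completion tokens on gpt-35-turbo
print(estimate_cost("gpt-35-turbo", 1200, 300))  # 0.0006 + 0.00045 = 0.00105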

To continue talking to Dosu, mention @dosu.

sfc-gh-dkurokawa commented 7 hours ago

We can't do this automatically, it seems, because Azure OpenAI just doesn't report this info back to us. langchain has a community workaround that estimates cost from the reported token counts (see here), but that has its own issues: your langchain-community package would have to be up to date, the cost data in that package's code would have to be up to date, and you would have to specify the right model version.
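
For anyone curious, a rough sketch of that langchain-community estimation is below. The module path, function names, and hard-coded pricing table all depend on the installed langchain-community version, which is exactly the maintenance caveat above:

# Sketch only -- verify names against your installed langchain-community version.
# The per-1K pricing is hard-coded in MODEL_COST_PER_1K_TOKENS in the same module.
from langchain_community.callbacks.openai_info import get_openai_token_cost_for_model

# Azure deployments often report "gpt-35-turbo", while the pricing table keys use
# OpenAI-style names such as "gpt-3.5-turbo", so the model name may need normalizing.
prompt_cost = get_openai_token_cost_for_model("gpt-3.5-turbo", 1200)
completion_cost = get_openai_token_cost_for_model("gpt-3.5-turbo", 300, is_completion=True)
print(prompt_cost + completion_cost)  # estimated dollars for 1,200 prompt + 300 completion tokens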

All this to say, I generally don't think we should go with that kinda approach, but if there's enough desire for it we could do it I suppose.