googleapis / python-aiplatform

A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning.

Unclear Token Usage Metrics in Gemini on Vertex AI - Request for additional `total_billable_token` metric instead #4015

Open desenfirman opened 4 months ago

desenfirman commented 4 months ago


Is your feature request related to a problem? Please describe.

It is a little confusing to understand the difference between prompt_token_count, candidates_token_count, and total_token_count when trying to reduce token usage. Previously, we were using Gemini on Google AI Studio, which includes a caching mechanism that should reduce token usage. However, for enterprise use, Google recommends Gemini on Vertex AI as part of GCP, so we decided to migrate from Google AI Studio to Vertex AI.

However, the migration from Gemini on Google AI Studio to Gemini on Vertex AI was not as seamless as expected. On the plus side, Gemini on Vertex AI can reference static files uploaded to Google Cloud Storage (GCS). But features like context caching, which we already used with Gemini on Google AI Studio, are missing.

Then we saw in the Gemini API documentation for Vertex AI (Source) that Gemini on Vertex AI already has a caching mechanism. However, we are still unsure how current token usage is calculated inside Gemini on Vertex AI.
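For reference, newer google-cloud-aiplatform releases expose a preview caching module; below is a minimal sketch of how we would expect to use it. The project ID, bucket path, and model version are placeholders, and the exact API surface may vary by SDK release:

```python
import datetime

import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel, Part

vertexai.init(project="my-project", location="us-central1")  # placeholder project

# Cache a large static document once so later prompts do not resend it.
cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-002",  # placeholder model version
    contents=[
        Part.from_uri(
            "gs://my-bucket/large-reference.pdf",  # placeholder GCS path
            mime_type="application/pdf",
        ),
    ],
    ttl=datetime.timedelta(hours=1),
)

# Build a model bound to the cache and query it as usual.
model = GenerativeModel.from_cached_content(cached_content=cached_content)
response = model.generate_content("Summarize the cached document.")
print(response.usage_metadata)
```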

Describe the solution you'd like

A single field, total_billable_token, representing the total billable tokens consumed when using Gemini on Vertex AI.

We found this field documented in GCP for the PaLM 2 text model (Source), but it is not exposed for Gemini on Vertex AI.

Something like this (this sample is taken from another model outside Gemini):

```json
"metadata": {
  "tokenMetadata": {
    "input_token_count": {
      "total_tokens": integer,
      "total_billable_characters": integer
    },
    "output_token_count": {
      "total_tokens": integer,
      "total_billable_characters": integer
    }
  }
}
```
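As a partial workaround, Gemini on Vertex AI does report billable characters before a request is sent, via count_tokens on the model. A minimal sketch, assuming the GA generative_models surface (project ID, model name, and prompt are placeholders):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.5-flash-002")  # placeholder model version
resp = model.count_tokens("Why is the sky blue?")

print(resp.total_tokens)               # token count for the request
print(resp.total_billable_characters)  # billable characters, PaLM 2-style
```

This only covers the input side before the call, though; it does not answer what portion of prompt_token_count would actually be billable once caching is involved.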

Describe alternatives you've considered

An alternative would be a page on the Vertex AI platform showing daily billable token usage.

Additional context

This is sample output when printing the usage_metadata field of the Response object from Gemini on Vertex AI:

prompt_token_count: 107305
candidates_token_count: 27
total_token_count: 107332
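For reference, a minimal sketch of reading these fields from the SDK. The cached_content_token_count lookup at the end is an assumption based on later SDK releases that report the cached share of the prompt; project ID and model name are placeholders:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.5-flash-002")  # placeholder model version
response = model.generate_content("Why is the sky blue?")

usage = response.usage_metadata
print(usage.prompt_token_count)      # tokens sent in the prompt
print(usage.candidates_token_count)  # tokens in the generated candidates
print(usage.total_token_count)       # prompt + candidates
# Assumed field from later SDK releases; absent on older versions:
print(getattr(usage, "cached_content_token_count", None))
```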
fikrisandi commented 4 months ago

I have the same problem. I think this is important for developers building chatbots: when I use the AI Studio GenAI API, there is a context_cache feature whose usage can be read from usage_metadata, but with Vertex AI I still haven't found it. Please help.

happy-qiao commented 3 months ago

@fikrisandi @desenfirman We are working on adding billable token information when using context caching.