Open desenfirman opened 4 months ago
I have the same problem. I think this is important for developers who are building chatbots: when I use the Google AI Studio GenAI API, there is a context cache feature whose token counts can be read from `usage_metadata`, but while using Vertex AI I still haven't found an equivalent. Please help.
@fikrisandi @desenfirman We are working on adding billable token information when using context caching
Is your feature request related to a problem? Please describe.
It's a little confusing to understand the difference between `prompt_token_count`, `candidates_token_count`, and `total_token_count` when our goal is to reduce token usage. Previously, we were using Gemini on Google AI Studio, which includes a cache mechanism that should reduce token usage. However, for usage in an enterprise environment, Google recommends using Gemini in Vertex AI as part of GCP, so we decided to migrate from Google AI Studio to Vertex AI.

However, the migration from Gemini on Google AI Studio to Gemini on Vertex AI didn't work as seamlessly as expected. On the plus side, Gemini on Vertex AI can reference static files uploaded to Google Cloud Storage (GCS). But features like context caching, which we already used on Gemini on Google AI Studio, are missing.
Then we saw in the Vertex AI Gemini API documentation (Source) that Gemini on Vertex AI already has a caching mechanism. However, we are still in doubt about the current token usage and how it is calculated inside Gemini on Vertex AI.
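To make the ambiguity concrete, here is a minimal sketch of the three counts that Gemini on Vertex AI currently exposes. The response object is mocked as a plain dict (no API call is made), and the field names follow the public `usage_metadata` fields; the helper and values are hypothetical:

```python
# Hypothetical helper summarizing the three counts currently exposed by
# Gemini on Vertex AI responses. Note there is no billable or cached
# breakdown among them -- that is the gap this request is about.

def summarize_usage(usage_metadata: dict) -> str:
    prompt = usage_metadata.get("prompt_token_count", 0)
    candidates = usage_metadata.get("candidates_token_count", 0)
    total = usage_metadata.get("total_token_count", 0)
    return (f"prompt={prompt}, candidates={candidates}, total={total}, "
            f"sum_matches_total={prompt + candidates == total}")

# Mocked usage_metadata, shaped like what printing the field returns:
sample = {"prompt_token_count": 1042,
          "candidates_token_count": 310,
          "total_token_count": 1352}
print(summarize_usage(sample))
```

Even when the sum checks out, none of these fields tells us how many of the prompt tokens were served from a cache and whether they were billed.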
Describe the solution you'd like
A simple variable, `total_billable_token`, which represents the total billable tokens while using Gemini on Vertex AI.

Actually, we've read in the GCP documentation about this kind of variable, and it exists for the PaLM 2 text model (Source). However, we didn't see it for Gemini on Vertex AI.
Something just like this (this sample is taken from another model outside Gemini):
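Since the original sample isn't reproduced here, the following is a hypothetical illustration of the shape of the PaLM 2 text model's token metadata, which does carry billable counts (the values are made up, and exact field names may differ by SDK version):

```python
# Illustrative (hypothetical values) shape of PaLM 2 text model token
# metadata, which exposes billable counts -- the kind of field this
# request asks for on Gemini.
palm2_token_metadata = {
    "inputTokenCount": {"totalTokens": 11, "totalBillableCharacters": 43},
    "outputTokenCount": {"totalTokens": 128, "totalBillableCharacters": 512},
}

def total_billable_characters(md: dict) -> int:
    # Sum billable characters over input and output, as a stand-in for
    # the requested total_billable_token field on Gemini.
    return sum(part["totalBillableCharacters"] for part in md.values())

print(total_billable_characters(palm2_token_metadata))  # 555
```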
Describe alternatives you've considered
An alternative would be a page showing daily billable token usage on the Vertex AI platform.
Additional context
This is sample output when I print the `usage_metadata` field of the `Response` object, using Gemini on Vertex AI:
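As a stand-in for the output above, a mocked reproduction of what printing the field looks like (the values are hypothetical):

```python
# Mocked usage_metadata from a Vertex AI GenerateContentResponse
# (hypothetical values). No billable-token field appears, and the cached
# token information available on Google AI Studio has no counterpart here.
usage_metadata = {
    "prompt_token_count": 1042,
    "candidates_token_count": 310,
    "total_token_count": 1352,
}
for field, value in usage_metadata.items():
    print(f"{field}: {value}")
```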