vandana2015 closed this issue 1 month ago
Assigned to @meteatamel who may be able to help more, but in general we can't address aspects of the APIs themselves - only the client libraries. I would suggest asking in one of the Vertex AI support routes for API equivalents.
There's one aspect we can help with though, and that's the countTokens RPC. In Google.Cloud.AIPlatform.V1 you can use LlmUtilityServiceClient.CountTokens, or with the REST-based Google.Apis.Aiplatform.v1 package you can use service.Projects.Locations.Publishers.Models.CountTokens(...) (assuming a client called service of type AiplatformService).
Thank you for your response.
Does the LlmUtilityServiceClient.CountTokens method accept a multimodal request with an image as input? CountTokensRequest is the request message for PredictionService. I need it for GenerateContent / StreamGenerateContent.
Sorry, as I said before, I can't really give language-agnostic, API-specific information. (There are very few APIs that the maintainers of this repo know in a detailed way - we can't know details for hundreds of APIs.) @meteatamel may be able to help, but I think it would be better to go down one of the Vertex AI support routes instead.
Hi @vandana2015, here's an example for CountTokens in C#: https://github.com/GoogleCloudPlatform/dotnet-docs-samples/blob/main/aiplatform/api/AIPlatform.Samples/GetTokenCount.cs
Sorry, this sample didn't show up in docs, we'll fix that.
The sample only accepts text as input, but you can change it to multimodal like this and it should work:
using Google.Cloud.AIPlatform.V1;
using System;
using System.Threading.Tasks;

public class GetTokenCount
{
    public async Task<int> CountTokens(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.5-flash-001"
    )
    {
        var client = new LlmUtilityServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();

        var request = new CountTokensRequest
        {
            Endpoint = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Contents =
            {
                new Content
                {
                    Role = "USER",
                    Parts =
                    {
                        new Part { Text = "Describe this image" },
                        new Part { FileData = new() { MimeType = "image/png", FileUri = "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png" } }
                    }
                }
            }
        };

        var response = await client.CountTokensAsync(request);

        int tokenCount = response.TotalTokens;
        Console.WriteLine($"There are {tokenCount} tokens in the prompt.");

        return tokenCount;
    }
}
Let us know if this answers your question.
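The sample above writes the same publisher model path twice (once for Endpoint, once for Model) and derives the API host from the location. As a minimal sketch, assuming you want to build those strings in one place, a small helper could look like the following. The class and method names (VertexAiNames, RegionalEndpoint, PublisherModel) are hypothetical and not part of Google.Cloud.AIPlatform.V1; only the string formats are taken from the sample.

```csharp
using System;

// Hypothetical helper (not part of the Google client libraries): builds the
// two strings the sample above interpolates inline.
public static class VertexAiNames
{
    // Regional API host, in the same format the sample passes to the client builder.
    public static string RegionalEndpoint(string location) =>
        $"{Require(location, nameof(location))}-aiplatform.googleapis.com";

    // Publisher model path, in the same format the sample assigns to both
    // CountTokensRequest.Endpoint and CountTokensRequest.Model.
    public static string PublisherModel(
        string projectId, string location, string publisher, string model) =>
        $"projects/{Require(projectId, nameof(projectId))}" +
        $"/locations/{Require(location, nameof(location))}" +
        $"/publishers/{Require(publisher, nameof(publisher))}" +
        $"/models/{Require(model, nameof(model))}";

    // Rejects null, empty, or whitespace-only values before they end up in a path.
    private static string Require(string value, string name) =>
        string.IsNullOrWhiteSpace(value)
            ? throw new ArgumentException($"{name} must be set.", name)
            : value;
}
```

With this in place, the request in the sample could compute the path once, for example var modelPath = VertexAiNames.PublisherModel(projectId, location, publisher, model); and assign modelPath to both Endpoint and Model.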
Thank you! This resolves my query.
Is your feature request related to a problem? Please describe. I want to get the token utilization for the Google Gemini multimodal streaming endpoint (StreamGenerateContent), to which I pass an image as input. For non-streaming endpoints, the Gemini models return token information; however, I want to gather token utilization info for streaming endpoints.
Describe the solution you'd like For OpenAI, I found how to calculate this here (https://platform.openai.com/docs/guides/vision/calculating-costs and https://community.openai.com/t/how-do-i-calculate-image-tokens-in-gpt4-vision/492318). There are also encodings for GPT models, like o200k_base, and I use a library like SharpToken (https://www.nuget.org/packages/SharpToken). I want something similar for Gemini.
Describe alternatives you've considered There is a REST API endpoint for calculating tokens (https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/get-token-count), but it is not for multimodal input. It is also not present in the .NET SDK.