Token information for google gemini multimodal streaming endpoint

vandana2015 commented 1 month ago

Is your feature request related to a problem? Please describe. I want to get the token utilization for the Google Gemini multimodal streaming endpoint (StreamGenerateContent) when I pass an image as input. For non-streaming endpoints, the Gemini models return token information, but I also want to gather token utilization info for streaming endpoints.

Describe the solution you'd like For OpenAI, I found how to calculate image tokens here (https://platform.openai.com/docs/guides/vision/calculating-costs and https://community.openai.com/t/how-do-i-calculate-image-tokens-in-gpt4-vision/492318). There are also encodings for GPT models, such as o200k_base, which I use through a library like SharpToken (https://www.nuget.org/packages/SharpToken). I want something similar for Gemini.

Describe alternatives you've considered There is a REST API endpoint for counting tokens (https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/get-token-count), but it is not for multimodal input. It is also not present in the .NET SDK.

jskeet commented 1 month ago

Assigned to @meteatamel who may be able to help more, but in general we can't address aspects of the APIs themselves - only the client libraries. I would suggest asking in one of the Vertex AI support routes for API equivalents.

There's one aspect we can help with though, and that's the countTokens RPC. In Google.Cloud.AIPlatform.V1 you can use LlmUtilityServiceClient.CountTokens, or with the REST-based Google.Apis.Aiplatform.v1 package you can use service.Projects.Locations.Publishers.Models.CountTokens(...) (assuming a client called service of type AiplatformService).

vandana2015 commented 1 month ago

Thank you for your response.

Does the LlmUtilityServiceClient.CountTokens method accept a multimodal request with an image as input? CountTokensRequest is described as the request message for PredictionService. I need token counts for GenerateContent / StreamGenerateContent.

jskeet commented 1 month ago

Sorry, as I said before, I can't really give language-agnostic, API-specific information. (There are very few APIs that the maintainers of this repo know in a detailed way - we can't know details for hundreds of APIs.) @meteatamel may be able to help, but I think it would be better to go down one of the Vertex AI support routes instead.

meteatamel commented 1 month ago

Hi @vandana2015, here's an example for CountTokens in C#: https://github.com/GoogleCloudPlatform/dotnet-docs-samples/blob/main/aiplatform/api/AIPlatform.Samples/GetTokenCount.cs

Sorry, this sample didn't show up in the docs; we'll fix that.

The sample only accepts text as input, but you can change it to multimodal like this and it should work:

using Google.Cloud.AIPlatform.V1;
using System;
using System.Threading.Tasks;

public class GetTokenCount
{
    public async Task<int> CountTokens(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.5-flash-001"
    )
    {
        var client = new LlmUtilityServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();

        var request = new CountTokensRequest
        {
            Endpoint = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Contents =
            {
                new Content
                {
                    Role = "USER",
                    Parts =
                    {
                        // Text part of the prompt.
                        new Part { Text = "Describe this image" },
                        // Image part, referenced by its Cloud Storage URI.
                        new Part { FileData = new() { MimeType = "image/png", FileUri = "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png" } }
                    }
                }
            }
        };

        var response = await client.CountTokensAsync(request);
        int tokenCount = response.TotalTokens;
        Console.WriteLine($"There are {tokenCount} tokens in the prompt.");
        return tokenCount;
    }
}
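
For the streaming side of the original question, here is a rough sketch rather than an official sample. Each chunk returned by PredictionServiceClient.StreamGenerateContent is a GenerateContentResponse, and that message has a UsageMetadata field. The sketch assumes the service populates that field on at least one streamed chunk (typically the last one); I have not verified this for every model. The class name StreamTokenUsage and the method name StreamAndReportUsage are made up for illustration.

```csharp
using Google.Cloud.AIPlatform.V1;
using System;
using System.Threading.Tasks;

// Hypothetical helper class; not part of the client library or the docs samples.
public class StreamTokenUsage
{
    public async Task<int> StreamAndReportUsage(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.5-flash-001"
    )
    {
        var client = new PredictionServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();

        var request = new GenerateContentRequest
        {
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Contents =
            {
                new Content
                {
                    Role = "USER",
                    Parts =
                    {
                        new Part { Text = "Describe this image" },
                        new Part { FileData = new() { MimeType = "image/png", FileUri = "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png" } }
                    }
                }
            }
        };

        // Start the server-streaming call and read every chunk.
        using var call = client.StreamGenerateContent(request);
        GenerateContentResponse.Types.UsageMetadata usage = null;
        await foreach (GenerateContentResponse chunk in call.GetResponseStream())
        {
            // Assumption: usage metadata arrives on some chunk; keep the most recent one seen.
            if (chunk.UsageMetadata != null)
            {
                usage = chunk.UsageMetadata;
            }
        }

        if (usage == null)
        {
            Console.WriteLine("No usage metadata was returned on the stream.");
            return 0;
        }

        Console.WriteLine($"Prompt tokens: {usage.PromptTokenCount}, " +
            $"candidate tokens: {usage.CandidatesTokenCount}, " +
            $"total tokens: {usage.TotalTokenCount}");
        return usage.TotalTokenCount;
    }
}
```

If no chunk carries usage metadata, the sketch returns 0; calling CountTokens on the same Contents, as in the sample above, remains the fallback for the prompt side.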

Let us know if this answers your question.

vandana2015 commented 1 month ago

Thank you! This resolves my query.

googleapis / google-cloud-dotnet

Token information for google gemini multimodal streaming endpoint #13174