microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
20.84k stars 3.03k forks source link

.Net OpenAI - Add Usage information for Streaming #6826

Open RogerBarreto opened 1 month ago

RogerBarreto commented 1 month ago

Recently OpenAI added a stream_options.include_usage = true parameter that when set provide one last chunk with the Usage information, this can be set on by default in our connector for text streaming APIs.

https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options

Last chunk info example

{
    "id": "chatcmpl-9bs3D3THDTOtsjYcokah40Ub",
    "object": "chat.completion.chunk",
    "created": 1718812935,
    "model": "gpt-4o-2024-05-13",
    "system_fingerprint": "fp_f4e629d0a5",
    "choices": [],
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 7,
        "total_tokens": 20
    }
}
arynaq commented 1 month ago

Ah I was going crazy, wondering why this was not available

var response = chat_completion.GetStreamingChatMessageContentsAsync(history, cancellationToken: cancellationToken);
        await foreach (var chunk in response)
        {

            if (chunk.Metadata != null)
            {
                var as_json = JsonSerializer.Serialize(chunk.Metadata, new JsonSerializerOptions { WriteIndented = true });
                Console.WriteLine(as_json);
            }
            if (chunk.Content == null)
            {
                continue;
            }

            if (chunk.Content.Length > 0)
            {
                yield return new ServerSideEvent("message", chunk.Content, Guid.NewGuid().ToString(), "1000");
            }
        }

Expected the usage metadata to be there, but this is not implemented yet, in the meantime is there any way we can get this in a streaming mode? It is quite important to track usage on a per-user model in our app.

AdaTheDev commented 2 weeks ago

@arynaq Don't believe there is - I ended up created a temporary wrapper connector around the official OpenAI connector, that then calculates the token usage manually using Tiktoken.