.Net: Bug: Semantic Kernel inflates token usage of input prompt by 20% due to Unicode escaping and URL encoding of the prompt

microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps

MIT License


Describe the bug

The Semantic Kernel package for C# inflates input token usage by around 20% for my prompts, because it escapes Unicode characters and URL-encodes the string in the `content` field of the `messages` array.

I tested this on a prompt that uses 49k input tokens according to the usage metadata returned by the Azure OpenAI API. When the same prompt is sent through Semantic Kernel, input usage rises to 63k tokens, an extra ~14k tokens.

The difference is easiest to see when the prompt contains JSON.
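One way to see the kind of escaping described above, outside Semantic Kernel, is to serialize a JSON-bearing prompt string with the default `System.Text.Json` encoder and with `JavaScriptEncoder.UnsafeRelaxedJsonEscaping`, then compare the output. This is a minimal sketch that assumes, without confirmation from this report, that the escaping comes from a default-configured `System.Text.Json` encoder. It covers only the Unicode escaping, not the URL encoding. The `EscapingDemo` class and the sample prompt are hypothetical. It also compares character counts only, not the tokens the service actually bills.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical demo class; not part of Semantic Kernel.
public static class EscapingDemo
{
    // Serializes text as a JSON string literal. With relaxed == false the default
    // encoder (JavaScriptEncoder.Default) is used, which escapes non-ASCII characters
    // and HTML-sensitive characters such as ", <, > and & as \uXXXX sequences.
    // With relaxed == true, UnsafeRelaxedJsonEscaping leaves those characters as-is
    // (a double quote is still escaped, but as \" instead of a \uXXXX sequence).
    public static string Serialize(string text, bool relaxed)
    {
        var options = relaxed
            ? new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping }
            : new JsonSerializerOptions(); // null Encoder means JavaScriptEncoder.Default
        return JsonSerializer.Serialize(text, options);
    }

    public static void Main()
    {
        // Hypothetical prompt containing JSON and a non-ASCII character.
        string prompt = "Summarize: {\"name\": \"Zoë\", \"tags\": [\"<b>\", \"a&b\"]}";

        string escaped = Serialize(prompt, relaxed: false);
        string unescaped = Serialize(prompt, relaxed: true);

        Console.WriteLine(escaped);
        Console.WriteLine(unescaped);
        Console.WriteLine($"default: {escaped.Length} chars, relaxed: {unescaped.Length} chars");
    }
}
```

In the default output, quotes, angle brackets, `&` and `ë` appear as `\uXXXX` sequences. The relaxed output keeps them as literal characters, so the default string is noticeably longer for JSON-heavy prompts.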

Platform

OS: Windows
IDE: Visual Studio
Language: C#
Source: NuGet package version 1.15


.Net: Bug: Semantic Kernel inflates token usage of input prompt by 20% due to Unicode escaping and URL encoding of the prompt #6964