microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: Bug: Semantic Kernel inflates token usage of input prompt by 20% due to Unicode escaping and URL encoding of the prompt #6964

Open Evyatar108 opened 1 week ago

Evyatar108 commented 1 week ago

Describe the bug

The Semantic Kernel package in C# inflates the token usage of my prompts by around 20% because it escapes Unicode characters and URL-encodes the string in the content field of the messages array.

I tested this with a prompt of about 49k input tokens according to the usage metadata returned by the Azure OpenAI API; when the same prompt is sent through Semantic Kernel, the input token count rises to about 63k.

The differences in the serialized prompt are very easy to see when the prompt contains JSON.
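
Here is a minimal sketch (independent of Semantic Kernel, with made-up sample content) of the kind of escaping I suspect is happening: the default System.Text.Json encoder turns every non-ASCII character into a six-character \uXXXX escape, while a relaxed encoder leaves the text intact, so the escaped payload is noticeably longer before it ever reaches the tokenizer.

```csharp
// Minimal sketch (not Semantic Kernel code) showing how default System.Text.Json
// escaping turns every non-ASCII character into a six-character \uXXXX sequence,
// which the model then sees as extra literal characters in the prompt.
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

class EscapingDemo
{
    static void Main()
    {
        // Hypothetical message content mixing non-ASCII text with embedded JSON.
        var content = "Résumé – 価格: {\"price\": \"€100\"}";

        // Default encoder: non-ASCII and HTML-sensitive characters are escaped.
        var escaped = JsonSerializer.Serialize(content);

        // Relaxed encoder: characters are emitted as-is (quotes are still escaped).
        var relaxed = JsonSerializer.Serialize(content, new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
        });

        Console.WriteLine(escaped.Length); // noticeably longer than the relaxed form
        Console.WriteLine(relaxed.Length);
        Console.WriteLine(escaped);
        Console.WriteLine(relaxed);
    }
}
```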

Platform

blinchi commented 1 week ago

And it's not just the token consumption; the escaped text also costs the LLM more to process. When the model receives escaped data, its responses come back escaped as well, which should not happen. In my case I'm using Gemini.