sophialagerkranspandey closed this issue 1 month ago
I'm pretty sure that the characters are only encoded so we can print the log statement (so it shouldn't impact your logic), but adding folks to verify.
In Python, I can confirm that they are unescaped before being sent to the model. This happens within the `from_element` method for chat and within the `_invoke_internal` method for text, so the escaping itself does not add extra tokens (although tokenization on the model side might). @sophialagerkranspandey @glorious-beard
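To illustrate the round trip described here, a minimal sketch using only Python's standard `html` module. This is not the actual semantic-kernel code; the variable names and the sample input are assumptions made for illustration, and the library's own encoding may differ in detail.

```python
import html

# Hypothetical user input containing markup (illustrative only).
raw = "<p>Version 1.2</p>"

# Encode the markup so it can be embedded safely in a prompt template or log.
encoded = html.escape(raw)

# Decode it again, as would happen before the text is sent to the model.
decoded = html.unescape(encoded)

print(encoded)         # the escaped form, as it might appear in a log
print(decoded == raw)  # the round trip restores the original text
```

If the decoding step runs before the request is built, the model receives the original characters, so the longer escaped form only ever shows up in intermediate representations such as logs.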
We have protection against prompt injection attacks that encodes potentially dangerous tags. If you trust the content, you can change this behaviour; take a look at this sample to see the available options: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/ChatPrompts/SafeChatPrompts.cs
Closing this issue since it's handled in both C# and Python
Discussed in https://github.com/microsoft/semantic-kernel/discussions/7308
`<p>Version 1.2</p>....`, the function argument looks like:

```json
{"input":"\u003Cp\u003EVersion 1.2\u003C/p\u003E..."}
```

Two questions:

1. Do the extra characters from escaping "<" and ">" (5 additional characters each) incur extra token cost?
2. Does the function call unescape these characters before the argument is sent to the LLM endpoint?
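For anyone who wants to check the escaping locally, here is a small sketch using only Python's standard `json` module. It shows that the `\u003C` and `\u003E` sequences in the logged argument are ordinary JSON escapes that any JSON parser decodes back to "<" and ">", and it counts how many extra characters the escaped form carries. It says nothing about what the library does internally or how a given tokenizer would count these characters; the variable names are assumptions for illustration.

```python
import json

# The logged argument from the question, kept as a raw JSON string.
logged = r'{"input":"\u003Cp\u003EVersion 1.2\u003C/p\u003E..."}'

# Parsing the JSON decodes the \uXXXX escapes back to the original characters.
parsed = json.loads(logged)
text = parsed["input"]

# Re-serialize compactly without the escapes to compare lengths.
compact = json.dumps(parsed, separators=(",", ":"))
extra_chars = len(logged) - len(compact)

print(text)
print(extra_chars)
```

Each escaped bracket takes six characters instead of one, so the four brackets in this example account for the whole difference in length.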