microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
20.52k stars 2.97k forks

Chat completion support #78

Closed NTaylorMullen closed 1 year ago

NTaylorMullen commented 1 year ago

I've been digging through the IKernel and function abstractions hoping to find a way to enable the gpt-3.5-turbo APIs (chat completion) and, more recently, the GPT-4 APIs, but given that ITextCompletion only takes a string as input, I haven't found a way to reasonably change the bits to enable the new behavior.

alexchaomander commented 1 year ago

Thanks for the note @NTaylorMullen! Since ChatGPT introduces a new API, we have to implement a ChatCompletion API in the Kernel. We have this on our backlog and have bumped up the priority!

@shawncal @dluc ^

Stevenic commented 1 year ago

One approach to this that might work well would be to support defining prompts using OpenAI's new ChatML syntax and then have SK parse this before calling the Chat Completion APIs. The Chat Completion APIs currently just convert the JSON you pass them back into a ChatML-based prompt, so this would essentially let you send almost any ChatML-based prompt through the Chat Completion APIs. They've said that a way to send raw ChatML is coming, but it isn't here yet.

To go along with this you would need a {{$history}} variable that formats conversation history using ChatML. Maybe a {{$historyML}} variable, or a function such as {{historyML}} that converts the pairs into ChatML format.

This is actually the ONLY technique I've thought of that would let multi-shot prompts work correctly with the new Chat Completion APIs. The issue with multi-shot prompts and Chat Completion is that each shot needs to be passed in as a user/assistant message pair to work, so you either need a way outside of the prompt to construct those pairs (it doesn't seem like SK is set up to do that), or you need to create a single prompt with all those pairs and use ChatML to separate them.
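[Editor's note] The parse-ChatML-before-calling-the-API idea above could be sketched roughly like this. This is a minimal illustration, not SK's actual implementation; the `<|im_start|>`/`<|im_end|>` markers follow OpenAI's published ChatML format, and `parse_chatml` is a hypothetical helper name:

```python
import re

def parse_chatml(prompt: str) -> list:
    """Split a ChatML-style prompt into Chat Completion message dicts."""
    pattern = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)
    return [
        {"role": role, "content": content.strip()}
        for role, content in pattern.findall(prompt)
    ]

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>"
)
messages = parse_chatml(prompt)
# messages is now in the shape the Chat Completion API expects
```

Each multi-shot example would become its own user/assistant marker pair in the prompt text, and the parser would reconstruct the message list from them.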

Stevenic commented 1 year ago

Another tip I'll give you, for gpt-3.5-turbo at least, is that I would avoid sending "system" messages altogether. The model will very quickly abandon them, and I've gotten far better results by just including an extra "user" message containing the core system prompt.
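[Editor's note] In practice the tip above amounts to something like the following sketch. The function and parameter names are illustrative, not an SK or OpenAI API:

```python
def build_messages(core_prompt: str, history: list, user_input: str) -> list:
    """Assemble a Chat Completion message list, sending the core
    instructions as a "user" message rather than a "system" message
    (which, per the tip above, gpt-3.5-turbo tends to drift from)."""
    messages = [{"role": "user", "content": core_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_input})
    return messages
```

The trade-off raised later in this thread still applies: dropping the "system" role gives up whatever protection it offers against prompt injection.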

alexchaomander commented 1 year ago

These are great tips! Thanks for sending them @Stevenic!

dluc commented 1 year ago

Using GPT turbo is reasonably simple using a connector. I think most of the friction is about persisting the chat history object inside the context, with a continuous serialization/deserialization, which is not ideal but should do the trick.
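[Editor's note] A rough sketch of the serialize/deserialize round trip dluc describes, assuming the context only holds string values. Here `context` is just a plain dict standing in for SK's context object, and the function names are hypothetical:

```python
import json

def save_history(context: dict, history: list) -> None:
    # Persist the chat history object into the string-valued context.
    context["history"] = json.dumps(history)

def load_history(context: dict) -> list:
    # Reconstruct the chat history object from the context.
    return json.loads(context.get("history", "[]"))

ctx = {}
save_history(ctx, [{"role": "user", "content": "Hi"}])
restored = load_history(ctx)
```

Not ideal, as noted, since every turn pays the cost of a full serialization round trip, but it keeps the context string-only.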

NTaylorMullen commented 1 year ago

> Using GPT turbo is reasonably simple using a connector. I think most of the friction is about persisting the chat history object inside the context, with a continuous serialization/deserialization, which is not ideal but should do the trick.

Mind elaborating on how to use a connector here? Or were you referring to internal to SK?

evchaki commented 1 year ago

@NTaylorMullen , here is a PR in right now for Python with the Chat APIs. Would this work to unblock you for now?

NTaylorMullen commented 1 year ago

> @NTaylorMullen , here is a PR in right now for Python with the Chat APIs. Would this work to unblock you for now?

Sadly not, we're only using the C# APIs 😢

Stevenic commented 1 year ago

As an FYI, in my JS implementation (SK-like but not exactly SK) I'm doing basically what @dluc suggests: I'm using a $history variable to hold the message pairs, and then I parse this $history variable to reconstruct the user/assistant message pairs in my connector. Just keep in mind that your $history could contain newlines (\n), so you'll need to account for that if parsing text. My $history object is a string array of pairs so I don't have to deal with that, but I believe in C# everything is strings.
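[Editor's note] One way to sidestep the embedded-newline problem mentioned above is to keep $history serialized as JSON rather than as delimiter-separated text, so newlines inside messages are escaped automatically. A sketch, with hypothetical names:

```python
import json

def history_to_messages(history_var: str) -> list:
    """Turn a JSON-serialized $history value (an array of
    [user, assistant] string pairs) back into Chat Completion messages."""
    messages = []
    for user_msg, assistant_msg in json.loads(history_var):
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

# Newlines inside a message survive the round trip because JSON escapes them.
history = json.dumps([["How do I\nparse this?", "Use JSON; newlines are safe."]])
pairs = history_to_messages(history)
```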

Moult-ux commented 1 year ago

> Another tip I'll give you, for gpt-3.5-turbo at least, is that I would avoid sending "system" messages altogether. The model will very quickly abandon them and I've gotten far better results by just including an extra "user" message containing the core system prompt.

The "system" message is usefull to prevent Prompt Injection. It's also enable prompting in the context of the system.

SOE-YoungS commented 1 year ago

I've got gpt-4 running via SK in C# (I'm building a Teams bot). However, it has no message memory or token handling yet & I've also still got to add tests.

However, before I go too far with this, I thought I should check in here to get some feedback on the implementation.

Please see this PR for more detail.

dluc commented 1 year ago

Quick update: work is in progress. Here's the pull request adding ChatGPT and DALL-E: https://github.com/microsoft/semantic-kernel/pull/161

alexchaomander commented 1 year ago

This got merged in! Closing this issue.