microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: Model Context Size Parameter for Configuration and Intelligent Planner Selection #3244

Closed KSemenenko closed 7 months ago

KSemenenko commented 1 year ago

Description: I am proposing a new feature: adding a parameter to specify the model's context size in the system's configuration. This enhancement would allow the Planner to make intelligent model selections based on the context-size requirements of a particular task.

Cost Optimization: With the flexibility to choose the model's context size, the Planner can use smaller, more cost-effective models, such as GPT-3.5 with a 4k context, for shorter tasks, leading to significant cost savings. Conversely, for tasks with a larger context, the Planner can seamlessly opt for models with a larger context window, such as a 16k-token variant.

Also, maybe it's possible to associate certain semantic functions with a specific model?

matthewbolanos commented 1 year ago

Thanks for raising this, @KSemenenko, we're going to discuss this a bit more internally before we decide on the best direction for solving this. We definitely agree this is something we should support.

markwallace-microsoft commented 1 year ago

@KSemenenko we are doing some work to allow the AI service and associated model request settings to be dynamically configured when a semantic function (aka LLM prompt) is executed. We will allow for multiple different model request settings to be configured for a prompt e.g. for service identified by an id you can set different request settings (max tokens, frequency penalty, ...). The model request settings can be for an OpenAI model or any arbitrary LLM.
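The idea of keying different request settings to a service id could be sketched like this. This is a language-agnostic illustration, not the actual Semantic Kernel configuration schema; the service ids and setting values are made up for the example:

```python
# Hypothetical per-service request settings for a single prompt.
# The shape is illustrative only; the real Semantic Kernel schema differs.
request_settings = {
    "gpt35-4k":  {"max_tokens": 1024, "temperature": 0.0, "frequency_penalty": 0.0},
    "gpt35-16k": {"max_tokens": 4096, "temperature": 0.0, "frequency_penalty": 0.0},
}

def settings_for(service_id: str) -> dict:
    # Fall back to the first configured service when the id is unknown.
    return request_settings.get(service_id, next(iter(request_settings.values())))
```

At execution time the runtime would look up the settings for whichever service was selected for the prompt.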

I'd like to get more information on your use case. Consider the following:

Do you want to be able to say: if the prompt token count is less than, say, 1,000 tokens, then use gpt-3.5-turbo, and otherwise use gpt-3.5-turbo-16k?
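That threshold rule can be sketched in a few lines. The 1,000-token cutoff and model names are just the example values from this question, and `estimate_tokens` is a crude stand-in for a real tokenizer (e.g. tiktoken), not part of any library:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)

def select_model(prompt: str, threshold_tokens: int = 1000) -> str:
    """Pick a model id based on a rough token estimate of the prompt."""
    if estimate_tokens(prompt) < threshold_tokens:
        return "gpt-3.5-turbo"      # smaller, cheaper 4k-context model
    return "gpt-3.5-turbo-16k"      # larger context window for long prompts
```

A real implementation would count tokens with the model's actual tokenizer rather than a character heuristic.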

KSemenenko commented 1 year ago

@markwallace-microsoft Yes, you are absolutely right: while the chat is small, there is no point in switching to 16k, especially when the larger model is priced like GPT-4. It is all about optimal use of the budget.

And I have another idea: maybe we can tell the planner which models it can use for specific functions/skills/plugins.

For example, say you have a model that you have fine-tuned (or maybe trained) for some specific task: it will work perfectly with a specific function, but for other things it is not so good.

Or, for example, a task such as summarization can always run on GPT-3.5, even though for other tasks you would use GPT-4.
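This per-function model restriction could be sketched as a simple allow-list. The function and model names here are purely illustrative, and this is not a proposed Semantic Kernel API, just the shape of the idea:

```python
# Hypothetical allow-list: which models the planner may use per function.
allowed_models = {
    "summarize": ["gpt-3.5-turbo"],           # the cheap model is enough
    "legal_review": ["my-fine-tuned-model"],  # fine-tuned specialist model
}

DEFAULT_MODELS = ["gpt-4"]

def models_for(function_name: str) -> list:
    # Fall back to the general-purpose default for unlisted functions.
    return allowed_models.get(function_name, DEFAULT_MODELS)
```

The planner would then consult this mapping before dispatching each step of a plan.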

What do you think?

markwallace-microsoft commented 1 year ago

@KSemenenko thanks for the feedback.

Could you take a look at this PR https://github.com/microsoft/semantic-kernel/pull/3040

It includes two examples:

  1. Shows how to specify the AI service a particular prompt uses, which should address your requirement to bind a prompt to a specific service.
  2. Shows how to create a custom AI service selector; the sample uses token counts and the size of the prompt as the basis for the decision.
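The second example's idea, a selector that picks a service based on prompt size, could be sketched like this. The class and method names are illustrative stand-ins, not the actual IAIServiceSelector interface from the PR:

```python
from dataclasses import dataclass

@dataclass
class AIService:
    service_id: str
    max_context_tokens: int

class TokenCountServiceSelector:
    """Pick the smallest-context (assumed cheapest) service that fits the prompt.

    Illustrative sketch only; the real Semantic Kernel selector differs.
    """
    def __init__(self, services):
        # Try smaller (cheaper) context windows first.
        self.services = sorted(services, key=lambda s: s.max_context_tokens)

    def select(self, prompt_tokens: int) -> AIService:
        for service in self.services:
            if prompt_tokens <= service.max_context_tokens:
                return service
        raise ValueError("prompt exceeds every configured context window")
```

Registering such a selector would let every prompt execution route to the cheapest adequate model automatically.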

Your feedback would be much appreciated.

I need to look into how to be able to specify the AI service for a plan. Will update here when I have that information.

KSemenenko commented 1 year ago

I like the idea of ServiceId; it also works well with IAIServiceSelector, where you choose a model.

markwallace-microsoft commented 7 months ago

All .Net issues prior to 1-Dec-2023 are being closed. Please re-open if this issue is still relevant to the .Net Semantic Kernel 1.x release. In the future, all issues that are inactive for more than 90 days will be labelled as 'stale' and closed 14 days later.