dynamic model selection proposal

What feature would you like to be added?

What: a module that dynamically selects a model based on each model's capabilities and the requirements of the inference request.

Prompt Classification Based Model Selection

Premise/Customer Need: Developers of intelligent applications want to be able to dynamically select the most appropriate AI model for their needs based on a few criteria, such as cost, latency, response quality, token size, language, and modality requirements. They may wish to optimize for different aspects of these criteria in different contexts. Developers need some way to classify workloads and match those classifications to models based upon these criteria.

Prompt Classification

Prompt Classification involves assessing the compiled (fully completed) prompt before passing it to an inference API for completion or chat completion. The assessment would examine the content of the prompt with respect to:

- Modality (text, speech, voice, image, video) of input and desired output
- Language (input and desired output language)
- Request Classification (is it creative, reasoning, search, simple most likely next token completion, function calling, tool calling?)
- Data Complexity (are there sections of data that are not model instructions, and what is the complexity of the data?)
- Token size (of request and desired response)
- Language Complexity (what is the linguistic complexity of the content of the instructions?)
- For multimodal speech, video, images, perhaps there are other complexity or classification measures?

For each of the elements above there should be a scale or taxonomy that can be mapped to a score (obviously some things, like supported languages, are Boolean). Classification likely involves running an algorithm for each element on the input prompt and assigning a score on that element's scale.
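As a rough illustration of per-axis scoring, the sketch below maps a text prompt to one value per element. Every name here (`PromptClassification`, `RequestType`, `classify_prompt`) is hypothetical and not part of AutoGen, and the keyword and word-length heuristics are placeholders for whatever real classifiers would be used.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    """Taxonomy axis (hypothetical values taken from the list above)."""
    CREATIVE = "creative"
    REASONING = "reasoning"
    COMPLETION = "completion"


@dataclass(frozen=True)
class PromptClassification:
    """Hypothetical container for per-axis results; not an AutoGen type."""
    modalities: frozenset[str]   # Boolean-style axis: required modalities
    languages: frozenset[str]    # Boolean-style axis: required languages
    request_type: RequestType    # taxonomy axis
    token_estimate: int          # rough size of the request
    language_complexity: float   # scaled axis in [0.0, 1.0]


def classify_prompt(prompt: str, language: str = "en") -> PromptClassification:
    """Toy heuristics only; each rule stands in for a real classifier."""
    words = prompt.split()
    # Crude token estimate: assume roughly 4 tokens per 3 words.
    token_estimate = max(1, len(words) * 4 // 3)
    # Crude complexity proxy: mean word length, where 10+ letters maps to 1.0.
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    complexity = min(mean_len / 10.0, 1.0)
    lowered = prompt.lower()
    if any(k in lowered for k in ("why", "prove", "step by step")):
        request_type = RequestType.REASONING
    elif any(k in lowered for k in ("story", "poem")):
        request_type = RequestType.CREATIVE
    else:
        request_type = RequestType.COMPLETION
    return PromptClassification(
        modalities=frozenset({"text"}),
        languages=frozenset({language}),
        request_type=request_type,
        token_estimate=token_estimate,
        language_complexity=complexity,
    )
```

Keeping Boolean axes as sets and scaled axes as floats lets a later matching step treat the former as hard filters and the latter as scores to compare.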

Model Capabilities

Model selection will also depend on the ability to define and rank different AI models by their capabilities on each of the axes noted above. Additionally, the models will require definitions or rankings with respect to expected latency, cost, and quality.

These rankings will need to be computed or determined consistently across time and stored somewhere (model catalog API?) accessible to the developer’s code.
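One possible shape for such stored rankings is sketched below as an in-memory catalog. `ModelCapability`, `ModelCatalog`, the model names, and all rank values are made up for illustration; a real catalog API would presumably be a service rather than a local dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapability:
    """Hypothetical catalog entry; ranks are normalized to [0.0, 1.0]."""
    name: str
    modalities: frozenset[str]
    languages: frozenset[str]
    max_tokens: int
    cost: float      # lower is cheaper
    latency: float   # lower is faster
    quality: float   # higher is better


class ModelCatalog:
    """In-memory stand-in for a catalog service (assumption, not a real API)."""

    def __init__(self) -> None:
        self._models: dict[str, ModelCapability] = {}

    def register(self, model: ModelCapability) -> None:
        self._models[model.name] = model

    def supporting(self, modality: str, language: str, tokens: int) -> list[ModelCapability]:
        """Return models that satisfy the Boolean axes and the token budget."""
        return [
            m for m in self._models.values()
            if modality in m.modalities
            and language in m.languages
            and tokens <= m.max_tokens
        ]


# Illustrative entries only; the names and numbers are invented.
catalog = ModelCatalog()
catalog.register(ModelCapability(
    "small-model", frozenset({"text"}), frozenset({"en"}), 4_000, 0.1, 0.2, 0.5))
catalog.register(ModelCapability(
    "large-model", frozenset({"text", "image"}), frozenset({"en", "fr"}), 32_000, 0.9, 0.7, 0.95))
```

Calling `catalog.supporting("image", "en", 1000)` narrows the candidates to models that can handle the request at all, before any cost, latency, or quality trade-off is considered.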

Selecting a Model

Selecting a model then becomes a process of matching a developer’s preferences for cost, latency, and quality with the capabilities of each model and the classification of the prompt. Framed this way, it is a classic scheduling problem, and many existing scheduling algorithms become applicable.
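A minimal sketch of preference matching, assuming preferences are expressed as non-negative weights and model ranks are normalized to [0, 1]. The linear utility below is just one simple choice, not a method the proposal prescribes, and `Preferences`, `Candidate`, `utility`, and `select_by_preference` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Preferences:
    """Developer weights; a larger weight means the axis matters more."""
    cost: float
    latency: float
    quality: float


@dataclass(frozen=True)
class Candidate:
    """Model ranks in [0.0, 1.0]; invented for illustration."""
    name: str
    cost: float      # lower is better
    latency: float   # lower is better
    quality: float   # higher is better


def utility(c: Candidate, p: Preferences) -> float:
    """Linear trade-off: reward quality, penalize cost and latency."""
    return p.quality * c.quality - p.cost * c.cost - p.latency * c.latency


def select_by_preference(candidates: list[Candidate], prefs: Preferences) -> Candidate:
    """Pick the candidate with the highest utility under the given weights."""
    if not candidates:
        raise ValueError("no candidate models to choose from")
    return max(candidates, key=lambda c: utility(c, prefs))


models = [
    Candidate("small-model", cost=0.1, latency=0.2, quality=0.5),
    Candidate("large-model", cost=0.9, latency=0.7, quality=0.95),
]
cost_sensitive = Preferences(cost=1.0, latency=0.5, quality=0.2)
quality_first = Preferences(cost=0.1, latency=0.1, quality=1.0)
```

With these invented numbers, `select_by_preference(models, cost_sensitive)` picks `small-model` and `select_by_preference(models, quality_first)` picks `large-model`, showing how the same catalog yields different choices in different contexts.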

Off the Shelf Selection

We should build and supply some ready-made selection functions. The simplest case: if the arrays of scores described above (the prompt classification and each model's capability rankings) can be supplied, a function finds the model whose scores are the closest match. As the number of axes of comparison increases (more models, more selection heuristics), it may become desirable to apply existing compute resource scheduling algorithms or tools.
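The closest-match idea could look roughly like the following, assuming both the prompt classification and each model's capabilities are given as score dictionaries on shared axes in [0, 1]. Euclidean distance is an assumption here (any distance measure could be swapped in), and `closest_model` and the axis names are hypothetical.

```python
from __future__ import annotations

import math


def closest_model(required: dict[str, float], models: dict[str, dict[str, float]]) -> str:
    """Return the name of the model whose scores are nearest to the requirement.

    Axes missing from a model's scores are treated as 0.0.
    """
    if not models:
        raise ValueError("no models supplied")

    def distance(scores: dict[str, float]) -> float:
        # Euclidean distance over the axes the prompt actually requires.
        return math.sqrt(sum(
            (scores.get(axis, 0.0) - need) ** 2 for axis, need in required.items()
        ))

    return min(models, key=lambda name: distance(models[name]))


# Invented scores for two hypothetical models.
model_scores = {
    "small-model": {"reasoning": 0.4, "language_complexity": 0.5},
    "large-model": {"reasoning": 0.95, "language_complexity": 0.9},
}
hard_prompt = {"reasoning": 0.9, "language_complexity": 0.8}
easy_prompt = {"reasoning": 0.3, "language_complexity": 0.4}
```

Note that a pure nearest-match rule can pick a model that falls slightly short of a requirement; a variant that penalizes shortfall more than surplus may be preferable in practice.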

Custom Selection

It seems likely that many developers will want to customize selection rules. It seems appropriate to let developers write a custom selection function that receives as input the capability rankings of the available models, the classification of the prompt, and the developer's preferences with respect to cost, latency, and quality, and returns the desired model.
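One way such a hook might be typed is as a plain callable taking the three inputs and returning a model name. The `Selector` alias, `choose_model`, and the example rule `cheapest_adequate` are all hypothetical, and the dictionary-of-scores representation is an assumption made to keep the sketch short.

```python
from __future__ import annotations

from typing import Callable, Mapping

Scores = Mapping[str, float]
# (capabilities per model, prompt classification, developer preferences) -> model name
Selector = Callable[[Mapping[str, Scores], Scores, Scores], str]


def cheapest_adequate(capabilities: Mapping[str, Scores],
                      classification: Scores,
                      preferences: Scores) -> str:
    """Example custom rule: the cheapest model whose reasoning score covers the prompt.

    This particular rule ignores `preferences`; a custom function is free to use them.
    """
    need = classification.get("reasoning", 0.0)
    adequate = {n: s for n, s in capabilities.items() if s.get("reasoning", 0.0) >= need}
    pool = adequate or dict(capabilities)  # fall back to all models if none qualifies
    return min(pool, key=lambda n: pool[n].get("cost", 0.0))


def choose_model(capabilities: Mapping[str, Scores],
                 classification: Scores,
                 preferences: Scores,
                 selector: Selector) -> str:
    """Run the developer-supplied selector and check that it returned a known model."""
    choice = selector(capabilities, classification, preferences)
    if choice not in capabilities:
        raise ValueError(f"selector returned unknown model: {choice!r}")
    return choice


# Invented capability scores for two hypothetical models.
capabilities = {
    "small-model": {"reasoning": 0.4, "cost": 0.1},
    "large-model": {"reasoning": 0.95, "cost": 0.9},
}
```

For example, `choose_model(capabilities, {"reasoning": 0.8}, {}, cheapest_adequate)` returns `"large-model"`, while a prompt scored at `{"reasoning": 0.2}` returns `"small-model"`. Validating the selector's return value keeps a buggy custom function from routing to a model that does not exist.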

Why is this needed?

Different models have different capabilities and different costs in terms of compute, latency, and capital. Model selection lets us make smart choices about which model to use in a given situation.

microsoft / autogen