[AIConfig 2.0] Chat History

Problem

AIConfig currently unnecessarily couples conversation / multi-turn chat history with the config itself. It uses 'remember_chat_context' and extracts the conversation history from previous prompts/outputs in the config. This is a confusing UX and also counterproductive with respect to aiconfig's intention of storing prompts and model settings.

The following is a high-level proposal for an improved implementation.

Proposal

AIConfig SDK

Pull multi-turn chat history into a separate abstraction above the core aiconfig layer as follows:

Implement a ChatHistory class which maintains an array of messages in openai spec (excluding deprecated function messages) to provide a common interface for model parsers to serialize/deserialize messages to/form.

Expose get_messages and add_message(s) method(s) for getting and adding to the messages list.

Note - for now, management of chat sessions can be left to the end user, e.g. by maintaining a map of session id to ChatHistory instances.

Implement an AIConfigRuntimeWithChatHistory which extends AIConfigRuntime. It should take in a ChatHistory in the constructor and maintain the ChatHistory instance. It should expose get_chat_history and update_chat_history methods to get the messages and update the messages, respectively.

Implement a ChatModelParser class which extends ParameterizedModelParser and whose deserialize and run methods take in an AIConfigRuntimeWithChatHistory.

The class should have the following additional methods:

(abstract) get_prompt_messages: given an aiconfig and prompt, serialize new message(s) to add to chat_history. These will essentially be the user (and possibly system) message(s) associated with the prompt being run
get_output_messages: given prompt outputs, serialize to assistant or tool call message

The run implementation should be something like:

   async def run(
       self,
       prompt: Prompt,
       aiconfig: AIConfigRuntimeWithChatHistory,
       options: Optional[InferenceOptions] = None,
       parameters: Dict = {},
       run_with_dependencies: Optional[bool] = False,
   ) -> List[Output]:
       if run_with_dependencies:
           return await self.run_with_dependencies(
               prompt, aiconfig, options, parameters
           )
       else:
           # Add user and possibly system messages from the prompt/config
           aiconfig.update_chat_history(
               self.get_prompt_messages(
                   prompt, aiconfig, options, parameters
               )
           )

           completion_params = await self.deserialize(
               prompt, aiconfig, parameters, aiconfig.get_chat_history()
           )

           outputs = await self.run_inference(
               prompt, aiconfig, options, parameters
           )

           # Only add output messages to chat history on successful completion
           aiconfig.update_chat_history(self.get_output_messages(outputs))

           return outputs

The idea for now is that any ModelParser which extends ChatModelParser will implement their deserialize implementation to construct the completion params they need to pass to their underlying model API. Within deserialize, the ChatHistory messages can be serialized to the correct messages format for completion. Then, these completion_params are passed through to run_inference so that deserialization and chat history are already taken into account and the run_inference implementation mainly needs to concern itself with calling the API.

Update each of the chat-based model parsers to extend this new parser implementation and implement the deserialization function to construct the messages for their completion.

gemini.py
llama.py
openai_vision.py
openai.py -palm.py

Notably, when testing, a config with a single prompt should produce different outputs based on provided ChatHistory and the ChatHistory should not be serialized to the config.

Note on ChatCompletion wrapper – this can be removed since it inherently contradicts the intention of keeping chat history out of the serialized config.

AIConfig Local Editor Server (Includes VS Code)

Add chat_history to ServerState in server_utils.py and pass it through to the aiconfig.run call in the run function for /api/run route.

lastmile-ai / aiconfig