sourcegraph / cody

Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
https://cody.dev
Apache License 2.0

Suggestion: Multi-Agent System for Optimized Code Generation #2639

Closed monko9j1 closed 6 months ago

monko9j1 commented 9 months ago

I present an enhanced proposal for a comprehensive multi-agent system. This system leverages the OpenAI API to optimize task management in code generation, specifically addressing the max token limit challenge in extensive requests. It includes four main components. Note that the 'agents' here are not OpenAI's 'Assistants API' or GPTs, but custom-built, programmatic components designed for the specific logic and processes of this system.

Sub-Task Creation Agent: This agent dissects complex coding queries into smaller, manageable sub-tasks. It processes user requests, converting them into a structured JSON format. This not only details each sub-task but also outlines the necessary steps for execution, thereby setting a solid foundation for efficient and targeted code generation.

Code Generation Agent: Operating on the groundwork laid by the Sub-Task Creation Agent, this component is responsible for translating each sub-task into corresponding code segments. This ensures that the technical solutions provided are in perfect alignment with their respective sub-tasks, resulting in relevant, functional, and operational code.

Code Completion Agent: This agent is crucial in scenarios where the output of the Code Generation Agent is incomplete due to the max token limit. Its primary role is to seamlessly continue the code generation process, completing any partially produced code. This guarantees that the end product delivered to the user is comprehensive and fully functional.

Overseer Agent with OpenAI API "Function Calling" Capability: The 'Overseer Agent' is the cornerstone of this system. It is enhanced with the capability to utilize OpenAI's API "function calling" feature. This agent is tasked with orchestrating the entire workflow of the system. It dynamically assesses user requests and, utilizing the "function calling" feature, determines the optimal sequence and involvement of the other agents. This ensures efficient information flow and coordination between agents, particularly vital when the Code Completion Agent needs to be engaged. The incorporation of the "function calling" feature allows the Overseer Agent to execute complex decision-making processes and manage tasks more effectively, thereby significantly elevating the system's overall performance and reliability in delivering complete coding solutions.
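As a sketch of how the Overseer Agent might advertise the other three agents through function calling, the declarations below follow the chat completions `tools` format; the agent names and parameter schemas are my own illustration, not part of the proposal:

```python
# Hypothetical tool declarations the Overseer Agent could pass to the
# OpenAI chat completions API via its "tools" parameter. The model then
# picks which agent to invoke by emitting a function call.
OVERSEER_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "create_sub_tasks",
            "description": "Break a complex coding request into sub-tasks.",
            "parameters": {
                "type": "object",
                "properties": {"request": {"type": "string"}},
                "required": ["request"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "generate_code",
            "description": "Generate code for one sub-task from the plan.",
            "parameters": {
                "type": "object",
                "properties": {
                    "plan_json": {"type": "string"},
                    "sub_task": {"type": "integer"},
                },
                "required": ["plan_json", "sub_task"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "complete_code",
            "description": "Finish code cut off at the max token limit.",
            "parameters": {
                "type": "object",
                "properties": {
                    "context": {"type": "string"},
                    "cutoff": {"type": "string"},
                },
                "required": ["context", "cutoff"],
            },
        },
    },
]
```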

Sub-Task Creation Agent: I created an example system prompt that comes to a token count of 231, measured with Tiktokenizer for gpt-4-1106-preview.

System prompt example: "Break down coding questions into smaller sub-tasks for code generation, presenting the steps for each sub-task in a JSON format. Structure the response as follows: Begin with an overview of the user's ultimate goal under the 'context' key in the JSON object. Then, enumerate each sub-task using keys like 'Sub-Task 1', 'Sub-Task 2', etc., with each sub-task being a separate JSON object within the main JSON structure. For each sub-task, use the key 'Task' to describe the task. Follow this with a 'Steps' key, containing numbered steps as nested JSON objects, e.g., '1', '2', etc., detailing the steps for the sub-task. Exclude sub-tasks like 'testing' or 'Creating the project' as they do not involve code and will be omitted. Ensure the response is strictly in JSON format, without unnecessary commentary or text outside the JSON structure. Aim for a consistent structure where each 'Sub-Task' object contains 'Task' and 'Steps' keys, with 'Steps' being further broken down into numbered sub-keys."

User prompt example: Using the above system prompt in the OpenAI playground (gpt-4-1106-preview), I gave it the following user prompt: "I want to use native APIs in my .NET MAUI Blazor application. I would like to access native APIs in order to use the platform-specific APIs to record audio from the device." It gave me the following JSON response. I've minified it here; I'd also recommend minifying it in production to save on token usage:

{"context":"Accessing and using native platform-specific APIs to record audio in a .NET MAUI Blazor application.","Sub-Task 1":{"Task":"Setting up dependency services","Steps":{"1":"Define an interface in the shared code to describe the functionality of the audio recorder.","2":"Implement this interface in each platform-specific project (iOS, Android, etc.).","3":"Use the DependencyService in Xamarin.Forms to register the platform implementation with the interface."}},"Sub-Task 2":{"Task":"Implement the interface for Android","Steps":{"1":"Create a new class in the Android project that inherits the interface defined in the shared code.","2":"Implement the interface members using the Android.Media APIs to start and stop audio recording.","3":"Register the implementation with the DependencyService using the [assembly: Dependency(typeof(AudioRecorderImplementation))] attribute above the namespace."}},"Sub-Task 3":{"Task":"Implement the interface for iOS","Steps":{"1":"Create a new class in the iOS project that inherits the interface defined in the shared code.","2":"Implement the interface members using the AVFoundation framework to manage audio recording.","3":"Register the implementation with the DependencyService using the [assembly: Dependency(typeof(AudioRecorderImplementation))] attribute above the namespace."}},"Sub-Task 4":{"Task":"Using the recorder in the shared code","Steps":{"1":"Access the platform-specific audio recorder implementation from the shared code by calling DependencyService.Get().","2":"Use the methods defined in the IAudioRecorderInterface to start, stop, and manage audio recording."}},"Sub-Task 5":{"Task":"Handling permissions","Steps":{"1":"Request recording and microphone permissions on the platform-specific projects as required.","2":"Handle the permissions response in the shared code before starting the audio recording."}}}

Code Generation Agent: Since the output created by the Sub-Task Creation Agent is in a JSON structure, it is simple to work with programmatically. I see two ways to go about this.

  1. Simply send the full output and ask the model to generate code for a single sub-task only.
  2. Split the output into "context" followed by each "Sub-Task #" individually.

For testing purposes, I decided to go with option 1. For the Code Generation Agent, I crafted this system prompt: "Your role as the Code Generation Agent is to generate code for a specific sub-task specified by a user, and present the output in a structured JSON format. For each request, you will focus on one sub-task at a time. After carefully reviewing the details of the chosen sub-task, including its context and steps, generate the corresponding code. Present your output in the following JSON structure:

  1. A top-level JSON object encapsulating the entire response.
  2. A key 'Sub-Task' indicating the specific sub-task number for which the code is generated.
  3. A key 'Task Description' providing a brief overview of the sub-task.
  4. A key 'Code' containing the generated code. If the code consists of multiple lines or segments, present it as an array of strings, each representing a line or segment of code.
  5. An optional key 'Additional Information' for any notes or comments about the generated code that might be helpful.

Ensure that the code is complete, functional, relevant to the sub-task, and adheres to best coding practices. Structure your response to clearly indicate which sub-task the code corresponds to. Avoid generating code for unrequested sub-tasks or adding extraneous commentary outside of the JSON structure. Your focus is solely on providing operational and specific code for the selected sub-task, formatted as a JSON object for easy parsing and integration in a programmatic environment."

Given the system prompt shown above, I took the minified JSON produced by the Sub-Task Creation Agent (shown earlier) and sent it verbatim as the user prompt, appending: "Provide code for Sub-Task 3."
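Programmatically, building that user prompt under either option might look like the following sketch; `build_codegen_prompt` is a hypothetical helper:

```python
import json

def build_codegen_prompt(plan_json: str, sub_task: int, full_plan: bool = True) -> str:
    """Build the Code Generation Agent's user prompt.

    Option 1 (full_plan=True): send the whole minified plan plus the
    instruction. Option 2: send only the context and the one sub-task.
    """
    if full_plan:
        return f"{plan_json} Provide code for Sub-Task {sub_task}."
    plan = json.loads(plan_json)
    key = f"Sub-Task {sub_task}"
    subset = {"context": plan["context"], key: plan[key]}
    # minify the subset to save on token usage
    return json.dumps(subset, separators=(",", ":")) + f" Provide code for Sub-Task {sub_task}."
```

Option 2 trades a little parsing work for a shorter prompt, which matters once plans grow to many sub-tasks.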

I asked it to provide code for a specific sub-task; the following was the result, again minified so it stays viewable.

Result: {"Sub-Task":3,"Task Description":"Implement the interface for iOS using the AVFoundation framework to manage audio recording.","Code":["using AVFoundation;","using Foundation;","using YourNamespace.iOS;","using Xamarin.Forms;","","[assembly: Dependency(typeof(AudioRecorderImplementation))]","namespace YourNamespace.iOS","{"," public class AudioRecorderImplementation : IAudioRecorder"," {"," private AVAudioRecorder recorder;"," private NSDictionary settings;"," private NSError error;"," private string audioFilePath;",""," public AudioRecorderImplementation()"," {"," var audioSession = AVAudioSession.SharedInstance();"," audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord, out error);"," audioSession.SetActive(true, out error);"," "," // Set recording settings"," settings = new NSDictionary("," AVAudioSettings.AVFormatIDKey, NSNumber.FromInt32((int)AudioToolbox.AudioFormatType.LinearPCM),"," AVAudioSettings.AVSampleRateKey, NSNumber.FromFloat(44100f),"," AVAudioSettings.AVNumberOfChannelsKey, NSNumber.FromInt32(1),"," AVAudioSettings.AVLinearPCMBitDepthKey, NSNumber.FromInt32(16),"," AVAudioSettings.AVLinearPCMIsBigEndianKey, false,"," AVAudioSettings.AVLinearPCMIsFloatKey, false"," );"," "," // Prepare the recorder"," audioFilePath = Path.Combine(Path.GetTempPath(), \"audio_record.m4a\");"," var audioUrl = NSUrl.FromFilename(audioFilePath);"," recorder = AVAudioRecorder.Create(audioUrl, new AudioSettings(settings), out error);"," recorder.PrepareToRecord();"," }"," "," public void StartRecording()"," {"," if (!recorder.Recording)"," {"," recorder.Record();"," }"," }"," "," public void StopRecording()"," {"," if (recorder.Recording)"," {"," recorder.Stop();"," }"," }"," }","}"],"Additional Information":"This code provides a basic implementation for an audio recorder on iOS using AVFoundation. The namespace 'YourNamespace.iOS' should be replaced with your actual namespace. 
Additionally, you must ensure that the interface 'IAudioRecorder' is correctly defined in your shared project and that it includes both 'StartRecording' and 'StopRecording' methods. The 'audioFilePath' should point to a valid path for storing the audio file."}
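Because 'Code' arrives as an array of strings, reassembling it into a compilable source file is a single join; a minimal sketch (the helper name is hypothetical):

```python
def code_to_source(result: dict) -> str:
    """Rebuild a source file from the Code Generation Agent's JSON
    result, whose 'Code' key holds one string per line or segment."""
    code = result["Code"]
    if isinstance(code, str):   # tolerate single-string responses
        return code
    return "\n".join(code)      # array-of-lines responses

snippet = code_to_source(
    {"Code": ["using AVFoundation;", "", "namespace YourNamespace.iOS", "{", "}"]}
)
```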

Code Completion Agent: For this agent, I crafted a system prompt that returned promising results; it comes to 409 tokens via Tiktokenizer: "As the Code Completion Agent, your role is to complete code segments that were cut off due to reaching the max token limit from the Code Generation Agent. You will receive input in a JSON format with two key components: 'context' and 'Cutoff Code'. Your task is to analyze the 'Cutoff Code', understand its structure, intended functionality, and context, and then generate the remaining portion of the code to complete it. When generating the completion, pay close attention to the following:

  1. Avoid Redundancy: Ensure that your completion does not reintroduce code or concepts already present in the 'Cutoff Code'. Review the provided code snippet thoroughly to avoid duplicating functionality or redeclaring variables and settings.
  2. Maintain Error Handling Consistency: If the 'Cutoff Code' uses specific error handling mechanisms or objects (e.g., a particular NSError object), continue using the same approach in your completion. Consistency in error handling is key for a coherent codebase. The output of your completion must follow the same JSON structure as provided by the Code Generation Agent. It should be presented as a JSON object with a single key 'Completed Code', containing an array of strings where each string is a continuation of the code lines starting from the cutoff point. The completion should seamlessly integrate with the provided cutoff code, maintaining consistency in coding style and functionality. Here’s the JSON structure for your output: { "Completed Code": [""] } In this structure, the 'Completed Code' key will contain an array of strings, each representing a line or segment of your completed code. It should start from the exact point where the cutoff occurred in the 'Cutoff Code'. Ensure that the completion is logical, syntactically correct, and functionally integrates with the cutoff code to form a complete and operational code segment. Your focus is to provide a seamless continuation that aligns with the existing code's style and functionality, while avoiding redundancy and maintaining error handling consistency.”

With the given system prompt, I ran a test. I took the output generated by the Code Generation Agent and simulated a cut-off event in the code. The following is the user prompt I created: { "context": "Implement the interface for iOS using the AVFoundation framework to manage audio recording", "cutoff": ""using AVFoundation;","using Foundation;","using YourNamespace.iOS;","using Xamarin.Forms;","","[assembly: Dependency(typeof(AudioRecorderImplementation))]","namespace YourNamespace.iOS","{"," public class AudioRecorderImplementation : IAudioRecorder"," {"," private AVAudioRecorder recorder;"," private NSDictionary settings;"," private NSError error;"," private string audioFilePath;",""," public AudioRecorderImplementation()"," {"," var audioSession = AVAudioSession.SharedInstance();"," audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord, out error);"," audioSession.SetActive(true, out error);"," "," // Set recording settings"," settings = new NSDictionary("," AVAudioSettings.AVFormatIDKey, NSNumber.FromInt32((int)AudioToolbox.AudioFormatType.LinearPCM),"," AVAudioSettings.AVSampleRateKey, NSNumber.FromFloat(44100f),"," AVAudioSettings.AVNumberOfChannelsKey, NSNumber.FromInt32(1),"," AVAudioSettings.AVLinearPCMBitDepthKey, NSNumber.FromInt32(16),"," AVAudioSettings.AVLinearPCMIsBigEndianKey, false,"," AVAudioSettings.AVLinearPCMIsFloatKey, false"," );"," "," // Prepare the recorder"," audioFilePath = Path.Combine(Path.GetTempPath(), \"audio_record.m4a\");"," var audioUrl = NSUrl.FromFilename(audioFilePath);"," recorder = AVAudioRecorder.Create(audioUrl, new AudioSettings(settings), out error);"," recorder.PrepareToRecord();"," }"," "," public void StartRecording()"," {"," if" }

In this example I sent it a context, which was the "Task Description" from the Code Generation Agent, together with a cutoff; the cut-off was made at the start of an "if" statement. With the given system prompt and user prompt I got the following output: {"Completed Code":["(!recorder.Recording)","{"," recorder.Record();","}","}","","public void StopRecording()","{"," if (recorder.Recording)"," {"," recorder.Stop();"," }","}","","public string GetRecordedFilePath()","{"," return audioFilePath;","}","","public bool IsRecording()","{"," return recorder.Recording;","}","","public void Dispose()","{"," if (recorder != null)"," {"," recorder.Dispose();"," recorder = null;"," }","}","}"]}

Result: The output from the Code Completion Agent appears to meet the requirements effectively. It provides a coherent and functionally relevant continuation of the cutoff code, adhering to the specified JSON structure and maintaining consistency with the existing code.
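Because the completion begins at the exact cutoff point, merging it back is mechanical: the first completed fragment continues the interrupted line, and the rest append as new lines. A sketch (all names hypothetical):

```python
def splice_completion(cutoff_lines, completed_lines):
    """Merge the Code Completion Agent's 'Completed Code' array onto
    the cut-off code from the Code Generation Agent."""
    if not cutoff_lines:
        return list(completed_lines)
    if not completed_lines:
        return list(cutoff_lines)
    merged = list(cutoff_lines)
    merged[-1] += completed_lines[0]    # finish the interrupted line
    merged.extend(completed_lines[1:])  # then append the remaining lines
    return merged

# The cut-off above ended mid-line at "if"; the completion began "(!recorder.Recording)".
merged = splice_completion(
    ["    public void StartRecording()", "    {", "        if"],
    ["(!recorder.Recording)", "        {", "            recorder.Record();", "        }", "    }"],
)
```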

Overseer Agent with OpenAI API "Function Calling" Capability: The Role of the Overseer Agent

  1. First Point of Contact: The Overseer Agent acts as the initial interface for user requests via chat. It's responsible for assessing and routing these requests.

  2. Request Assessment:

    • Non-Code Queries: If a user's question doesn't require code generation, the Overseer Agent directly answers without involving other agents.
    • Simple Code Requests: For straightforward code needs (like fixing a single function), the Overseer Agent engages the Code Generation Agent directly (this would require prompt adjustments for that agent), or may solve the request itself.
    • Complex Requests Needing Sub-Tasks: For multifaceted requests, like the example given: “using native APIs in a .NET MAUI Blazor application for audio recording”, the Overseer Agent identifies the need for sub-task creation.
  3. Workflow Management:

    • Uses OpenAI API function calls to manage the workflow between the Sub-Task Creation Agent, Code Generation Agent, and Code Completion Agent.

Handling Complex Requests:
  4. Engaging Sub-Task Creation Agent:

    • For complex requests, the Overseer Agent calls the Sub-Task Creation Agent first.
    • Example JSON response from the Sub-Task Creation Agent:
      {
      "context": "...",
      "Sub-Task 1": {"Task": "...", "Steps": {"1": "...", "2": "...", "3": "..."}},
      ...
      }
  5. Processing Sub-Task Responses:

    • The Overseer Agent parses the JSON response to understand the number and nature of sub-tasks.
    • It sequentially executes each sub-task, making function calls to the Code Generation Agent.
  6. Checking Sub-Task Completion:

    • Analyzes the Code Generation Agent's responses to determine if a sub-task is complete.
    • Looks for the opening (```json) and closing (```) fence markers in the response, since OpenAI often wraps code output in triple-backtick fences. If the closing marker is missing, the sub-task output is incomplete.
  7. Handling Incomplete Code:

    • If a sub-task is incomplete by the Code Generation Agent, (e.g., cut-off occurs between "Code": [ and "Additional Information":), the Overseer Agent parses the current output.
    • It then sends this parsed code to the Code Completion Agent for finishing.
  8. Finalizing and Sending Response to User:

    • Continues this process until all sub-tasks are completed.
    • Can stream the completed responses (code and explanations) to the user in an organized way.
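Steps 5 through 8 can be sketched as a small orchestration loop. Here `generate` and `complete` stand in for the API calls to the Code Generation and Code Completion Agents, and the fence check mirrors step 6; all names are illustrative:

```python
def looks_complete(response_text: str) -> bool:
    """Step 6: a response counts as complete only when the opening
    ```json fence has a matching closing ``` fence."""
    text = response_text.strip()
    return (text.startswith("```json")
            and text.endswith("```")
            and text.count("```") >= 2)

def run_plan(plan: dict, generate, complete):
    """Steps 5-8: walk the sub-tasks in order, generate code for each,
    and route any cut-off output through the completion agent."""
    results = []
    for name in (key for key in plan if key.startswith("Sub-Task")):
        response = generate(plan, name)
        if not looks_complete(response):
            response = complete(plan["context"], response)
        results.append((name, response))
    return results
```

In production, `generate` and `complete` would wrap chat completion calls carrying the system prompts shown earlier; retries and JSON validation would also belong here.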

Key Operational Aspects of the Overseer Agent:

Enhancing the System with Fine-tuning: Fine-tuning an OpenAI API model is a logical step for this system. I suggest implementing a closed beta phase, during which you can develop fine-tuning datasets. In this phase, users would have the option to contribute their training data. It's crucial to systematically record both the corresponding user prompts and model responses. The advantage of fine-tuning lies in its ability to teach the model specific patterns. This approach can lead to more efficient use of system prompts in each interaction, thereby reducing both token usage and overall costs. If this is not viable for any reason, synthetic datasets can be created.

Conclusion: The proposed Multi-Agent System for Optimized Code Generation has the potential to significantly enhance user experience in coding tasks. When fine-tuned and implemented correctly, it promises to surpass the capabilities of existing coding assistants by delivering more precise responses. There's room for further improvements, such as introducing additional agents, combining agents or improving system prompts and JSON outputs.

A key advantage of this system is the efficient use of the gpt-4-1106-preview model's maximum token output, achieved by breaking down user requests into sub-tasks. This ensures that each sub-task can harness the full potential of the model's token output, providing thorough and detailed responses that align more closely with the user's requirements. In contrast to current systems that rely on a "single agent" approach, often leading to incomplete or boilerplate code toward the token limit, the multi-agent approach systematically addresses complex queries with precision. Each agent plays a crucial role in fulfilling specific aspects of the user's request, ensuring a more comprehensive and satisfactory solution overall. This system is designed to eliminate the shortcomings of current models, focusing on maximizing output quality for each aspect of a coding query. Additionally, a multi-model approach is worth exploring, such as using gpt-3.5-turbo or other models for creating sub-tasks while reserving more powerful GPT models for code generation. This not only promises cost efficiency but also leverages the strengths of different models in a cohesive system.
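The multi-model split could be as simple as a per-agent routing table; the model assignments here are examples only, not a recommendation:

```python
# Cheaper model for planning; stronger model where code quality matters.
AGENT_MODELS = {
    "sub_task_creation": "gpt-3.5-turbo",
    "code_generation": "gpt-4-1106-preview",
    "code_completion": "gpt-4-1106-preview",
    "overseer": "gpt-4-1106-preview",
}

def model_for(agent: str, default: str = "gpt-4-1106-preview") -> str:
    """Pick the model for a given agent, falling back to the default."""
    return AGENT_MODELS.get(agent, default)
```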

Moreover, inspired by Stanford University's recent research, "FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance" (https://arxiv.org/pdf/2305.05176.pdf), similar methodologies could be integrated. These strategies could contribute significantly to a more advanced and economical coding assistant. Initiating a dialogue around these proposed methods and their integration could greatly benefit the Cody team, paving the way for a superior coding assistant tool that effectively balances performance and cost.

Thanks,

Dobo_J

github-actions[bot] commented 6 months ago

This issue is marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed automatically in 5 days.