pdhoolia / se-agent

Software Engineering Agent
GNU General Public License v3.0

domain / project memories #10

Open pdhoolia opened 2 days ago

pdhoolia commented 2 days ago

Lack of domain understanding may sometimes lead to suggestions that are not feasible or practical. For example, suggesting a change in an emulator (which is supposed to exactly replicate quantum hardware) may not be the right way to fix a bug in qiskit. Other examples include suggesting something that is not possible in the current version of the programming language, or a change that is not in line with the project's coding standards.

When the agent makes such proposals, other developers may point out the issues via comments. The agent should be able to understand these comments in context and learn from them: it should create a succinct, condensed memory of this understanding and store it for future reference.

Here's a high-level plan to implement this feature:

se-agent commented 2 days ago

To implement the feature of creating and storing domain/project memories based on corrective comments, we need to make changes across several parts of the codebase. Here are the suggested changes:

Step 1: Detect Domain Understanding Corrections

File: se_agent/issue_analyzer.py

Add a function to determine if a comment is a domain understanding correction:

def is_domain_correction(comment_body: str) -> bool:
    # Logic to determine if the comment is a domain correction
    # This might involve checking for specific keywords or using a simple NLP model
    # For simplicity, let's assume we have a keyword list for now
    correction_keywords = ["incorrect", "not possible", "wrong suggestion", "doesn't work"]
    return any(keyword in comment_body.lower() for keyword in correction_keywords)
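The keyword heuristic above can be exercised directly. A minimal, self-contained check (the function is repeated here so the snippet runs standalone; a proper NLP classifier could replace it later):

```python
correction_keywords = ["incorrect", "not possible", "wrong suggestion", "doesn't work"]

def is_domain_correction(comment_body: str) -> bool:
    # Case-insensitive keyword match against the correction phrases
    return any(keyword in comment_body.lower() for keyword in correction_keywords)

print(is_domain_correction("This is Not Possible in the emulator"))  # True
print(is_domain_correction("Looks good to me"))                      # False
```

A keyword list will miss paraphrased corrections and flag false positives (e.g. "the old behavior was incorrect"), which is why the plan leaves room to swap in a model-based classifier.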

Step 2: Generate and Store Memories

File: se_agent/listener_core.py

In process_issue_event, update the logic to handle domain corrections:

# Assuming is_domain_correction and analyze_issue are importable from the analyzer module
from se_agent.issue_analyzer import analyze_issue, is_domain_correction

def process_issue_event(project: Project, issue_details, comment_details=None):
    if comment_details:
        comment_body = comment_details.get('body', '')

        # Check if the comment is a domain understanding correction
        if is_domain_correction(comment_body):
            # Generate a memory instead of producing a new suggestion
            conversation_context = analyze_issue(project, issue_details)
            generate_and_store_memory(project, issue_details, comment_details, conversation_context)
            return IGNORE_TOKEN

Add a function to generate and store memories:

import json
import os
import uuid
from datetime import datetime, timezone

def generate_and_store_memory(project, issue_details, comment_details, conversation_context):
    memory_texts = call_llm_for_memory(conversation_context)

    memory_file_path = os.path.join(project.metadata_folder, 'memories.json')
    memories = []
    if os.path.exists(memory_file_path):
        with open(memory_file_path, 'r') as file:
            memories = json.load(file)

    # Compute the timestamp once so created_at and updated_at match on creation
    now = datetime.now(timezone.utc).isoformat()
    for memory_text in memory_texts:
        memory = {
            "memory_id": str(uuid.uuid4()),
            "issue_id": issue_details['number'],
            "comment_id": comment_details['id'],
            "memory": memory_text,
            "user_id": comment_details['user']['login'],
            "created_at": now,
            "updated_at": now,
            "status": "new"
        }
        memories.append(memory)

    with open(memory_file_path, 'w') as file:
        json.dump(memories, file, indent=4)
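The storage step can be tried out in isolation. A standalone sketch (`store_memories` is a hypothetical wrapper that takes plain values instead of the project/issue/comment objects; the field names follow the record schema in the plan, and a temporary folder stands in for `project.metadata_folder`):

```python
import json
import os
import tempfile
import uuid
from datetime import datetime, timezone

def store_memories(metadata_folder, memory_texts, issue_number, comment_id, user_login):
    # Append new memory records to memories.json, creating the file if absent
    memory_file_path = os.path.join(metadata_folder, 'memories.json')
    memories = []
    if os.path.exists(memory_file_path):
        with open(memory_file_path) as f:
            memories = json.load(f)
    now = datetime.now(timezone.utc).isoformat()
    for text in memory_texts:
        memories.append({
            "memory_id": str(uuid.uuid4()),
            "issue_id": issue_number,
            "comment_id": comment_id,
            "memory": text,
            "user_id": user_login,
            "created_at": now,
            "updated_at": now,
            "status": "new",
        })
    with open(memory_file_path, 'w') as f:
        json.dump(memories, f, indent=4)
    return memories

folder = tempfile.mkdtemp()
store_memories(folder, ["Emulator must mirror hardware exactly"], 10, 42, "pdhoolia")
stored = store_memories(folder, ["Follow project coding standards"], 10, 43, "reviewer")
print(len(stored))  # 2
```

Load-append-rewrite is simple but not concurrency-safe; if multiple webhook events can land at once, the file write would need a lock or a move to a small database.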

Step 3: Call LLM for Memory Generation

File: se_agent/llm/api.py

Add a new LLM task for generating memories:

def call_llm_for_memory(conversation_context):
    task_name = TaskName.GENERATE_MEMORY
    messages = transform_to_langchain_base_chat_model_format(conversation_context)
    response = call_llm_for_task(task_name, messages)
    # Assuming the LLM response is a list of memory texts
    return response

Update the TaskName enum in model_configuration_manager.py:

class TaskName(Enum):
    # Existing tasks...
    GENERATE_MEMORY = "generate_memory"
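The plan does not specify the prompt for the GENERATE_MEMORY task. One plausible sketch, with entirely illustrative wording and the common role/content chat-message shape (`build_memory_prompt` is a hypothetical helper, not part of the codebase):

```python
def build_memory_prompt(conversation_context: str) -> list:
    # System message frames the summarization task; user message carries the thread
    system = (
        "You are summarizing corrective feedback on a software agent's suggestion. "
        "Produce one or more short, standalone statements of domain or project "
        "knowledge the agent should remember."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": conversation_context},
    ]

msgs = build_memory_prompt(
    "Reviewer: changing the emulator is not acceptable; it must mirror hardware."
)
print(msgs[0]["role"])  # system
```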

Step 4: Retrieve Memories

When processing new issues or comments, retrieve stored memories and incorporate them into the localization and suggestion prompts.

File: se_agent/project.py

Add a method to fetch memories:

def fetch_memories(self):
    memory_file_path = os.path.join(self.metadata_folder, 'memories.json')
    if not os.path.exists(memory_file_path):
        return []

    with open(memory_file_path, 'r') as file:
        memories = json.load(file)
    return memories

Use these memories when constructing prompts for LLM tasks related to issue localization and suggestion generation.
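One way to incorporate them, assuming a hypothetical helper that renders the stored records as a bulleted context block to prepend to the localization or suggestion prompt:

```python
def format_memories_for_prompt(memories: list) -> str:
    # Render each stored memory record as a bullet under a short header
    if not memories:
        return ""
    lines = ["Relevant project memories from past corrections:"]
    lines += [f"- {m['memory']}" for m in memories]
    return "\n".join(lines)

memories = [
    {"memory": "The emulator must exactly replicate quantum hardware; fixes belong elsewhere."},
    {"memory": "The target Python version does not support structural pattern matching."},
]
print(format_memories_for_prompt(memories))
```

If the memory store grows large, this could be narrowed to the top-k records by embedding similarity to the new issue rather than injecting everything.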

Summary

These changes enable the agent to detect domain corrections, generate succinct memories using an LLM, store them in a JSON file, and retrieve them for future reference, so that the agent learns from corrective feedback and improves its domain understanding over time.