pdhoolia opened 2 days ago
To implement the feature of creating and storing domain/project memories based on corrective comments, we need to make changes across several parts of the codebase. Here are the suggested changes:
File: se_agent/issue_analyzer.py
Add a function to determine if a comment is a domain understanding correction:
```python
def is_domain_correction(comment_body: str) -> bool:
    # Logic to determine if the comment is a domain correction.
    # This might involve checking for specific keywords or using a simple NLP model;
    # for simplicity, assume a keyword list for now.
    correction_keywords = ["incorrect", "not possible", "wrong suggestion", "doesn't work"]
    return any(keyword in comment_body.lower() for keyword in correction_keywords)
```
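As a quick illustration, the keyword heuristic flags explicit pushback but will miss corrections phrased without any of the listed keywords:

```python
# Contains "doesn't work" -> flagged as a domain correction
print(is_domain_correction("This doesn't work: the emulator must replicate hardware exactly."))  # True
# Same intent, but none of the keywords appear -> not flagged
print(is_domain_correction("The emulator is meant to mirror hardware, so please revisit this."))  # False
```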
File: se_agent/listener_core.py
In `process_issue_event`, update the logic to handle domain corrections:
```python
from se_agent.issue_analyzer import is_domain_correction

def process_issue_event(project: Project, issue_details, comment_details=None):
    if comment_details:
        comment_body = comment_details.get('body', '')
        # Check if the comment is a domain understanding correction
        if is_domain_correction(comment_body):
            # Generate and persist a memory instead of processing the comment as a new request
            conversation_context = analyze_issue(project, issue_details)
            generate_and_store_memory(project, issue_details, comment_details, conversation_context)
            return IGNORE_TOKEN
    # ... existing issue/comment handling continues here ...
```
Add a function to generate and store memories:
```python
import json
import os
import uuid
from datetime import datetime

def generate_and_store_memory(project, issue_details, comment_details, conversation_context):
    # Ask the LLM to distill the corrective comment into one or more succinct memories
    memory_texts = call_llm_for_memory(conversation_context)

    # Memories are persisted alongside the project's other metadata
    memory_file_path = os.path.join(project.metadata_folder, 'memories.json')
    memories = []
    if os.path.exists(memory_file_path):
        with open(memory_file_path, 'r') as file:
            memories = json.load(file)

    for memory_text in memory_texts:
        memory = {
            "memory_id": str(uuid.uuid4()),
            "issue_id": issue_details['number'],
            "comment_id": comment_details['id'],
            "memory": memory_text,
            "user_id": comment_details['user']['login'],
            "created_at": datetime.utcnow().isoformat(),
            "updated_at": datetime.utcnow().isoformat(),
            "status": "new"
        }
        memories.append(memory)

    with open(memory_file_path, 'w') as file:
        json.dump(memories, file, indent=4)
```
File: se_agent/llm/api.py
Add a new LLM task for generating memories:
```python
def call_llm_for_memory(conversation_context):
    task_name = TaskName.GENERATE_MEMORY
    messages = transform_to_langchain_base_chat_model_format(conversation_context)
    response = call_llm_for_task(task_name, messages)
    # The GENERATE_MEMORY prompt should instruct the model to return a list of
    # short memory texts; parse the response into that list before returning it.
    return response
```
Update the `TaskName` enum in model_configuration_manager.py:
```python
class TaskName(Enum):
    # Existing tasks...
    GENERATE_MEMORY = "generate_memory"
```
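The new task also needs prompt instructions. How prompts are registered per task in model_configuration_manager.py is not shown in this plan, so the following is only a sketch of what the GENERATE_MEMORY instructions could look like (the constant name and template variable are assumptions):

```python
# Sketch only: GENERATE_MEMORY_PROMPT and the {conversation} placeholder are
# hypothetical; wire them in through whatever prompt-registration mechanism
# model_configuration_manager.py already uses for other tasks.
GENERATE_MEMORY_PROMPT = """\
You are reviewing a conversation in which a developer corrected the agent's
understanding of the project (its domain constraints, conventions, or capabilities).

Extract the corrections from the conversation below as standalone statements:
- Each statement must be a single, self-contained fact about the project/domain.
- Do not mention the specific issue, code change, or people involved.
- Return a JSON array of strings, and nothing else.

Conversation:
{conversation}
"""
```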
When processing new issues or comments, retrieve stored memories and incorporate them into the localization and suggestion prompts.
File: se_agent/project.py
Add a method to fetch memories:
```python
def fetch_memories(self):
    memory_file_path = os.path.join(self.metadata_folder, 'memories.json')
    if not os.path.exists(memory_file_path):
        return []
    with open(memory_file_path, 'r') as file:
        memories = json.load(file)
    return memories
```
Use these memories when constructing prompts for LLM tasks related to issue localization and suggestion generation.
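A minimal sketch of how that could look; `format_memories_for_prompt` is a hypothetical helper, since the actual prompt-building code paths for localization and suggestions are not shown in this plan:

```python
def format_memories_for_prompt(project, statuses=("new", "accepted")) -> str:
    """Render stored memories as a prompt section; returns '' when there are none."""
    memories = project.fetch_memories()
    selected = [m["memory"] for m in memories if m.get("status") in statuses]
    if not selected:
        return ""
    bullets = "\n".join(f"- {text}" for text in selected)
    return f"Known project constraints and conventions (from past corrections):\n{bullets}"
```

Whether unreviewed ("new") memories should be trusted alongside "accepted" ones is a curation policy decision; the status filter above just makes that choice explicit.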
These changes enable the agent to detect domain corrections, generate succinct memories using an LLM, store them in a JSON file, and retrieve them for future reference. This approach ensures that the agent learns from corrective feedback and improves its domain understanding over time.
Lack of domain understanding may sometimes lead to suggestions that are not feasible or practical. For example, suggesting a change in an emulator (which is supposed to exactly replicate quantum hardware) may not be the right way to fix a bug in Qiskit. Other examples include suggesting something that is not possible in the current version of the programming language, or a change that is not in line with the project's coding standards.
When the agent makes such proposals, other developers may point out the problems with the suggestion via comments. The agent should be able to understand these comments in their context and learn from them: it should create a succinct, condensed memory of this understanding and store it for future reference.
Here's a high-level plan to implement this feature:
1. Before starting to process an issue comment, the agent may analyze the comment to check whether it represents a domain understanding correction. This may be done by checking whether the comment can be semantically categorized as describing a domain characteristic. Note: the agent's codebase probably uses the term `project` (for domain).
2. If the comment is identified as a domain understanding correction, the agent should generate a condensed memory of this understanding, with the following fields (see the example record after this list):
   - `memory_id`: a generated id for the memory
   - `issue_id`: the issue in the context of which this memory was generated
   - `comment_id`: the comment based on which this memory was generated
   - `memory`: the memory text
   - `user_id`: the user who made the comment
   - `created_at`: the timestamp when this memory was created
   - `updated_at`: the timestamp when this memory was last updated
   - `status`: at creation time, the status of the memory should be `new`. This status may be updated later based on curation by an expert, who may mark it as `accepted`, `rejected`, `needs_review`, etc.
3. The agent should retrieve these memories when processing new issues or comments. In particular, it should use them in the prompts for localization and suggestions.
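For illustration, a single entry in memories.json might look like this after expert curation (the ids, timestamps, and memory text below are made up):

```json
{
    "memory_id": "7e7c2a2e-0f1a-4a5b-9d52-3a6f0c8a1b2d",
    "issue_id": 1234,
    "comment_id": 987654321,
    "memory": "The emulator must replicate quantum hardware exactly; fixes should not change emulator behavior to work around hardware limitations.",
    "user_id": "octocat",
    "created_at": "2024-01-15T10:30:00",
    "updated_at": "2024-01-20T08:05:00",
    "status": "accepted"
}
```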