pdhoolia / se-agent

Software Engineering Agent
GNU General Public License v3.0

Use issue conversation during semantic vector search #36

Open pdhoolia opened 1 day ago

pdhoolia commented 1 day ago

Currently, the semantic-vector-search-based localization strategy uses only the issue title and description when formulating the search query. It ignores any subsequent comments on the issue. This may hurt localization quality, since comments can contain crucial user input or corrections that are useful for localization.

Let's discuss and implement improvements for this.

se-agent commented 1 day ago

To improve the semantic vector search-based localization strategy by incorporating the issue conversation, we can make changes to the SemanticVectorSearchLocalizer class in the se_agent/localize/semantic_vector_search.py file. Here's a step-by-step plan for the improvements:

  1. Incorporate Conversation into Query:
    • Modify the localize method to include comments and conversation in addition to the issue's title and description when constructing the search query.

  2. Update the Query Construction:
    • Concatenate the issue's title, description, and relevant conversation messages to form a comprehensive query string for the semantic vector search.

  3. Limit the Conversation Length:
    • To ensure the query isn't excessively long, you might want to limit the number of comments considered or truncate the text to a reasonable length.

  4. Testing and Validation:
    • After making the changes, test the modified localization strategy to validate the improvements in localization accuracy.

Here's how you could implement these changes in se_agent/localize/semantic_vector_search.py:

def localize(self, issue: Dict[str, str], top_n: int) -> List[str]:
    """Localizes an issue to the most relevant code files.

    Uses the issue's title, description, and conversation to perform a similarity search in the vector store.

    Args:
        issue (Dict[str, str]): A dictionary containing the issue's title, description, and conversation.
        top_n (int): The maximum number of relevant files to return.

    Returns:
        List[str]: A list of file paths corresponding to the most relevant code files.
    """
    # Construct the query by combining the issue title, description, and conversation.
    # Messages without content are skipped so that empty comments do not add noise.
    conversation_text = " ".join(
        message['content'] for message in issue.get('conversation', []) if message.get('content')
    )
    query = f"{issue['title']}: {issue['description']} {conversation_text}".strip()

    # Perform a similarity search in the vector store
    results = self.vector_store.similarity_search(query, k=top_n)

    # Extract and return the file paths from the search results
    return [result.metadata['filepath'] for result in results]
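
The method above concatenates every comment, so it does not yet cover step 3 of the plan (limiting the conversation length). A minimal sketch of that step follows. The helper name build_query and the parameters max_comments and max_chars are hypothetical and not part of the existing code. It also assumes the conversation is a chronologically ordered list of dictionaries with a 'content' key, as in the snippet above.

```python
from typing import Any, Dict


def build_query(issue: Dict[str, Any], max_comments: int = 5, max_chars: int = 2000) -> str:
    """Builds a bounded query string from an issue (hypothetical helper)."""
    messages = issue.get("conversation", [])
    # Keep only the most recent comments; assumes chronological order.
    # The explicit check avoids messages[-0:], which would return every message.
    recent = messages[-max_comments:] if max_comments > 0 else []
    conversation_text = " ".join(
        m["content"].strip() for m in recent if m.get("content")
    )
    query = f"{issue['title']}: {issue['description']} {conversation_text}".strip()
    # Cut from the end so the title and description are the last parts to be lost.
    return query[:max_chars]
```

Whether the most recent comments are the most useful ones is a judgment call. Keeping the earliest comments, or summarizing the thread, may work better for some issues.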

Key Considerations:

Implementing these changes should help improve the localization quality by leveraging the complete context available in the issue conversation.
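
As a first step toward the testing and validation item in the plan, one could check that comment text actually reaches the vector store query. The sketch below is only an illustration. RecordingVectorStore, FakeResult, and Localizer are hypothetical stand-ins, not the real SemanticVectorSearchLocalizer or its vector store. The check confirms only that the conversation is included in the query. It says nothing about localization accuracy, which would need real issues with known relevant files.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeResult:
    """Mimics a search result that exposes a metadata dictionary."""
    metadata: Dict[str, str]


@dataclass
class RecordingVectorStore:
    """Stand-in store that records queries instead of embedding them."""
    queries: List[str] = field(default_factory=list)

    def similarity_search(self, query: str, k: int) -> List[FakeResult]:
        self.queries.append(query)
        return [FakeResult({"filepath": f"file_{i}.py"}) for i in range(k)]


class Localizer:
    """Minimal stand-in mirroring the localize method proposed in this thread."""

    def __init__(self, vector_store: RecordingVectorStore) -> None:
        self.vector_store = vector_store

    def localize(self, issue: Dict[str, Any], top_n: int) -> List[str]:
        conversation_text = " ".join(
            m["content"] for m in issue.get("conversation", []) if m.get("content")
        )
        query = f"{issue['title']}: {issue['description']} {conversation_text}".strip()
        results = self.vector_store.similarity_search(query, k=top_n)
        return [result.metadata["filepath"] for result in results]


def check_conversation_reaches_query() -> None:
    """Asserts that a comment's text is part of the query sent to the store."""
    store = RecordingVectorStore()
    issue = {
        "title": "Crash on startup",
        "description": "The app exits immediately.",
        "conversation": [{"content": "The failure seems to come from parser.py"}],
    }
    paths = Localizer(store).localize(issue, top_n=2)
    assert len(paths) == 2
    assert "parser.py" in store.queries[0]
```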