Status: Open. pdhoolia opened this issue 1 day ago.
To improve the semantic vector search-based localization strategy by incorporating the issue conversation, we can make changes to the SemanticVectorSearchLocalizer class in the se_agent/localize/semantic_vector_search.py file. Here's a step-by-step plan for the improvements:
1. Incorporate Conversation into Query: Modify the localize method to include comments and conversation, in addition to the issue's title and description, when constructing the search query.
2. Update the Query Construction
3. Limit the Conversation Length
4. Testing and Validation
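The "Update the Query Construction" and "Limit the Conversation Length" items carry no further detail here. One possible reading is sketched below. It assumes that conversation entries are dicts with a 'content' key, which matches what the localize implementation in this issue expects, and it uses a fixed character budget as a stand-in for a token limit. The helper build_query and its max_chars parameter are hypothetical and not part of se_agent. Comments are taken newest-first, on the assumption that later comments are the most likely to carry corrections.

```python
from typing import Any, Dict, List


def build_query(issue: Dict[str, Any], max_chars: int = 2000) -> str:
    """Hypothetical helper: combine title, description, and recent comments.

    Assumes issue['conversation'] is a list of dicts with a 'content' key.
    The title and description are always kept in full. Comments are added
    newest-first until the character budget is spent, then put back in
    chronological order.
    """
    head = f"{issue.get('title', '')}: {issue.get('description') or ''}".strip()
    budget = max(max_chars - len(head), 0)

    kept: List[str] = []
    for message in reversed(issue.get("conversation", [])):
        text = (message.get("content") or "").strip()
        if not text:
            continue  # skip empty comments
        cost = len(text) + 1  # +1 for the joining space
        if cost > budget:
            break  # stop at the first comment that no longer fits
        kept.append(text)
        budget -= cost

    kept.reverse()  # restore chronological order
    return " ".join([head, *kept]) if kept else head


issue = {
    "title": "Crash on save",
    "description": "App crashes.",
    "conversation": [{"content": "First"}, {"content": "Second"}],
}
print(build_query(issue))                # Crash on save: App crashes. First Second
print(build_query(issue, max_chars=34))  # Crash on save: App crashes. Second
```

Stopping at the first comment that does not fit keeps a contiguous window of the most recent discussion. Skipping that comment and trying older, shorter ones would fill the budget more tightly, but the resulting query would have gaps in the thread.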
Here's how you could implement these changes in se_agent/localize/semantic_vector_search.py:
from typing import Any, Dict, List

class SemanticVectorSearchLocalizer:
    # ... existing attributes and methods (including self.vector_store) omitted ...

    def localize(self, issue: Dict[str, Any], top_n: int) -> List[str]:
        """Localizes an issue to the most relevant code files.

        Uses the issue's title, description, and conversation to perform a similarity search in the vector store.

        Args:
            issue (Dict[str, Any]): A dictionary containing the issue's title, description, and
                conversation (a list of messages, each with a 'content' key).
            top_n (int): The maximum number of relevant files to return.

        Returns:
            List[str]: A list of file paths corresponding to the most relevant code files.
        """
        # Construct the query by combining the issue title, description, and conversation
        conversation_text = " ".join(message['content'] for message in issue.get('conversation', []))
        query = f"{issue['title']}: {issue.get('description') or ''} {conversation_text}"

        # Perform a similarity search in the vector store
        results = self.vector_store.similarity_search(query, k=top_n)

        # Extract and return the file paths from the search results
        return [result.metadata['filepath'] for result in results]
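For the testing and validation step, one option is to check the query construction without a real vector store. The sketch below uses a stub that exposes the same similarity_search(query, k) call the localize method relies on and returns objects with a metadata dict. Everything in it (FakeDoc, FakeVectorStore, and the standalone Localizer wrapper) is hypothetical test scaffolding. It reproduces the localize logic rather than importing se_agent, because the issue does not show how SemanticVectorSearchLocalizer is constructed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeDoc:
    """Stand-in for a search result: localize only reads .metadata."""
    metadata: Dict[str, Any]


@dataclass
class FakeVectorStore:
    """Records the last query it received and returns canned results."""
    docs: List[FakeDoc]
    last_query: str = ""

    def similarity_search(self, query: str, k: int = 4) -> List[FakeDoc]:
        self.last_query = query
        return self.docs[:k]


class Localizer:
    """Hypothetical wrapper mirroring the localize() logic from this issue."""

    def __init__(self, vector_store: FakeVectorStore) -> None:
        self.vector_store = vector_store

    def localize(self, issue: Dict[str, Any], top_n: int) -> List[str]:
        conversation_text = " ".join(m["content"] for m in issue.get("conversation", []))
        query = f"{issue['title']}: {issue.get('description') or ''} {conversation_text}"
        results = self.vector_store.similarity_search(query, k=top_n)
        return [r.metadata["filepath"] for r in results]


store = FakeVectorStore([FakeDoc({"filepath": "a.py"}), FakeDoc({"filepath": "b.py"})])
localizer = Localizer(store)
issue = {
    "title": "Crash on save",
    "description": "The app crashes.",
    "conversation": [{"content": "Stack trace points at writer.py"}],
}
paths = localizer.localize(issue, top_n=1)

# The comment text should now be part of the search query.
assert "writer.py" in store.last_query
print(paths)  # ['a.py']
```

Recording last_query lets a test assert on what was sent to the store. That check confirms that comments reach the query, independent of how good the embeddings are.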
Implementing these changes should help improve the localization quality by leveraging the complete context available in the issue conversation.
At the moment, the semantic vector search-based localization strategy builds its query from only the issue title and description. It ignores any subsequent comments on the issue. This may hurt localization quality, because those comments can contain crucial user input or corrections that are useful for localization.
Let's discuss and implement improvements for this.