Open pdhoolia opened 3 days ago
To address the challenges outlined for evaluating the se-agent using the SWE-bench dataset, the following changes and additions to the existing codebase should be implemented:
Project class to support offline evaluation

Changes in se_agent/project.py:

Remove Github as a member: remove the self.github attribute from the Project class.

Add a method to get an authenticated Github object:
from github import Auth, Github

def get_authenticated_github(self) -> Github:
    """Returns an authenticated GitHub object."""
    auth = Auth.Token(self.github_token)
    if self.info.api_url:
        # A custom API endpoint is configured for this project.
        return Github(base_url=self.info.api_url, auth=auth)
    return Github(auth=auth)

Update functions to use the new Github method: in clone_repository, pull_latest_changes, post_issue_comment, and fetch_issue_comments, replace self.github with self.get_authenticated_github() to obtain the Github object when needed.

Changes in se_agent/project.py (add a reset_to_commit method):

def reset_to_commit(self, commit_hash: str = "HEAD"):
    """Resets the repository to the specified commit hash."""
    try:
        repo = git.Repo(self.repo_folder)
        repo.git.reset('--hard', commit_hash)
        logger.info(f"Repository reset to commit: {commit_hash}")
    except Exception as e:
        logger.error(f"Error resetting to commit {commit_hash}: {e}")
        raise

Changes in se_agent/vector_store_utils.py:

Add function to create vector store from a specific commit:

import os
from typing import Optional

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore

# Vector stores already built, keyed by (uri, commit_hash).
vector_store_cache = {}

def create_vector_store(source_dir: str, uri: str, embeddings: Embeddings, commit_hash: Optional[str] = None) -> VectorStore:
    """Creates a vector store for the specified source directory and commit hash."""
    cache_key = (uri, commit_hash)
    # Reuse a store only when it is tied to a specific commit.
    if commit_hash is not None and cache_key in vector_store_cache:
        return vector_store_cache[cache_key]
    # Create and populate the vector store.
    # new_empty_vector_store is a placeholder name (an assumption) for the
    # backend-specific constructor used in this module. It must not be
    # create_vector_store itself, which would recurse with the wrong arguments.
    vector_store = new_empty_vector_store(uri, embeddings)
    for root, _, files in os.walk(source_dir):
        for file in files:
            if file.endswith('.py'):
                file_path = os.path.join(root, file)
                relative_file_path = os.path.relpath(file_path, source_dir)
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                vector_store.add_documents(
                    documents=[Document(page_content=content, metadata={"filepath": relative_file_path})],
                    ids=[relative_file_path]
                )
    # Cache the vector store for this commit
    if commit_hash is not None:
        vector_store_cache[cache_key] = vector_store
    return vector_store
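
The caching rule for create_vector_store can be illustrated without any vector store backend. What follows is a minimal sketch, assuming a hypothetical build_store callable that stands in for the real creation-and-embedding step; get_or_create_store and _store_cache are illustrative names, not part of se-agent. A store is reused only when a commit hash is given, since the working tree may change between calls when no commit is pinned.

```python
from typing import Callable, Dict, Optional, Tuple

# Illustrative cache (assumption): stores keyed by (uri, commit_hash).
_store_cache: Dict[Tuple[str, str], object] = {}


def get_or_create_store(
    uri: str,
    commit_hash: Optional[str],
    build_store: Callable[[str], object],
) -> object:
    """Return a cached store for (uri, commit_hash), building it on a miss.

    build_store is a hypothetical stand-in for the real creation step.
    """
    if commit_hash is None:
        # No pinned commit: the source tree may have changed, so always rebuild.
        return build_store(uri)
    key = (uri, commit_hash)
    if key not in _store_cache:
        _store_cache[key] = build_store(uri)
    return _store_cache[key]
```

Note that an in-memory dictionary like this only survives for the lifetime of the process; loading a previously persisted store from its uri would need backend-specific code.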

listener_core

Changes in se_agent/listener_core.py:

Add evaluate method:

import os

import git

from se_agent.change_suggester import suggest_changes
from se_agent.localize.semantic_vector_search import SemanticVectorSearchLocalizer
from se_agent.vector_store_utils import create_vector_store

# Project, ProjectInfo, fetch_llm_for_task, TaskName, and TOP_N are assumed to be
# imported or defined elsewhere in listener_core.py.

def evaluate(repo_full_name: str, instance_id: str, problem_statement: str, commit_hash: str):
    """Evaluates the agent for a given task instance."""
    projects_store = os.getenv('PROJECTS_STORE')
    github_token = os.getenv('GITHUB_TOKEN')
    # Create ProjectInfo and Project instance
    project_info = ProjectInfo(repo_full_name=repo_full_name)
    project = Project(github_token, projects_store, project_info)
    # Remember the current commit: after a hard reset, "HEAD" points at commit_hash,
    # so resetting to "HEAD" afterwards would not restore the original state.
    original_commit = git.Repo(project.repo_folder).head.commit.hexsha
    # Reset repo to the specified commit hash
    project.reset_to_commit(commit_hash)
    try:
        # Prepare evaluation directory
        evaluation_folder = os.path.join(project.metadata_folder, 'evaluation', instance_id)
        os.makedirs(evaluation_folder, exist_ok=True)
        # Create a vector store for the specific commit
        vector_store_uri = os.path.join(evaluation_folder, 'code_vector_store.db')
        embeddings = fetch_llm_for_task(TaskName.EMBEDDING)
        vector_store = create_vector_store(project.module_src_folder, vector_store_uri, embeddings, commit_hash)
        # Create a SemanticVectorSearchLocalizer with this vector store
        localizer = SemanticVectorSearchLocalizer(vector_store)
        # Localize and suggest changes
        analysis_results = {
            'title': problem_statement,
            'description': '',
            'conversation': []
        }
        filepaths = localizer.localize(issue=analysis_results, top_n=TOP_N)
        change_suggestions = suggest_changes(project, analysis_results, filepaths)
        # Dump change suggestions to a file
        change_suggestions_path = os.path.join(evaluation_folder, 'change_suggestions.md')
        with open(change_suggestions_path, 'w') as f:
            f.write(change_suggestions)
    finally:
        # Restore the repo to the commit it was on before evaluation
        project.reset_to_commit(original_commit)

These changes will allow the se-agent to be evaluated using the SWE-bench dataset in an offline manner, operating on specific commit snapshots, and storing both vector stores and change suggestions for each evaluation instance.
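
To show how evaluate might be driven across a dataset, here is a minimal sketch. It assumes each task instance is a dict carrying the SWE-bench fields repo, instance_id, problem_statement, and base_commit; run_evaluation is an illustrative name, and evaluate_fn is passed in so the loop can be exercised without the real agent.

```python
from typing import Callable, Dict, Iterable, List


def run_evaluation(
    instances: Iterable[Dict[str, str]],
    evaluate_fn: Callable[[str, str, str, str], None],
) -> List[str]:
    """Call evaluate_fn for each task instance; return the ids that failed."""
    failed: List[str] = []
    for instance in instances:
        try:
            evaluate_fn(
                instance["repo"],
                instance["instance_id"],
                instance["problem_statement"],
                instance["base_commit"],
            )
        except Exception as exc:
            # Keep going so that one bad instance does not stop the whole run.
            print(f"{instance['instance_id']}: evaluation failed: {exc}")
            failed.append(instance["instance_id"])
    return failed
```

Passing listener_core.evaluate as evaluate_fn would map base_commit to the commit_hash parameter.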
can you please document the evaluate method in listener_core.py using proper python doc comments
Certainly! Here's the evaluate method with detailed Python doc comments:

# Requires `import os` and `import git` at the top of listener_core.py.
def evaluate(repo_full_name: str, instance_id: str, problem_statement: str, commit_hash: str):
    """
    Evaluates the se-agent for a given task instance using the SWE-bench dataset.

    This method sets up a project from a pre-cloned repository, resets it to a specific
    commit, and then performs localization and change suggestion tasks. It stores the
    results in a designated evaluation directory.

    Args:
        repo_full_name (str): The full name of the repository (e.g., "owner/repo").
        instance_id (str): The unique identifier for the evaluation task instance.
        problem_statement (str): The problem statement or issue body to be evaluated.
        commit_hash (str): The commit hash to which the repository should be reset.

    Steps:
        1. Initialize a Project instance with the provided repository information.
        2. Reset the repository to the specified commit hash.
        3. Create an evaluation directory under the project's metadata folder.
        4. Create a vector store for the code at the specified commit.
        5. Use SemanticVectorSearchLocalizer to localize relevant files.
        6. Generate change suggestions based on the localization results.
        7. Dump the change suggestions to a markdown file in the evaluation directory.
        8. Restore the repository to the commit it was on before the evaluation.

    Raises:
        Exception: Propagates any exceptions encountered during the evaluation process.
    """
    projects_store = os.getenv('PROJECTS_STORE')
    github_token = os.getenv('GITHUB_TOKEN')
    # Create ProjectInfo and Project instance
    project_info = ProjectInfo(repo_full_name=repo_full_name)
    project = Project(github_token, projects_store, project_info)
    # Remember the current commit: after a hard reset, "HEAD" points at commit_hash,
    # so resetting to "HEAD" afterwards would not restore the original state.
    original_commit = git.Repo(project.repo_folder).head.commit.hexsha
    # Reset repo to the specified commit hash
    project.reset_to_commit(commit_hash)
    try:
        # Prepare evaluation directory
        evaluation_folder = os.path.join(project.metadata_folder, 'evaluation', instance_id)
        os.makedirs(evaluation_folder, exist_ok=True)
        # Create a vector store for the specific commit
        vector_store_uri = os.path.join(evaluation_folder, 'code_vector_store.db')
        embeddings = fetch_llm_for_task(TaskName.EMBEDDING)
        vector_store = create_vector_store(project.module_src_folder, vector_store_uri, embeddings, commit_hash)
        # Create a SemanticVectorSearchLocalizer with this vector store
        localizer = SemanticVectorSearchLocalizer(vector_store)
        # Localize and suggest changes
        analysis_results = {
            'title': problem_statement,
            'description': '',
            'conversation': []
        }
        filepaths = localizer.localize(issue=analysis_results, top_n=TOP_N)
        change_suggestions = suggest_changes(project, analysis_results, filepaths)
        # Dump change suggestions to a file
        change_suggestions_path = os.path.join(evaluation_folder, 'change_suggestions.md')
        with open(change_suggestions_path, 'w') as f:
            f.write(change_suggestions)
    finally:
        # Restore the repo to the commit it was on before evaluation
        project.reset_to_commit(original_commit)

This documentation provides an overview of the method's purpose, describes the parameters, outlines the steps performed, and indicates potential exceptions that might be raised during execution.
SWE-bench provides datasets to evaluate software engineering agents.
Each evaluation task instance (mapping to our concept of issue) provides:
The following challenges need to be addressed to evaluate our agent using the SWE-bench dataset:

Our agent needs to be added as a collaborator on the repo. However, SWE-bench is for offline evaluation. To address this, we should be able to construct a Project with a pre-cloned repository as well.
Remove Github as a member (requiring authenticating with GitHub at the time of construction).
Add a method that returns an authenticated Github object.
Update the functions that need GitHub access to call this method to obtain the Github object.

The agent operates on the latest snapshot of the repository. However, for evaluation purposes the requirement is different: we need the agent to operate on a specific commit hash.
Add to vector_store_utils: create_vector_store(source_dir: str, uri: str, embeddings: Embeddings, commit_hash: str=None) -> VectorStore. This should create a vector store at the specified uri by embedding the contents of all files in source_dir; it should use each file's relative path as its id and also add it as metadata.filepath. It should also keep track of the uri(s) it created where commit_hash was available, and if a vector store was previously created for a commit, it should load and return that store (instead of creating the vector store afresh).

Core processor doesn't have a method to drive evaluation for a task instance.
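Add to listener_core.py: evaluate(repo_full_name: str, instance_id: str, problem_statement: str, commit_hash: str). The method should:
Create a vector store for the specified commit using vector_store_utils.
Keep evaluation artifacts under the project's metadata folder, in evaluation/<instance_id>.
Store the vector store there as <vector-type>_vector_store.db, e.g., for source code embeddings that should be code_vector_store.db.
Write the change suggestions to change_suggestions.md in the <instance_id> folder.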
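
The folder layout for evaluation artifacts can be sketched as a small path helper. This is only an illustration: evaluation_paths is a hypothetical name, and metadata_folder is assumed to be the project's metadata directory (project.metadata_folder in the evaluate code).

```python
import os
from typing import Tuple


def evaluation_paths(
    metadata_folder: str, instance_id: str, vector_type: str = "code"
) -> Tuple[str, str, str]:
    """Return (evaluation_folder, vector_store_path, change_suggestions_path)."""
    # All artifacts for one task instance live in evaluation/<instance_id>.
    evaluation_folder = os.path.join(metadata_folder, "evaluation", instance_id)
    # The vector store file is named after the kind of content it embeds.
    vector_store_path = os.path.join(evaluation_folder, f"{vector_type}_vector_store.db")
    change_suggestions_path = os.path.join(evaluation_folder, "change_suggestions.md")
    return evaluation_folder, vector_store_path, change_suggestions_path
```

Keeping each instance in its own folder means vector stores built for different commits never overwrite one another.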