Support for evaluation of se-agent using SWE-bench

pdhoolia / se-agent

Software Engineering Agent

GNU General Public License v3.0


SWE-bench provides datasets to evaluate software engineering agents.

Each evaluation task instance (mapping to our concept of issue) provides:

repo: mapping to our repo_full_name
instance_id: mapping to our issue
problem_statement: mapping to our issue body
patch: mapping to the combination of our localization & change suggestions
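
As a minimal sketch of this mapping, assuming a SWE-bench instance is available as a plain dictionary, the fields could be renamed into the agent's vocabulary as shown here. `EvaluationTask` and `from_swe_bench_instance` are hypothetical names, not part of se-agent. `base_commit` is the SWE-bench field that holds the commit the task is based on. The example values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationTask:
    """Hypothetical container using se-agent's names for SWE-bench fields."""
    repo_full_name: str      # SWE-bench `repo`
    instance_id: str         # SWE-bench `instance_id` (our issue)
    problem_statement: str   # SWE-bench `problem_statement` (our issue body)
    commit_hash: str         # SWE-bench `base_commit`
    reference_patch: str     # SWE-bench `patch` (localization + change suggestions)


def from_swe_bench_instance(instance: dict) -> EvaluationTask:
    """Rename the fields of one SWE-bench instance to the agent's names."""
    return EvaluationTask(
        repo_full_name=instance["repo"],
        instance_id=instance["instance_id"],
        problem_statement=instance["problem_statement"],
        commit_hash=instance["base_commit"],
        reference_patch=instance["patch"],
    )


# Illustrative, made-up instance (not taken from the real dataset).
example_instance = {
    "repo": "octo/example",
    "instance_id": "octo__example-1",
    "problem_statement": "Function foo() crashes on empty input.",
    "base_commit": "abc123",
    "patch": "diff --git a/foo.py b/foo.py",
}
task = from_swe_bench_instance(example_instance)
```

The reference patch is kept alongside the inputs so that the agent's localization and change suggestions can later be compared against it.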

The following challenges need to be addressed to evaluate our agent using the SWE-bench dataset:

Our agent needs to be added as a collaborator on the repo. However, SWE-bench is for offline evaluation. To address this, we should be able to construct a Project with a pre-cloned repository as well.
1. Project should not keep Github as a member (which requires authenticating with GitHub at the time of construction).
2. Let's add another function in Project that returns an authenticated Github object.
3. Functions that interact with the GitHub repo (e.g., clone_repository, pull_latest_changes, post_issue_comment, and fetch_issue_comments) may call that new function to get an authenticated Github object.
4. With this change, a Project object may be constructed offline for an externally cloned repository.
The agent operates on the latest snapshot of the repository. However, for evaluation purposes the requirement is different: we need the agent to operate on a specific commit hash.
1. Add a function to Project that allows resetting the repo state to a specified commit hash or, by default, the HEAD.
2. The semantic understanding update method in Project starts by pulling the latest changes in the repo. This is good for the live agent but not for evaluation. To facilitate evaluation, let's add a new method to vector_store_utils: create_vector_store(source_dir: str, uri: str, embeddings: Embeddings, commit_hash: str=None) -> VectorStore. This should create a vector store at the specified uri by embedding the contents of all files in source_dir. It should use the relative file path as the id and also add it as metadata.filepath. It should also keep track of the uri(s) it created where commit_hash was available; if a vector store was previously created at a uri for a commit hash, it should just load and return that store instead of creating a fresh one.
The core processor doesn't have a method to drive evaluation for a task instance.
1. Let's add a new method to listener_core.py: evaluate(repo_full_name: str, instance_id: str, problem_statement: str, commit_hash: str). The method should:
  1. create a project instance.
  2. reset the project repo to commit_hash
  3. create a vector store from source_folder for the commit_hash, using the new method introduced above in vector_store_utils.
  4. Create a semantic_vector_search localizer with this vector_store
  5. localize
  6. suggest changes
  7. dump change suggestions to a file
  8. reset project repo back to HEAD
2. For storing vector stores and change suggestions for evaluation:
  1. let's create a folder in the project metadata named evaluation
  2. For each evaluation instance_id, create a folder inside it named <instance_id>
  3. The vector store for the instance_id and a specific source type should be named <vector-type>_vector_store.db; e.g., for source code embeddings it should be code_vector_store.db
  4. change suggestions should be dumped in a file named change_suggestions.md in the <instance_id> folder
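
The folder layout described in the last item can be sketched as a small path helper. This is only an illustration of the naming scheme; `evaluation_paths` and its dictionary keys are hypothetical names, and the metadata folder used in the example is made up.

```python
import os


def evaluation_paths(metadata_folder: str, instance_id: str,
                     vector_type: str = "code") -> dict:
    """Return the evaluation paths for one instance, following the layout above."""
    # <metadata>/evaluation/<instance_id>/
    instance_folder = os.path.join(metadata_folder, "evaluation", instance_id)
    return {
        "instance_folder": instance_folder,
        # <vector-type>_vector_store.db, e.g. code_vector_store.db
        "vector_store": os.path.join(
            instance_folder, f"{vector_type}_vector_store.db"),
        "change_suggestions": os.path.join(
            instance_folder, "change_suggestions.md"),
    }


paths = evaluation_paths("metadata", "octo__example-1")
```

Keeping the naming in one place means the vector store and the change suggestions for an instance always end up side by side in the same folder.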

To address the challenges outlined for evaluating the se-agent using the SWE-bench dataset, the following changes and additions to the existing codebase should be implemented:

1. Modify `Project` class to support offline evaluation

Changes in se_agent/project.py:

Remove Github as a member:
- Remove the self.github attribute from the Project class.

Add a method to get an authenticated Github object:

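from github import Auth, Github

def get_authenticated_github(self) -> Github:
    """Returns an authenticated GitHub object."""
    # Build the client on demand so that Project can be constructed offline
    auth = Auth.Token(self.github_token)
    if self.info.api_url:
        # Custom API endpoint configured for this project
        return Github(base_url=self.info.api_url, auth=auth)
    return Github(auth=auth)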

Update functions to use the new Github method:
- In functions like clone_repository, pull_latest_changes, post_issue_comment, and fetch_issue_comments, replace self.github with self.get_authenticated_github() to obtain the Github object when needed.
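
The effect of this change is that constructing the object no longer needs credentials or network access; only the operations that talk to GitHub do. A library-free sketch of that pattern follows. `LazyClientHolder` and `client_factory` are hypothetical stand-ins for Project and the PyGithub constructor, used here so the example runs without any dependency.

```python
from typing import Callable, Optional


class LazyClientHolder:
    """Hypothetical stand-in for Project: builds its client only on demand."""

    def __init__(self, token: Optional[str],
                 client_factory: Callable[[str], object]):
        # Nothing is authenticated here, so construction works offline.
        self._token = token
        self._client_factory = client_factory

    def get_authenticated_client(self) -> object:
        """Create a client at the point where an online operation needs it."""
        if not self._token:
            raise RuntimeError("A token is required for online operations")
        return self._client_factory(self._token)


factory_calls = []


def fake_factory(token: str) -> object:
    """Records each call so the example can show when a client is built."""
    factory_calls.append(token)
    return {"token": token}


# Offline construction: no token, and the factory is never invoked.
offline = LazyClientHolder(token=None, client_factory=fake_factory)

# Online use: the client is built only when it is requested.
online = LazyClientHolder(token="t0ken", client_factory=fake_factory)
client = online.get_authenticated_client()
```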

2. Add functionality to reset repository state and handle vector stores

Changes in se_agent/project.py:

Add method to reset repo to a specific commit:

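import git

def reset_to_commit(self, commit_hash: str = "HEAD"):
    """Hard-resets the repository working tree to the specified commit."""
    # Note: a hard reset moves HEAD to commit_hash, so a later call with the
    # default "HEAD" does not return to the previous tip; callers that need
    # to restore it should record the original commit first.
    try:
        repo = git.Repo(self.repo_folder)
        repo.git.reset('--hard', commit_hash)
        logger.info(f"Repository reset to commit: {commit_hash}")
    except Exception as e:
        logger.error(f"Error resetting to commit {commit_hash}: {e}")
        raise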
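
One thing to watch with a hard reset is that HEAD itself moves to the target commit, so the original position has to be remembered if it is to be restored after evaluation. A possible way to package that is a context manager, sketched here against an in-memory fake rather than a real repository. `pinned_to_commit`, `FakeRepo`, and `current_commit` are hypothetical names introduced for illustration.

```python
from contextlib import contextmanager


class FakeRepo:
    """In-memory stand-in for a git working copy (illustration only)."""

    def __init__(self, head: str):
        self.head = head

    def current_commit(self) -> str:
        return self.head

    def reset_to_commit(self, commit_hash: str) -> None:
        self.head = commit_hash


@contextmanager
def pinned_to_commit(repo, commit_hash: str):
    """Move `repo` to `commit_hash`, then restore the commit it started on."""
    original = repo.current_commit()    # remember where we started
    repo.reset_to_commit(commit_hash)
    try:
        yield repo
    finally:
        repo.reset_to_commit(original)  # runs even if the body raises


repo = FakeRepo(head="latest")
with pinned_to_commit(repo, "abc123"):
    seen_inside = repo.current_commit()
seen_after = repo.current_commit()
```

Because the restore step sits in a `finally` clause, a failure during localization or change suggestion would still leave the repository where it was found.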

Changes in se_agent/vector_store_utils.py:

Add function to create vector store from a specific commit:

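import os
from typing import Optional

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore

# In-memory cache of vector stores, keyed by (uri, commit_hash)
vector_store_cache = {}

def create_vector_store(source_dir: str, uri: str, embeddings: Embeddings, commit_hash: Optional[str] = None) -> VectorStore:
    """Creates a vector store for the specified source directory and commit hash."""
    cache_key = (uri, commit_hash)
    # Reuse a store only when it is tied to a known commit
    if commit_hash is not None and cache_key in vector_store_cache:
        return vector_store_cache[cache_key]

    # Create and populate the vector store.
    # NOTE: langchain_core.vectorstores does not provide a create_vector_store
    # function. new_vector_store is a placeholder (an assumption) for whatever
    # backend-specific constructor the project uses to open a store at uri.
    vector_store = new_vector_store(uri, embeddings)
    for root, _, files in os.walk(source_dir):
        for file in files:
            # Only Python sources are embedded here
            if file.endswith('.py'):
                file_path = os.path.join(root, file)
                relative_file_path = os.path.relpath(file_path, source_dir)
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                # The relative path serves as both the id and metadata.filepath
                vector_store.add_documents(
                    documents=[Document(page_content=content, metadata={"filepath": relative_file_path})],
                    ids=[relative_file_path]
                )

    # Cache the vector store when it belongs to a specific commit
    if commit_hash is not None:
        vector_store_cache[cache_key] = vector_store
    return vector_store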
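
The in-memory dictionary above only lasts for the lifetime of the process, whereas the requirement is to remember which uri was created for which commit hash so that an existing store can be loaded later. One possible approach, sketched here under that assumption, is a small JSON index kept on disk. `VectorStoreRegistry` and the index file name are hypothetical and not part of se-agent; loading the store found at the returned uri is left to the vector store backend.

```python
import json
import os
import tempfile
from typing import Optional


class VectorStoreRegistry:
    """Hypothetical JSON-backed index of commit_hash -> vector store uri."""

    def __init__(self, index_path: str):
        self.index_path = index_path

    def _load(self) -> dict:
        if not os.path.exists(self.index_path):
            return {}
        with open(self.index_path, "r", encoding="utf-8") as f:
            return json.load(f)

    def lookup(self, commit_hash: Optional[str]) -> Optional[str]:
        """Return the uri recorded for commit_hash, or None if unknown."""
        if commit_hash is None:
            return None
        return self._load().get(commit_hash)

    def record(self, commit_hash: Optional[str], uri: str) -> None:
        """Remember uri for commit_hash; stores without a hash are not tracked."""
        if commit_hash is None:
            return
        index = self._load()
        index[commit_hash] = uri
        with open(self.index_path, "w", encoding="utf-8") as f:
            json.dump(index, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    index_file = os.path.join(tmp, "vector_store_index.json")
    VectorStoreRegistry(index_file).record(
        "abc123", "/stores/abc123/code_vector_store.db")
    # A fresh registry object (e.g. in a new process) still finds the entry.
    found = VectorStoreRegistry(index_file).lookup("abc123")
    missing = VectorStoreRegistry(index_file).lookup("def456")
    untracked = VectorStoreRegistry(index_file).lookup(None)
```

With such an index, create_vector_store could first call `lookup(commit_hash)` and skip the embedding pass whenever a uri is already recorded.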

3. Add evaluation method to `listener_core`

Changes in se_agent/listener_core.py:

Add evaluate method:

import os

import git

from se_agent.localize.semantic_vector_search import SemanticVectorSearchLocalizer
from se_agent.change_suggester import suggest_changes
from se_agent.vector_store_utils import create_vector_store

def evaluate(repo_full_name: str, instance_id: str, problem_statement: str, commit_hash: str):
    """Evaluates the agent for a given task instance."""
    projects_store = os.getenv('PROJECTS_STORE')
    github_token = os.getenv('GITHUB_TOKEN')

    # Create ProjectInfo and Project instance
    project_info = ProjectInfo(repo_full_name=repo_full_name)
    project = Project(github_token, projects_store, project_info)

    # Remember the current commit: a hard reset moves HEAD, so resetting to
    # "HEAD" afterwards would not bring the repo back to this point
    original_commit = git.Repo(project.repo_folder).head.commit.hexsha

    # Reset repo to the specified commit hash
    project.reset_to_commit(commit_hash)
    try:
        # Prepare evaluation directory
        evaluation_folder = os.path.join(project.metadata_folder, 'evaluation', instance_id)
        os.makedirs(evaluation_folder, exist_ok=True)

        # Create a vector store for the specific commit
        vector_store_uri = os.path.join(evaluation_folder, 'code_vector_store.db')
        embeddings = fetch_llm_for_task(TaskName.EMBEDDING)
        vector_store = create_vector_store(project.module_src_folder, vector_store_uri, embeddings, commit_hash)

        # Create a SemanticVectorSearchLocalizer with this vector store
        localizer = SemanticVectorSearchLocalizer(vector_store)

        # Localize and suggest changes
        analysis_results = {
            'title': problem_statement,
            'description': '',
            'conversation': []
        }
        filepaths = localizer.localize(issue=analysis_results, top_n=TOP_N)
        change_suggestions = suggest_changes(project, analysis_results, filepaths)

        # Dump change suggestions to a file
        change_suggestions_path = os.path.join(evaluation_folder, 'change_suggestions.md')
        with open(change_suggestions_path, 'w') as f:
            f.write(change_suggestions)
    finally:
        # Reset the repo back to where it was before evaluation,
        # even if one of the steps above failed
        project.reset_to_commit(original_commit)

These changes will allow the se-agent to be evaluated using the SWE-bench dataset in an offline manner, operating on specific commit snapshots, and storing both vector stores and change suggestions for each evaluation instance.

import git

def evaluate(repo_full_name: str, instance_id: str, problem_statement: str, commit_hash: str):
    """
    Evaluates the se-agent for a given task instance using the SWE-bench dataset.

    This method sets up a project from a pre-cloned repository, resets it to a
    specific commit, and then performs localization and change suggestion tasks.
    It stores the results in a designated evaluation directory.

    Args:
        repo_full_name (str): The full name of the repository (e.g., "owner/repo").
        instance_id (str): The unique identifier for the evaluation task instance.
        problem_statement (str): The problem statement or issue body to be evaluated.
        commit_hash (str): The commit hash to which the repository should be reset.

    Steps:
        1. Initialize a Project instance with the provided repository information.
        2. Reset the repository to the specified commit hash.
        3. Create an evaluation directory under the project's metadata folder.
        4. Create a vector store for the code at the specified commit.
        5. Use SemanticVectorSearchLocalizer to localize relevant files.
        6. Generate change suggestions based on the localization results.
        7. Dump the change suggestions to a markdown file in the evaluation directory.
        8. Reset the repository back to the commit it was on before evaluation.

    Raises:
        Exception: Propagates any exceptions encountered during the evaluation process.
    """
    projects_store = os.getenv('PROJECTS_STORE')
    github_token = os.getenv('GITHUB_TOKEN')

    # Create ProjectInfo and Project instance
    project_info = ProjectInfo(repo_full_name=repo_full_name)
    project = Project(github_token, projects_store, project_info)

    # Remember the current commit: a hard reset moves HEAD, so resetting to
    # "HEAD" afterwards would not bring the repo back to this point
    original_commit = git.Repo(project.repo_folder).head.commit.hexsha

    # Reset repo to the specified commit hash
    project.reset_to_commit(commit_hash)
    try:
        # Prepare evaluation directory
        evaluation_folder = os.path.join(project.metadata_folder, 'evaluation', instance_id)
        os.makedirs(evaluation_folder, exist_ok=True)

        # Create a vector store for the specific commit
        vector_store_uri = os.path.join(evaluation_folder, 'code_vector_store.db')
        embeddings = fetch_llm_for_task(TaskName.EMBEDDING)
        vector_store = create_vector_store(project.module_src_folder, vector_store_uri, embeddings, commit_hash)

        # Create a SemanticVectorSearchLocalizer with this vector store
        localizer = SemanticVectorSearchLocalizer(vector_store)

        # Localize and suggest changes
        analysis_results = {
            'title': problem_statement,
            'description': '',
            'conversation': []
        }
        filepaths = localizer.localize(issue=analysis_results, top_n=TOP_N)
        change_suggestions = suggest_changes(project, analysis_results, filepaths)

        # Dump change suggestions to a file
        change_suggestions_path = os.path.join(evaluation_folder, 'change_suggestions.md')
        with open(change_suggestions_path, 'w') as f:
            f.write(change_suggestions)
    finally:
        # Reset the repo back to where it was before evaluation,
        # even if one of the steps above failed
        project.reset_to_commit(original_commit)
