princeton-nlp / SWE-agent

[NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.
https://swe-agent.com
MIT License
13.71k stars 1.39k forks

How does SWE-agent find the issue # or pull request # to fix the bug? #640

Closed ivan4722 closed 4 months ago

ivan4722 commented 4 months ago

Describe the issue

I'm currently trying to prompt the agent to generate dependency graphs, and the agent is able to execute the curl command accordingly. The command looks like ./generate_graph.sh pytest 7490 pytest-dev, where the middle number is supposed to be the PR # extracted from the instance ID. However, when I run the agent, it runs the command like this:

ACTION (primary)
generate_graph pytest placeholder pytest-dev

or mentions that a PR # was not provided.

Is there anywhere I should look to see how the agent finds the PR # or issue # to be fixed? I'm a little confused because the PR # is explicitly stated in the dataset, and I assume that dataset is fed to the agent.

Thank you!

Optional: Relevant documentation page

No response

klieret commented 4 months ago

The number in the instance ID (e.g., "django__django-16379") is the PR number, so you could use that. But the agent of course has all of the information from the dataset that it loads (the --data_path file, which defaults to the SWE-bench Lite dev set).
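As a minimal sketch of pulling that number out, assuming instance IDs follow the layout <owner>__<repo>-<number> seen in the examples in this thread: the helper name parse_instance_id and the InstanceRef type are hypothetical and are not part of SWE-agent. Splitting from the right matters because an owner such as pytest-dev contains a hyphen itself.

```python
from typing import NamedTuple


class InstanceRef(NamedTuple):
    repo_owner: str
    repo_name: str
    number: str


def parse_instance_id(instance_id: str) -> InstanceRef:
    """Split an ID such as 'django__django-16379' into owner, repo, and number."""
    # Assumed layout: <owner>__<repo>-<number>
    owner, sep, rest = instance_id.partition("__")
    # rpartition splits on the LAST hyphen, so hyphenated repo names survive.
    repo, dash, number = rest.rpartition("-")
    if not (owner and sep and repo and dash and number.isdigit()):
        raise ValueError(f"unexpected instance ID format: {instance_id!r}")
    return InstanceRef(owner, repo, number)


ref = parse_instance_id("django__django-16379")
print(ref.repo_owner, ref.repo_name, ref.number)
```

Raising on an unexpected format, rather than returning a default, avoids silently passing a placeholder value on to later steps.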

ivan4722 commented 4 months ago

Right, I was using that to get the PR # for my own experimentation. However, the agent seems confused and does not know where to get the PR number. Is there a specific code file where the instance ID is broken down? I was just wondering how this is handled in the backend.

ivan4722 commented 4 months ago

To be more specific, I'm asking how to prompt the agent to use that number.

ivan4722 commented 4 months ago

Currently I have

        instance_id = self.env.data[index]["instance_id"]
        # Extract the PR number: the part after the last hyphen in the instance ID
        pr_number = instance_id.split('-')[-1]

And

setup_args = {"issue": issue, "files": files, "test_files": test_files, "tests": tests, "pr_number": pr_number}

However, the agent still seems to be confused:

INFO     💭 THOUGHT (primary)
         It seems that running the script directly did not produce any output, which is expected since pytest is not being invoked directly and the script is meant to be run with pytest. To
         properly replicate the bug, we need to run the script using pytest. However, before doing that, let's generate the dependency graph for the repository as instructed in the tips. We need to
         use the pull request number, but since it's not provided, I'll assume a placeholder number for demonstration purposes. Please replace `<pr_number>` with the actual pull number when running
         this command.

INFO     🎬 ACTION (primary)
         generate_graph pytest-dev pytest <pr_number>

And my prompt in the config file:

  2. Second, you should invoke generate_graph to generate the dependency graph to use a dependency graph adjacency list to trace errors. Make sure you use the pr number. 
     Do not run the command without a number for the PR number. It is imperative that you run the dependency graph command. The pr number is provided to you as pr_number.
     Usage is: generate_graph <repo_name> <pr_number> <repo_owner>
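
One thing that may be worth checking, under an assumption this thread does not confirm: if the config templates are filled in with Python's str.format using the setup_args keys, a value only reaches the model when the template text contains a matching {pr_number} placeholder. A prompt that merely names the variable in prose would pass through unchanged. The sketch below illustrates that behaviour with plain str.format; both template strings and the sample values are hypothetical.

```python
# Hypothetical values standing in for the real setup_args.
setup_args = {"issue": "example issue text", "pr_number": "7490"}

# Names the variable in prose only; there is no placeholder to substitute.
template_without_placeholder = (
    "The pr number is provided to you as pr_number.\n"
    "Usage is: generate_graph <repo_name> <pr_number> <repo_owner>"
)

# Embeds the value through a {pr_number} placeholder.
template_with_placeholder = (
    "The PR number for this task is {pr_number}.\n"
    "Usage is: generate_graph <repo_name> <pr_number> <repo_owner>"
)

# Unused keys (here "issue") are ignored by str.format.
rendered_without = template_without_placeholder.format(**setup_args)
rendered_with = template_with_placeholder.format(**setup_args)

print(rendered_without)
print(rendered_with)
```

In the first rendering the number never appears, which would be consistent with the agent reporting that no PR number was provided. The angle-bracket text in the usage line is untouched by str.format, since only curly braces mark placeholders.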