SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.
Describe the issue
I was able to see the SWE-agent results on the Devin subset online here. However, could you please share the commands/parameters to run the eval on the 25% test set? Could it be made available as:
https://huggingface.co/datasets/princeton-nlp/SWE-bench_25
or something like that? Thanks much!

Suggest an improvement to documentation
No response
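
In case it helps while waiting for an official split: a 25% subset can be reproduced locally by deterministically sampling instance IDs from the full SWE-bench test split and filtering the dataset to those IDs. This is only a sketch under assumptions — `pick_subset`, the seed, and the placeholder ID format are illustrative, not the actual subset the maintainers evaluated on, so results would not be directly comparable to theirs.

```python
import random

def pick_subset(instance_ids, fraction=0.25, seed=42):
    """Deterministically sample a fraction of instance IDs.

    Sorting the result makes the subset order stable regardless
    of the order `random.sample` returns items in.
    """
    k = max(1, round(len(instance_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(instance_ids, k))

# Placeholder IDs for illustration; in practice these would come from
# the `instance_id` column of the princeton-nlp/SWE-bench test split.
ids = [f"repo__issue-{i}" for i in range(100)]
subset = pick_subset(ids)
print(len(subset))  # 25
```

The resulting ID list could then be used to filter whichever dataset copy the harness consumes, so the same 25% is evaluated on every run.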