SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.
Describe the issue
I was able to see the SWE-agent results on the Devin subset online here. However, could you please share the commands/parameters to run the eval on the 25% test set? Could it be made available as:
https://huggingface.co/datasets/princeton-nlp/SWE-bench_25
or something like that? Thanks much!

Suggest an improvement to documentation
No response
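
In case it helps while waiting for an official split: a 25% subset can be reproduced locally by deterministically sampling instance IDs from the full SWE-bench test split and filtering the dataset to those IDs. This is only a sketch under assumptions — `pick_subset`, the seed, and the placeholder ID format are illustrative, not the actual subset the maintainers evaluated on, so results would not be directly comparable to theirs.

```python
import random

def pick_subset(instance_ids, fraction=0.25, seed=42):
    """Deterministically sample a fraction of instance IDs.

    Sorting the result makes the subset order stable regardless
    of the order `random.sample` returns items in.
    """
    k = max(1, round(len(instance_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(instance_ids, k))

# Placeholder IDs for illustration; in practice these would come from
# the `instance_id` column of the princeton-nlp/SWE-bench test split.
ids = [f"repo__issue-{i}" for i in range(100)]
subset = pick_subset(ids)
print(len(subset))  # 25
```

The resulting ID list could then be used to filter whichever dataset copy the harness consumes, so the same 25% is evaluated on every run.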