Closed: 0xdevalias closed this issue 3 months ago.
Thanks for trying aider and filing this issue.
I've spent some time evaluating SWE-Bench and have concerns that a large fraction of the tasks are essentially impossible. I've opened an issue in their repo about this, but haven't received a response.
https://github.com/princeton-nlp/SWE-bench/issues/72
They recently released SWE-Bench Lite, which may address this. I need to dig into it.
Hello, could you tell me how to run aider against SWE-Bench?
@kithib The benchmark harness that I've been using probably isn't tidy enough for other folks to use. I hope to publish it soon though.
I think this issue is now resolved, as aider currently sits atop the SWE-Bench Lite leaderboard.
https://aider.chat/2024/05/22/swe-bench-lite.html
I'm going to close this issue for now, but feel free to add a comment here and I will re-open or file a new issue any time.
It would be interesting to see if/how aider performs against the SWE-Bench benchmarks: