paul-gauthier / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

Evaluate performance against SWE-Bench #533

Closed 0xdevalias closed 3 months ago

0xdevalias commented 5 months ago

It would be interesting to see whether, and how well, aider performs against the SWE-Bench benchmarks.

paul-gauthier commented 5 months ago

Thanks for trying aider and filing this issue.

I've spent some time evaluating SWE-Bench and am concerned that a large fraction of its tasks are essentially impossible. I've opened an issue in their repo about this but haven't received a response.

https://github.com/princeton-nlp/SWE-bench/issues/72

They recently released SWE-Bench Lite, which may address this. I need to dig in here.

https://www.swebench.com/lite.html

kithib commented 3 months ago

Hello, could you tell me how to run aider against SWE-Bench?

paul-gauthier commented 3 months ago

@kithib The benchmark harness that I've been using probably isn't tidy enough for other folks to use. I hope to publish it soon though.

paul-gauthier commented 3 months ago

I think this issue is resolved, as aider now sits atop the SWE-Bench Lite leaderboard.

https://aider.chat/2024/05/22/swe-bench-lite.html

I'm going to close this issue for now, but feel free to add a comment here and I will re-open it, or file a new issue any time.