plandex-ai / plandex

AI driven development in your terminal. Designed for large, real-world tasks.
https://plandex.ai
GNU Affero General Public License v3.0
10.77k stars 751 forks

Benchmark on SWE-Bench #74

Open distbit0 opened 7 months ago

distbit0 commented 7 months ago

It would be interesting to measure performance on the SWE-Bench benchmark, so that this project can be more clearly differentiated from the increasing number of other coding agents.

danenania commented 7 months ago

Agreed, it would be interesting to see the results if anyone wants to try.

That said, I'm guessing it might not do particularly well at this point, since my focus so far has been much more on enabling a tight feedback loop, productive collaboration, and quick iteration between the developer and the LLM than on having the LLM do tasks end-to-end autonomously in a single shot. But now that the former is working well, it makes sense to start shifting more toward the latter, so stay tuned on that :)

rodion-m commented 4 months ago

Yeah, aider has recently done well on the SWE-Bench benchmark. It would be interesting to see Plandex's results.