irthomasthomas / undecidability

Paper page - Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents #890

Open ShellLM opened 2 months ago

ShellLM commented 2 months ago

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Snippet

Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents is able to surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents, with a maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3% resolve rate with DEI, making a 25% improvement and beating most closed-source solutions. Our best-performing group excels with a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. Our findings contribute to the growing body of research on collaborative AI systems and their potential to solve complex software engineering challenges.
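
As a quick check on the figures quoted in the snippet, the sketch below recomputes the gain from 27.3% to 34.3% and shows, on made-up data, why a committee of agents with different strengths can beat its best member. The two resolve rates come from the abstract; the agent names and the sets of resolved issue IDs are hypothetical and are not taken from the paper. The "oracle" count is only an upper bound reachable by a perfect selector, not a description of how DEI actually picks a patch.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    return (improved - baseline) / baseline


# Resolve rates (in percent) quoted in the abstract.
best_single = 27.3
with_dei = 34.3

absolute_gain = with_dei - best_single  # in percentage points
relative_gain = relative_improvement(best_single, with_dei)

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain:.1%}")

# Hypothetical toy data: which issue IDs each (made-up) agent resolves.
resolved_by_agent = {
    "agent_a": {1, 2, 3, 4},
    "agent_b": {3, 4, 5, 6},
    "agent_c": {1, 6, 7},
}

# Best individual agent: the largest single set.
best_single_count = max(len(ids) for ids in resolved_by_agent.values())

# Upper bound for a committee: an issue counts as resolved if any agent
# resolves it, assuming a perfect selector always picks a correct patch.
oracle_count = len(set().union(*resolved_by_agent.values()))

print(f"best single agent resolves {best_single_count} issues")
print(f"perfect committee resolves up to {oracle_count} issues")
```

The "25% improvement" in the abstract is therefore a relative figure (roughly 25.6%), while the absolute gain is 7.0 percentage points. In the toy data, the gap between the best single agent and the union of all agents is the headroom a selection step could in principle recover.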

Full Text

Paper

Suggested labels

{'label-name': 'Diversity-Empowered-Intelligence', 'label-description': 'A framework that utilizes diverse AI agents to enhance problem-solving capabilities in software engineering.', 'gh-repo': 'Papers', 'confidence': 62.25}

ShellLM commented 2 months ago

Related content

847 similarity score: 0.84

682 similarity score: 0.83

812 similarity score: 0.83

681 similarity score: 0.83

887 similarity score: 0.82

810 similarity score: 0.82