Closed: rickyloynd-microsoft closed this issue 1 month ago.
@rickyloynd-microsoft, @afourney is looking into the prevalence and mitigations of endless loops and other failure modes as part of the robustness workstream. If you have good test cases, perhaps they can go into the test suite: https://github.com/microsoft/autogen/issues/306
As for getting to the right level of generalization in recipes and recipe sub-tasks, I wonder if we can leverage some measures of code complexity or maintainability to help. Human verification and editing could also be an option.
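One rough possibility for such measures, sketched with the third-party radon package (the thresholds are placeholders, and this is only an assumption about how it might be wired in): score the code generated for each recipe step and flag anything that looks too complex or unmaintainable for human review.

```python
# Sketch: score generated code with off-the-shelf complexity metrics.
# Assumes the third-party `radon` package; thresholds are placeholders.
from radon.complexity import cc_visit
from radon.metrics import mi_visit

def review_needed(source: str, max_cc: int = 10, min_mi: float = 65.0) -> bool:
    """Flag code whose cyclomatic complexity is high or maintainability index is low."""
    worst_cc = max((block.complexity for block in cc_visit(source)), default=0)
    mi = mi_visit(source, multi=True)
    return worst_cc > max_cc or mi < min_mi
```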
Yes, please send examples my way.
I mostly see this in GroupChat.
Stale. No longer planned.
High-value skills tend to be complex and hierarchically composed of other skills. In addition, we need skills at all levels of the hierarchy to generalize to similar but somewhat different tasks. TeachableAgent can learn standalone skills, but cannot yet compose them into more complex skills.
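For context, this is roughly how a standalone skill gets taught and persisted today (a sketch assuming the current contrib TeachableAgent API; the DB path, thresholds, and teaching message are placeholders). Nothing here composes stored skills into larger ones; that is the gap.

```python
import autogen
from autogen.agentchat.contrib.teachable_agent import TeachableAgent

config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
llm_config = {"config_list": config_list, "timeout": 120}

# TeachableAgent stores user teachings in a local vector DB and retrieves them
# in later chats, but each memo is a standalone skill; there is no mechanism
# yet for composing stored skills into a more complex one.
teachable_agent = TeachableAgent(
    name="teachableagent",
    llm_config=llm_config,
    teach_config={
        "reset_db": False,                       # keep previously learned skills
        "path_to_db_dir": "./tmp/teachable_db",  # placeholder path
        "recall_threshold": 1.5,                 # larger values retrieve more (less relevant) memos
    },
)

user = autogen.UserProxyAgent(
    name="user", human_input_mode="NEVER", max_consecutive_auto_reply=0
)

# Teach a skill in one chat ...
user.initiate_chat(teachable_agent, message="When I ask for a summary, always end with a one-line takeaway.")

# ... then persist it so later chats can retrieve it.
teachable_agent.learn_from_user_feedback()
teachable_agent.close_db()
```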
Coding scenarios to be addressed
Failing on complex tasks
AutoGen agents can already perform coding tasks of a certain complexity impressively well, with AssistantAgent and UserProxyAgent iteratively writing code, executing it, checking for errors, and revising the code to fix them. But for coding tasks that are just a bit too complex, the agents tend to get stuck in endless loops without ever arriving at a solution.
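For reference, a minimal sketch of that two-agent coding loop using the standard pyautogen setup; the model config, work_dir, task, and reply cap are illustrative, and max_consecutive_auto_reply only bounds a runaway loop rather than fixing it.

```python
import autogen

# Illustrative config; the config file name is a placeholder.
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
llm_config = {"config_list": config_list, "temperature": 0}

# AssistantAgent writes and revises the code.
assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)

# UserProxyAgent executes the code and reports errors back.
# max_consecutive_auto_reply caps the write/run/fix loop so a task that is too
# complex fails after a bounded number of attempts instead of looping forever.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The agents iterate: write code, execute it, check for errors, revise.
user_proxy.initiate_chat(
    assistant,
    message="Plot a chart of META and TESLA stock price change YTD.",
)
```

In practice the cap only turns an endless loop into a bounded failure, which is why the detection and mitigation work mentioned above is still needed.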
Recipes
The Alfie demo shows how sequences of skills can be packaged as recipes for subsequent retrieval and reuse. But these recipes are typically brittle, in the sense that their sub-skills don't generalize well to similar but different tasks. For example, @samershi points out that the recipe in agentchat_teaching.ipynb contains sub-tasks that don't generalize well enough: the recipe for "Analyzing and Visualizing Application Domains in arXiv Papers" consists of steps defined strictly in terms of application domains and arXiv papers. These overly specific sub-skills wouldn't help with a subsequent task such as "Analyzing and Visualizing Topics in Wikipedia Articles".
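To make the generalization gap concrete, here is a purely hypothetical illustration (the notebook stores its recipe as natural-language steps, not as Python like this): the same recipe with sub-tasks hard-coded versus parameterized, where only the second form would also cover the Wikipedia task.

```python
# Hypothetical illustration only; the notebook's recipe is natural language,
# not a Python structure like this.

# Brittle recipe: sub-tasks hard-code "arXiv papers" and "application domains",
# so retrieval for a similar task (Wikipedia articles, topics) gets no reuse.
brittle_recipe = [
    "Collect recent arXiv papers on LLM applications",
    "Extract the application domain of each arXiv paper",
    "Plot a bar chart of application domains across the papers",
]

# Generalized recipe: the same steps with the corpus and category left as
# parameters, so one stored recipe covers both tasks.
general_recipe = [
    "Collect items from {corpus}",
    "Extract the {category} of each item",
    "Plot a bar chart of {category} values across the items",
]

def instantiate(recipe, **params):
    """Fill a parameterized recipe's placeholders for a concrete task."""
    return [step.format(**params) for step in recipe]

# The original task and the new task both resolve from the same general recipe.
arxiv_task = instantiate(general_recipe, corpus="recent arXiv papers", category="application domain")
wiki_task = instantiate(general_recipe, corpus="Wikipedia articles", category="topic")
```

Deciding which form of a step to store is where the complexity/maintainability measures or human review mentioned above could come in.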