microsoft / autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
https://microsoft.github.io/autogen/

Voyager-Style Skill Library for Agents #98

Open solarapparition opened 9 months ago

solarapparition commented 9 months ago

I really like the concept of the "skill recipe" in AutoGen, but I think it can be taken much further. One of the key takeaways from Voyager was that an agent's capabilities can be scaled up continually by giving it access to a permanent library of previously generated capabilities, along with the ability to compose them into new ones. That's particularly relevant in AutoGen, where, depending on an agent's role, its individual capabilities can have a multiplicative effect on the intelligence of the agent group as a whole.

As a first step, simply having a straightforward, programmatic way to automate the creation, saving, and loading of recipes would go a long way towards developing a full "skill library" feature down the line.
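
As a concrete (if naive) illustration of what that first step could look like, here is a minimal sketch of a save/load layer for recipes; the SkillLibrary class and its file layout are hypothetical, not an existing AutoGen API:

```python
import json
from pathlib import Path

# Hypothetical helper: persists "recipes" (a description plus optional code)
# to disk so agents can reload them in later sessions.
class SkillLibrary:
    def __init__(self, root: str = "skills"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def save_skill(self, name: str, description: str, code: str) -> None:
        """Store a validated skill as a JSON file."""
        payload = {"name": name, "description": description, "code": code}
        (self.root / f"{name}.json").write_text(json.dumps(payload, indent=2))

    def load_skill(self, name: str) -> dict:
        """Reload a previously saved skill."""
        return json.loads((self.root / f"{name}.json").read_text())

    def list_skills(self) -> list[str]:
        """Names of all skills currently on disk."""
        return [p.stem for p in self.root.glob("*.json")]
```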

anthalasath commented 9 months ago

That is quite interesting. A simple example would be a basic API script that could be reused to fetch data from a particular source. A human could validate that the skill works, and then AutoGen could store it. I suppose this would have to be configurable.

Maybe we could (optionally?) store each code-generated skill as a Docker container that AutoGen could launch later on.

These skills could maybe even be shared among members of the community.

Throwing a few ideas out there.

2good4hisowngood commented 9 months ago

Possible implementation?

Log both successful and unsuccessful tasks. Unsuccessful tasks get stored along with what was intended, what happened when the attempt ran, and what steps were needed to reach a working solution. That way you can later fine-tune your models on that data.

Successful tasks get logged with what the solution does, what the use case was, when it might be needed again, etc. That log then gets chunked at the entry level, stored in a vector database, and associated with an agent for use in future tasks. On each call, a chain could embed your query or the context you're passing to the system, compare it to the vectorized library, and retrieve a list of X potential solutions (my understanding breaks down at which similarity or clustering method to use, e.g. k-means, or whether something else is better). Run the results through a decision bot to determine whether any are relevant (potentially comparing against another doc source, like how Gorilla does an API lookup). Whatever the most relevant advice is gets passed along.

That way the context it builds on can be more situation-specific, allowing the system to self-improve by gathering experience, reviewing it, and refining outputs. You could do a locally available integration or something less stateful (like an input you can pass to an agent so it can be associated with a resource such as a live vector DB that other agents might simultaneously hook into and leverage).
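
A rough sketch of that log-and-retrieve loop, using chromadb as the vector store (the helper functions and metadata fields here are assumptions for illustration, not an existing AutoGen feature):

```python
import chromadb

# Sketch of the logging-and-retrieval idea above (field names are assumptions).
client = chromadb.Client()
task_log = client.get_or_create_collection("task_log")

def log_task(task_id: str, description: str, outcome: str, notes: str) -> None:
    """Store what was attempted, whether it worked, and what fixed it."""
    task_log.add(
        ids=[task_id],
        documents=[description],
        metadatas=[{"outcome": outcome, "notes": notes}],
    )

def retrieve_similar(context: str, k: int = 5) -> list[dict]:
    """Embed the current context and pull the k most similar past tasks."""
    results = task_log.query(query_texts=[context], n_results=k)
    return [
        {"description": doc, **meta}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]

# A downstream "decision bot" would then filter retrieve_similar(...) for relevance
# before the surviving advice is passed to the working agent.
```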

I'd like to do something similar in my feature-request ticket for multi-LLM management, if anyone wants to hop on a call and see if we can figure it out: using the best model for a situation, or having a pool of models hosted on a local machine with models loaded/unloaded for each call, and potentially swapping between models (e.g. this agent can use these two models) so the system can grade them on a task and prioritize the one with better performance.

solarapparition commented 9 months ago

Yeah, good point about keeping track of success/failures; knowing what didn't work/had low success rate is important too, and I suspect having a system like that in place would automate a good portion of iterative prompt development.

I do think that for scaling in the medium-long run, it's important for task recipes to be able to reference other (known-to-work) recipes, essentially the prompt equivalent of importing from other code modules. That way, the agents could work their way up the abstraction ladder for tasks that are mostly self-contained (i.e. agents can reliably adapt the task's recipe to its own work context with no feedback/reflection), without having to make humans chain the prompts.
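
To illustrate the "recipes importing recipes" idea in its simplest possible form (the recipe format and names below are invented for the example, not anything AutoGen defines):

```python
# Hypothetical recipe store in which a recipe can reference other recipes by
# name, roughly the prompt-level analogue of importing from a module.
RECIPES = {
    "fetch_prices": "Download daily closing prices for the given ticker as CSV.",
    "plot_series": "Plot a time series from a CSV file and save it as a PNG.",
    "price_report": {
        "uses": ["fetch_prices", "plot_series"],
        "steps": "Fetch the prices, plot them, then summarize the trend in text.",
    },
}

def render_recipe(name: str) -> str:
    """Expand a recipe and the recipes it references into a single prompt."""
    recipe = RECIPES[name]
    if isinstance(recipe, str):
        return recipe
    imports = "\n".join(f"- {dep}: {render_recipe(dep)}" for dep in recipe["uses"])
    return f"Known sub-recipes:\n{imports}\n\nTask:\n{recipe['steps']}"

print(render_recipe("price_report"))
```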

sonichi commented 8 months ago

Thanks for the discussion. It's worth attention. @rickyloynd-microsoft @gagb Please check this thread if you haven't. I added it to the roadmap.

rickyloynd-microsoft commented 8 months ago

Yes, Voyager's approach is very interesting. My team (Padawan) looked into it at some depth. I created this overview:

[image: overview of Voyager's approach]

rickyloynd-microsoft commented 8 months ago

Voyager itself has many limitations. But its skill library, based on a vector DB, inspired my approach to our TeachableAgent. And I plan to enhance TeachableAgent with the ability to learn more like Voyager does, through teaching itself even without the benefit of human input.

solarapparition commented 8 months ago

Yes, that release was very exciting to me. It occurs to me that with TeachableAgent, it might be possible to solve the problem of "how to compose recipes of previously done tasks into higher-level recipes" by recursion: first decompose the higher-order task (say Task A) into sequential subtasks (say a1 and a2), then pass the subtasks and their context to a new instance of the same TeachableAgent, maybe wrapped in a mapped function. So for a1, if the TeachableAgent has already learned how to solve it, the new instance should be able to execute it based on that learned knowledge and return the results to the parent agent, which can then move on to the other subtasks.

What I'm hoping is that after learning, the agent's recall at the level of Task A would be how to break it down into a1 and a2, rather than how to perform a1 and a2 specifically, while the instances working at the level of a1 and a2 would be able to "zoom in" on the learnings relevant to those subtasks without having to worry about the higher-level concerns.

If all this works out, then in theory you can move up the abstraction ladder for specific classes of tasks via a teaching curriculum, starting from the most basic, concrete building blocks. That could really expand the scope of tasks agents are capable of reliably executing.
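
For what it's worth, the control flow being described is roughly the sketch below; every hook name is hypothetical, and in practice each callable would be backed by a fresh TeachableAgent instance rather than a plain function:

```python
from typing import Callable

# Sketch of recursive decomposition (all hooks hypothetical): execute a task
# directly if it has already been learned, otherwise decompose and recurse.
def solve(
    task: str,
    has_learned: Callable[[str], bool],     # does the agent's memory cover this task?
    execute: Callable[[str], str],          # run a task the agent already knows
    decompose: Callable[[str], list[str]],  # split Task A into subtasks a1, a2, ...
) -> str:
    if has_learned(task):
        return execute(task)
    results = [solve(sub, has_learned, execute, decompose) for sub in decompose(task)]
    return "\n".join(results)
```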

rickyloynd-microsoft commented 8 months ago

I like your analysis of the problem! Robust learning of compositional skills has been a long-standing challenge for RL agents. Voyager shows how an LLM-based agent can learn a hierarchy of compositional skills in one particular setting, and hopefully we can accomplish this in a much more general agent. The current version of TeachableAgent can retrieve and use multiple memories to solve a single task, and a user could teach it a rising curriculum of dependent, compositional tasks t1, t2, t3, etc. But more research is needed before TeachableAgent can reliably execute the subtasks t1 and t2 when instructed to perform the higher-level task t3.

skzhang1 commented 8 months ago

Hi @solarapparition, I am interested in your idea! What I am worried about is that most tasks in Voyager are decomposable. For example, in Minecraft, the mine-wood-log subtask/skill could be part of mining a diamond. If we expect the agent to maintain a permanent library, I am not sure what kinds of scenarios/tasks would benefit from such a library and be able to find the appropriate skills to use. Could you give a specific example here?

p-bojkowski commented 8 months ago

🔥🎇🔥🧨 Awesome topic! 🔥🎇🔥🧨

Here are a few notes, ideas, and thoughts on exactly this topic, meticulously compiled (with the occasional copy & paste hiccup for good measure 😎) from different papers that I have authored over the last few months as part of my research...

SHORT version:

A skills library represents a significant evolutionary leap in sourcing and applying specialized knowledge, promising not only enhanced user experience and cost savings but also fostering an environment of innovation and forward-thinking. By compiling a structured reservoir of specialized skills, it offers a unique blend of breadth and depth in knowledge application.

This library can seamlessly integrate with various models, including large language models (LLMs) and agents with dynamic skill sets, potentially creating a hybrid model that leverages the strengths of both. Its application spans numerous scenarios, from medical diagnosis and legal research to education and customer service, showcasing its versatility and capacity to drive efficiency and quality.

However, the journey doesn’t stop here. The skills library must continuously adapt to emerging technologies and maintain relevance in a rapidly changing landscape. Balancing foundational and specialized skills, ensuring cost efficiency, and exploring collaborative environments such as group chats with diverse expert agents are crucial steps toward realizing the full potential of a skills library.

In essence, a skills library is not just a tool but a catalyst for innovation, efficiency, and precision, heralding a new era in the application of artificial intelligence.

LONGerrrrr version:

The Essence of a Skills Library

First and foremost, a skills library is more than just a collection of capabilities; it's a reservoir of specialized knowledge, a structured compilation, and a testament to the evolution of artificial intelligence. The questions to ponder are:

AI Systems: Dynamic Skill Libraries, Fine-Tuned LLMs, or a Hybrid Approach?

The landscape of artificial intelligence (AI) is vast and intricate, showcasing diverse approaches such as agents with dynamic skill libraries and fine-tuned Large Language Models (LLMs). Both strategies offer unique advantages and challenges, requiring a nuanced understanding of their capabilities and potential integrations.

Dynamic Skill Libraries: Modular and Focused

Agents with dynamic skill libraries encapsulate specialized knowledge and abilities, providing a modular and focused approach to problem-solving.

Fine-Tuned LLMs: Broad and Generative

LLMs are at the forefront of generative AI, capable of generating coherent and contextually relevant responses across a wide range of topics.

Hybrid Systems: The Best of Both Worlds

A hybrid approach integrates dynamic skill libraries with fine-tuned LLMs, aiming to leverage their strengths and mitigate weaknesses.

Scenarios that Can Benefit from a Skills Library (Examples and Simple Explanations):

  1. Medical Diagnosis and Patient Care:

    • What's the Deal?: Think of a digital medical genius that knows every illness and medicine in the book. This is what a skills library does when it comes to health.
    • Benefits for Real People: Patients get better care quickly, and doctors have a handy assistant to make sure no detail is missed.
    • Scenario: Medical professionals use this tool to instantly pull up medical info, patient history, and the latest treatments.
    • Challenges: The tool needs to be super accurate and always up-to-date, while keeping patient information safe and secure.
    • Future Outlook: A future where everyone has access to top-notch medical advice in an instant, making healthcare faster and safer.
  2. Legal Research and Case Preparation:

    • What's the Deal?: Imagine a digital lawyer that has read every law book and court case ever. That’s your skills library in the legal world.
    • Benefits for Real People: Legal processes become quicker and more reliable, ensuring everyone gets a fair trial.
    • Scenario: Lawyers use this digital tool to sift through mountains of legal documents in seconds, finding exactly what they need to make their case.
    • Challenges: Keeping the tool updated with the latest laws and court rulings, while making sure it understands the nuances of legal language.
    • Future Outlook: A future where justice is served swiftly and accurately, thanks to our digital legal assistant.
  3. Customized Education and Learning Plans:

    • What's the Deal?: This is like a personal tutor that knows how every student likes to learn, helping them grasp tough topics in their own way.
    • Benefits for Real People: Learning becomes a breeze for students, and teachers have a helper to make sure every child succeeds.
    • Scenario: Teachers use this tool to create learning plans that suit each student's learning style, making education more personal and effective.
    • Challenges: The tool needs to understand every student’s strengths and weaknesses, while keeping their learning data private.
    • Future Outlook: A future where learning is a personalized journey, helping every student reach their potential.
  4. Supply Chain Optimization and Management:

    • What's the Deal?: Imagine a super-smart manager that knows everything about getting products from A to B in the most efficient way possible. That's the skills library in supply chain management.
    • Benefits for Real People: Products get to stores faster and cheaper, and businesses can run smoother without any hiccups in getting what they need.
    • Scenario: Companies use this tool to make smart, real-time decisions about where to send their products and how to manage their inventory.
    • Challenges: Ensuring the tool always has the most current information and can make decisions quickly, especially when unexpected problems pop up.
    • Future Outlook: A future where everything we buy is always in stock and delivered on time, thanks to our digital supply chain whiz.
  5. Sustainable Urban Planning and Development:

    • What's the Deal?: Think of a city planner that has all the info on how to make a city greener, more efficient, and a better place to live. This is the role of a skills library in urban development.
    • Benefits for Real People: Cities become more livable with less traffic, cleaner air, and more green spaces.
    • Scenario: Urban planners use this tool to analyze tons of data, from traffic flow to pollution levels, making smart decisions to improve city life.
    • Challenges: Making sure the tool has accurate and up-to-date data, while considering the needs and opinions of all city residents.
    • Future Outlook: A future where cities are designed for people, with smart, sustainable planning making urban areas more enjoyable for everyone.
  6. Interactive Customer Service and Support:

    • What's the Deal?: Imagine a customer service rep that never sleeps and knows the answer to every question. That's a skills library in customer service.
    • Benefits for Real People: Customers get instant, accurate answers to their questions, making their lives easier and happier.
    • Scenario: Customer service centers use this tool to provide quick and helpful responses across various communication channels.
    • Challenges: Ensuring the tool understands all the different ways people might ask questions and keeping customer data secure.
    • Future Outlook: A future where customer service is always a pleasant experience, with fast and friendly help just a message away.
  7. Research and Development Innovation:

    • What's the Deal?: Think of a super-innovator that knows everything about every kind of technology and science, helping to create cool new products. This is the skills library in research and development.
    • Benefits for Real People: We get access to more innovative and better-quality products, improving our daily lives.
    • Scenario: R&D teams use this tool to access a wealth of knowledge, brainstorm ideas, and make informed decisions to drive innovation.
    • Challenges: Keeping the tool updated with the latest scientific discoveries and ensuring it can think outside the box like a real innovator.
    • Future Outlook: A future full of amazing new products and technologies, made possible by our digital innovation assistant.

Tokens and Cost Efficiency

The skills library also stands as a beacon of efficiency while being a valuable knowledge reservoir. Centralizing and streamlining the retrieval process means that over time, we utilize fewer tokens on gathering and executing skills. Could this model pave the way for a more sustainable and cost-effective future for our operations?

Questions to ponder:

The Power of Group Chats in Conjunction with Skill Libraries

Picture a virtual environment where agents with diverse skills converge. Each agent, a specialist in their domain, retrieves from the skills library precisely what they're unparalleled at. This modularity guarantees that every inquiry is directed at the best-suited agent, ensuring not only efficiency but also unparalleled quality.

Questions for reflection:

Specialization: The Road to Expertise

Dividing broader skills, like Python, into sub-skills (e.g., Python for Data Analysis, Python for Web Development) promises specialized attention. This division leverages the depth of each agent's specialized knowledge. It is paramount to ensure that this specialization doesn't lead to overspecialization, where the agents may lack versatility. Continuous communication and feedback loops between users and stakeholders can help to effectively communicate the benefits of this division and adjust it as needed.

Thought-provoking questions:

Foundational vs. Specialized Skills

Every skill stands on a foundation, and the hierarchy of skills is evident. From data retrieval to specialized domain expertise, these layers build upon each other. But how do these layers interplay?

Layers to consider:

  1. Data Retrieval and Storage Skills: Before any operation, agents need access to reliable data. How do we continually ensure data integrity and relevance?
  2. Information Processing Skills: Agents must efficiently process the information post data retrieval.
  3. Decision Making Skills: Decisions, rooted in real-time data, are imperative for optimal outcomes.
  4. Task Execution Skills: An action-oriented approach post-decision ensures tasks meet their desired objectives.
  5. Communication Skills: Effective communication ensures clarity between agents and users.
  6. Adaptability and Learning Skills: How do we fortify these skills to anticipate future challenges in a dynamic environment?
  7. Specialized Domain Skills: These provide unrivaled depth and insight, being the pinnacle of knowledge.

Inquiry points:

The Cornerstones of Each Skill

Each skill, whether foundational or specialized, has certain prerequisites. For instance:

Questions for consideration:

p-bojkowski commented 8 months ago

Here are a few additional thoughts on domain skills and then a few other ideas...

Specialized Domain Skills refer to the expertise and knowledge that are specific to a particular area or field. These skills are usually acquired through extensive training and experience in the respective domain. In the context of AutoGen and AI, these skills could include specific programming languages, understanding of certain algorithms, or knowledge of specific industries or sectors.

The concept of Specialized Domain Skills is closely connected to the broader topic of AI and machine learning. It also ties into the discussion of skill libraries, as these specialized skills could be part of the library that an AI agent can access and utilize. Furthermore, it relates to the topic of continuous learning and adaptability, as these skills need to be regularly updated to keep up with advancements in the respective domain.

From a user perspective, Specialized Domain Skills in an AI agent can greatly enhance the user experience by providing expert assistance in a specific field. From a developer perspective, these skills can be challenging to implement and maintain due to the constant advancements in various domains.

The benefits of implementing Specialized Domain Skills in AI agents include providing expert assistance in specific domains, enhancing user experience by providing specialized knowledge, and leading to more accurate and efficient problem-solving. However, these skills require regular updates to keep up with advancements in the respective domain, can be challenging to implement due to the complexity of certain domains, and may lead to overspecialization, limiting the versatility of the AI agent.

The main objective of implementing Specialized Domain Skills in AI agents is to enhance their capabilities and user experience. This involves identifying the key domains that the AI agents should specialize in, developing a system for regular updates and maintenance of these skills, and ensuring the skills are implemented in a way that enhances user experience. Hidden objectives include establishing AI agents as experts in specific domains and enhancing the reputation and credibility of the AI agents.

However, there are several constraints and risks to consider. These include limited resources for the development and maintenance of these skills, rapid advancements in various domains requiring frequent updates, the risk of overspecialization limiting the versatility of the AI agents, and uncertainty regarding the acceptance and effectiveness of these skills among users. The sustainability of Specialized Domain Skills depends on the regular updates and maintenance of these skills to keep up with advancements in the respective domains.

Potential improvements include developing a system for regular updates and maintenance of these skills and implementing user feedback to continuously improve these skills. Customer priorities include the accuracy and efficiency of the AI agents and expert assistance in specific domains.

In conclusion, the implementation of Specialized Domain Skills in AI agents is a complex but crucial aspect of AI development. It requires a careful balance between specialization and versatility, and a commitment to continuous learning and adaptability. The key to success lies in regular updates and maintenance, user feedback, and a focus on enhancing user experience.

Reflecting on this work, there are opportunities for refinement and improvement. These include deepening domain understanding, enhancing user perspective, improving clarity and conciseness, incorporating more diverse perspectives, providing regular updates on progress, more proactive problem-solving, and placing more emphasis on sustainability.

However, the topic of Specialized Domain Skills in AI is complex and multifaceted, and there are many factors to consider, which can lead to different interpretations. These areas of interpretation include identifying relevant domains, striking a balance between specialization and versatility, enhancing user experience, allocating resources, adhering to ethical considerations, and overcoming technological constraints. These areas of interpretation highlight the importance of clear communication, stakeholder consultation, and user feedback in the process of implementing Specialized Domain Skills in AI agents.

With continuous improvement and refinement, the implementation of Specialized Domain Skills in AI agents can be effectively achieved to enhance their capabilities and user experience.

Questions to Ponder:

Continuing our exciting exploration of the AutoGen project, I'd like to share a few more thoughts and ideas that could potentially spark further innovation and development....

  1. Temperature Settings for Each Agent and Individual Skills
    • Concept: This idea is akin to setting the 'mode' or 'setting' of the agent and its individual skills based on the task at hand. Different tasks may require different 'modes', such as a 'focused' mode for mathematical calculations and a 'creative' mode for brainstorming sessions.
    • Implementation: This dynamic adaptability could unlock the full potential of the agents. Implementation could be challenging, requiring an intuitive and user-friendly interface for developers. We would need to ensure that the temperature settings are intuitive and easy to use, and that they actually make a noticeable difference in the agent's behavior. This could involve creating a predefined set of temperatures for different tasks or dynamically adjusting the temperature based on the task's complexity and the agent's performance.
    • Challenges: The temperature of individual skills could be controlled in conjunction with the overall agent temperature, allowing for more granular control over the agent's behavior. However, this could increase the complexity of managing the agent, especially if an agent has a large number of skills. One possible solution could be to group related skills together and assign a single temperature to each group. Care should be taken to ensure that the temperature settings do not introduce any biases in the agent's behavior.
    • Benefits: The ability to adjust the temperature settings for each agent and individual skills could lead to more effective and efficient solutions, as it allows the agent to adapt its behavior based on the task at hand.
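
For reference, the agent-level half of this already exists: AutoGen lets you set a temperature per agent through llm_config, as in the sketch below (per-skill or per-task temperatures would be a new layer on top of that):

```python
import autogen

# Per-agent temperature via llm_config; "OAI_CONFIG_LIST" is the usual AutoGen
# convention for the model config file, adjust to your setup.
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")

calculator = autogen.AssistantAgent(
    name="calculator",
    llm_config={"config_list": config_list, "temperature": 0.0},  # 'focused' mode
)
brainstormer = autogen.AssistantAgent(
    name="brainstormer",
    llm_config={"config_list": config_list, "temperature": 0.9},  # 'creative' mode
)
```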

What's the Deal?

  2. Skill Distribution and Skill Mix
    • Concept: The idea is compared to a chef preparing a gourmet dish, adding ingredients in the right proportion. Each agent, with its unique blend of skills, can create a symphony of innovative solutions. However, the effectiveness of the solutions would depend on the specific skills of the agents and how well they work together.
    • Implementation: A sophisticated system for managing and coordinating the agents would be required, which could be complex to develop. The coordination between agents could be difficult to manage, especially if there are many agents involved in a conversation. This could involve creating a shared workspace where agents can post updates, ask questions, and provide feedback.
    • Challenges: The main challenge would be developing a system that can effectively manage and coordinate the agents, ensuring that they work together in a cohesive and efficient manner. Care should be taken to ensure that the distribution of skills does not favor certain agents or tasks over others.
    • Benefits: By allowing agents to collaborate, we can leverage the collective intelligence of the agents to solve complex problems more effectively. This could lead to improved performance and more innovative solutions.

Questions for Reflection:

  3. Missing Skills and Skill Inheritance
    • Concept: The concept is likened to the tradition of learning from peers and predecessors. The continuous cycle of learning and improving drives innovation. However, the term 'inheritance' might be misleading, as it suggests that skills can be passed down from one agent to another, which is not typically how machine learning works.
    • Implementation: Implementation would need to be efficient and avoid unnecessary duplication of skills. We would need to make sure that the system for tracking and recording the agents' experiences and milestones is efficient and easy to use. This could involve training the agent on the relevant tasks.
    • Challenges: One of the main challenges would be ensuring that the learning process is effective and efficient. This would require developing robust learning algorithms that can accurately identify areas for improvement and provide meaningful feedback. The identification of missing skills can be a complex process.
    • Benefits: By enabling agents to learn and evolve their skills, we can ensure that they continue to improve their performance over time. This could lead to more effective and efficient solutions, as well as increased adaptability in the face of new challenges.

Thought-Provoking Questions:

  4. Creating a 'CV' for Agents
    • Concept: The 'CV' serves as a roadmap of the agent's journey, showcasing its skills, experiences, and milestones. This could be a valuable tool for tracking the agent's progress and planning its future trajectory. A more accurate term might be 'profile' or 'overview'.
    • Implementation: A system for tracking and recording the agents' experiences and milestones would be required, which could be complex to develop. We would need to ensure that the system for tracking and recording the agents' experiences and milestones is efficient and easy to use.
    • Challenges: The main challenge would be developing a system that can effectively track and record the agents' experiences and milestones, providing a clear and concise overview of the agent's journey. The effectiveness of the 'CV' would depend on the specific information included and how it is presented.
    • Benefits: By creating a 'CV' for agents, we can provide a clear and concise overview of the agent's journey, making it easier to track the agent's progress and plan its future trajectory.

Inquiry Points:

  5. Assigning Human-like Attributes to Agents
    • Concept: The idea of assigning attributes such as age and gender to agents is a double-edged sword. While it can make the agents more relatable, it's important to avoid any biases. It's important to remember that these agents do not have feelings or consciousness.
    • Implementation: Implementation would need to be respectful and inclusive to avoid biases. While it could make the agents more relatable, it could also introduce biases. We would need to ensure that the implementation is respectful and inclusive. This could involve developing a mechanism for assigning these attributes in a way that does not lead to gender stereotypes or other forms of bias.
    • Challenges: The main challenge would be avoiding biases when assigning human-like attributes to agents. This would require careful consideration and a commitment to respect and inclusivity.
    • Benefits: By assigning human-like attributes to agents, we can make the agents more relatable, potentially improving user engagement and satisfaction. However, care should be taken to ensure that this does not lead to anthropomorphism, which could mislead users into thinking that the agents have feelings or consciousness.

Questions to Ponder:

  6. The Skill Library as a Living Entity
    • Concept: The skill library is not just a tool, but a living, breathing entity. It's a testament to collective knowledge, experiences, and aspirations. It's a beacon of innovation, efficiency, and precision, heralding a new era in the application of artificial intelligence.
    • Implementation: A sophisticated system for managing and updating the skill library would be required, which could be complex to develop. Managing and updating the skill library could be complex. We would need to ensure that the system is efficient and easy to use. This could involve creating a mechanism for adding new skills to the library and removing outdated or irrelevant skills.
    • Challenges: The main challenge would be developing a system that can effectively manage and update the skill library, ensuring that it remains a valuable and up-to-date resource.
    • Benefits: By treating the skill library as a living entity, we can ensure that it remains a valuable and up-to-date resource, driving innovation and efficiency in the application of artificial intelligence.

What's the Deal?

  7. Agent Collaboration and Skill Synergy
    • Concept: This idea revolves around creating a collaborative environment where agents can work together, leveraging their individual skills to achieve a common goal. This is akin to a team of experts from different fields coming together to solve a complex problem.
    • Implementation: To implement this, we would need to develop a system that allows agents to communicate and share information effectively. This could involve creating a shared workspace where agents can post updates, ask questions, and provide feedback. Additionally, we would need to develop a mechanism for coordinating the agents' activities, ensuring that they work together in a cohesive and efficient manner.
    • Challenges: One of the main challenges would be ensuring effective communication between the agents. This would require developing a robust communication protocol that allows agents to exchange information in a clear and concise manner. Additionally, coordinating the activities of multiple agents could be complex, requiring sophisticated scheduling and task allocation algorithms.
    • Benefits: By allowing agents to collaborate, we can leverage the collective intelligence of the agents to solve complex problems more effectively. This could lead to improved performance and more innovative solutions.

Questions for Reflection:

  8. Skill Evolution and Continuous Learning
    • Concept: This idea involves enabling agents to learn and evolve their skills over time. This is similar to how humans learn and improve their skills through practice and experience. However, the term 'evolution' might be misleading, as it suggests that the skills can somehow develop and change over time, which is not typically how skills work.
    • Implementation: To implement this, we would need to develop a system that allows agents to learn from their experiences. This could involve using machine learning algorithms to analyze the agents' performance and identify areas for improvement. Additionally, we would need to provide opportunities for the agents to practice and refine their skills, such as through simulated tasks or challenges.
    • Challenges: One of the main challenges would be ensuring that the learning process is effective and efficient. This would require developing robust learning algorithms that can accurately identify areas for improvement and provide meaningful feedback. Additionally, providing opportunities for practice and refinement could be resource-intensive, requiring careful management of computational resources.
    • Benefits: By enabling agents to learn and evolve their skills, we can ensure that they continue to improve their performance over time. This could lead to more effective and efficient solutions, as well as increased adaptability in the face of new challenges.

Thought-Provoking Questions:

  9. Cloning User Skills and the Potential Behind It
    • Concept: This idea revolves around the concept of AI agents learning and mimicking the skills of users, thereby creating a personalized and customized user experience. This could be likened to a personalized tutor or assistant that understands and adapts to the user's unique style and preferences.
    • Implementation: To implement this, we would need to develop a system that allows AI agents to observe and learn from user interactions. This could involve using machine learning algorithms to analyze user behavior and identify patterns or preferences. The AI agent could then adapt its behavior to match these patterns, effectively 'cloning' the user's skills. Additionally, we would need to ensure that this process respects user privacy and adheres to ethical guidelines.
    • Challenges: One of the main challenges would be ensuring that the learning process is effective and accurate. This would require developing robust learning algorithms that can accurately identify and mimic user behavior. Additionally, respecting user privacy and adhering to ethical guidelines could be challenging, especially when dealing with sensitive or personal information. Another challenge would be ensuring that the AI agent can adapt its behavior in a way that is beneficial and not disruptive or annoying to the user.
    • Benefits: By enabling AI agents to clone user skills, we can create a personalized and customized user experience. This could lead to increased user satisfaction and engagement, as the AI agent can adapt to the user's unique style and preferences. Furthermore, this could lead to more effective and efficient solutions, as the AI agent can leverage the user's skills and knowledge.

Questions for Reflection:

In conclusion, the concept of cloning user skills presents a promising avenue for enhancing the capabilities and user experience of AI agents. By enabling AI agents to learn from and mimic user behavior, we can create a personalized and customized user experience, potentially leading to increased user satisfaction and engagement. However, this process must be handled with care to ensure that it is effective, accurate, and respectful of user privacy and ethical guidelines. With careful implementation and continuous refinement, the cloning of user skills can be effectively achieved to unlock the full potential of AI agents.

I'm looking forward to your feedback and the ideas that I have suggested with my comments here.

NEVER STOP HACKING! 😎 💻 🔥

solarapparition commented 8 months ago

Hi @solarapparition, I am interested in your idea! What I am worried about is that most tasks in Voyager are decomposable. For example, in Minecraft, the mine-wood-log subtask/skill could be part of mining a diamond. If we expect the agent to maintain a permanent library, I am not sure what kinds of scenarios/tasks would benefit from such a library and be able to find the appropriate skills to use. Could you give a specific example here?

So I think some task environments are more conducive than others to letting an agent (de)compose tasks. To me there are a couple of necessary (though likely not sufficient) conditions:

  1. A limited number of required, atomic actions: how many basic types of actions the agent needs to know about in order to perform the tasks that it needs to. If this isn't true, then there would be too many different combinations of actions for the agent to put together. This is true in Minecraft—there are only a few basic ways of interacting with the environment, and it helps that Voyager has access to the Mineflayer API to distill the action space further.
  2. The environment is globally static: by this I mean that the agent can expect only small, local changes in the environment, typically as a response to its own actions. This condition lets the agent more easily treat subtasks in isolation—if task T needs subtasks t1, t2, and t3 in that order, it doesn't need to account for huge changes in the environment by the time it gets to t3. I think this is particularly important for complex tasks that need many layers of decomposition. For Minecraft, the world does change, but only gradually and not meaningfully in the time horizon that Voyager's tasks are executed at.

Many other environments don't satisfy these requirements, but there should be at least some useful ones that do. One that I'm particularly interested in is autonomously interacting with a website through browser automation such as Selenium. For condition 1, on most webpages there is a fairly limited number of things a user needs to know how to do (clicking, scrolling, typing in text, etc.), and for condition 2, you don't expect the global structure of websites to change rapidly. I can imagine that once you teach an agent a few basic workflows on a website, it'll be able to combine those workflows for more complex tasks.
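
To make the website example concrete, here is a rough sketch of a few "atomic" browser actions and one workflow composed from them, using Selenium (the site, selectors, and function names are made up for illustration):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Atomic browser actions (condition 1: a small, fixed action vocabulary).
def click(driver, selector: str) -> None:
    driver.find_element(By.CSS_SELECTOR, selector).click()

def type_text(driver, selector: str, text: str) -> None:
    driver.find_element(By.CSS_SELECTOR, selector).send_keys(text)

# A composed skill built from the atomic ones (condition 2 assumes the page
# structure stays stable enough for the selectors to keep working).
def search_site(driver, query: str) -> None:
    driver.get("https://example.com")  # illustrative URL
    type_text(driver, "input[name='q']", query)
    click(driver, "button[type='submit']")

if __name__ == "__main__":
    driver = webdriver.Chrome()
    search_site(driver, "autogen skill library")
    driver.quit()
```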

With all that said, I don't know if this recursive teachable agent is really a separate thing from TeachableAgent itself—it should actually be pretty straightforward to just configure a TeachableAgent with a mapped function that wraps another instance of itself. The main challenge is probably dealing with context decay as the agent goes deeper into the recursion, but I want to try it out for an autonomous browsing agent I'm working on.

rickyloynd-microsoft commented 8 months ago

I don't know if this recursive teachable agent is really a separate thing from TeachableAgent itself

Yes, that's a key question. We've just created new issues #538 and #540 which will hopefully address most of the fundamental concerns raised above. But that will take some months. Meanwhile, does anyone volunteer to write a partial, shorter-term solution that would allow python functions to be stored and retrieved from a skill library implemented as a vector DB?
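
For scale, such a skill library could be as small as the following sketch, which stores and retrieves python functions with chromadb used directly (the helper names are hypothetical, not AutoGen API):

```python
import inspect
import chromadb

# Hypothetical sketch: index python functions by their docstrings in a vector DB
# and retrieve the most relevant ones for a task description.
client = chromadb.Client()
skills = client.get_or_create_collection("skill_library")

def add_skill(func) -> None:
    """Store a validated function's source, keyed by its docstring."""
    skills.add(
        ids=[func.__name__],
        documents=[func.__doc__ or func.__name__],
        metadatas=[{"source": inspect.getsource(func)}],
    )

def find_skills(task: str, k: int = 3) -> list[str]:
    """Return the source of the k skills most similar to the task description."""
    hits = skills.query(query_texts=[task], n_results=k)
    return [meta["source"] for meta in hits["metadatas"][0]]
```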

solarapparition commented 8 months ago

For the short-term solution, isn't that just a matter of configuring a Retrieve agent with a custom text splitter that chunks Python files into function blocks? Then you could just save all the functions into a file that you point the retriever to, and build off of that. That seems like something that could be done without having to modify AutoGen itself, except maybe updating TEXT_FORMATS in retrieve_utils.py, but even that's not strictly necessary.
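
A function-level splitter along those lines could be as simple as the sketch below (shown standalone; how it gets wired in as the retrieval agent's custom splitter depends on the AutoGen version in use):

```python
import ast

def split_python_into_functions(path: str) -> list[str]:
    """Chunk a .py file into one string per top-level function or class."""
    source = open(path, encoding="utf-8").read()
    tree = ast.parse(source)
    chunks = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    return [chunk for chunk in chunks if chunk]
```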

rickyloynd-microsoft commented 8 months ago

For the short-term solution, isn't that just a matter of configuring a Retrieve agent with a custom text splitter that chunks Python files into function blocks? Then you could just save all the functions into a file that you point the retriever to, and build off of that. That seems like something that could be done without having to modify AutoGen itself, except maybe updating TEXT_FORMATS in retrieve_utils.py, but even that's not strictly necessary.

Yes, that's a reasonable approach to try and test.

solarapparition commented 7 months ago

So regarding this, #538, and #540: I've been mulling over the discussions at OpenAI_Agent_Swarm as well as my own earlier experiment, and I'm more and more convinced that the solution to these challenges isn't a single AutoGen agent but a composite one consisting of groups of smaller agents, though obviously that composite agent could be wrapped in the usual AutoGen interface so that it still behaves like a single agent on the surface. It feels like the problem of task decomposition is complex enough that it requires multiple mini-agent groups collaborating in a dynamic workflow, with agent groups able to handle decisions at various levels of granularity.

I'm experimenting with some approaches on that end and can post if something fruitful comes out of it.

sonichi commented 7 months ago

Makes sense and sounds exciting. Looking forward to your experiments.

rickyloynd-microsoft commented 7 months ago

A new relevant paper: https://arxiv.org/abs/2311.07635

solarapparition commented 7 months ago

A new relevant paper: https://arxiv.org/abs/2311.07635

So this got me thinking.

(Forgive the wall of text here—I don't have anything concrete, just throwing out an idea and hoping it sparks some inspiration.)

This paper, LLMs as Optimizers, Eureka, and a few other APE papers make me feel that when it comes to problem solving, the biggest advantage that LLMs have over humans is that they can generate many potential improvements on an existing (faulty) solution and semi-brute-force their way to at least a locally optimal solution... at least for problems where there is a feedback metric that provides enough signal for further optimization attempts. I think in Eureka Jim Fan called it "gradient-free optimization", but I feel like it's something that has broader applicability than just the context he was referring to.

The application to code-based skills is pretty obvious (run an optimization loop until the code works to some standard), but I wonder if it can also be used to optimize the transformation from the "current task" into the "skill retrieval query for the current task". Say we have task T that requires retrieving skills {s_1, s_2, ..., s_n} from the skills vector database. Pushing the description of T in directly as the query is shaky at best, especially if what you're querying is code. I think best practice in RAG right now is to do some sort of prompt-based transformation of T into queries that are likely to retrieve chunks relevant to solving T, say using a prompt P. But (someone correct me if I'm wrong) P is usually created via manual prompt engineering, e.g. P = "write out the questions that are needed to answer this question" and the like.

What if one does some optimization on that prompt instead? We could start with a set of tasks {T_j} that are representative of the class of tasks the agent specializes in, evaluate the retrieval results against a pre-constructed skill library with known "optimal" retrieval results for those tasks, and in theory end up with an optimized P for whatever class of tasks you're interested in. The usual pitfalls of optimization (overfitting, a poor evaluation metric, etc.) are all there, but they feel better understood than the dark art of writing a really good retrieval-query prompt, especially since the optimal prompt might differ dramatically based on the task type and the available skill library.
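
As a sketch of what optimizing P could look like in the simplest case (everything here is hypothetical; llm and retrieve stand in for whatever model call and vector-store lookup are actually used, and the score is plain recall against the known-good skill sets):

```python
from typing import Callable

def score_prompt(
    prompt: str,
    tasks: list[str],
    gold_skills: dict[str, set[str]],
    llm: Callable[[str], str],          # model call: prompt text -> retrieval query
    retrieve: Callable[[str], set[str]],  # vector-store lookup: query -> skill names
) -> float:
    """Average recall of retrieved skills against the known-good skill sets."""
    total = 0.0
    for task in tasks:
        query = llm(prompt + "\n\nTask: " + task)  # transform task -> retrieval query
        hits = retrieve(query) & gold_skills[task]
        total += len(hits) / max(len(gold_skills[task]), 1)
    return total / len(tasks)

def optimize_prompt(
    candidates: list[str],
    tasks: list[str],
    gold_skills: dict[str, set[str]],
    llm: Callable[[str], str],
    retrieve: Callable[[str], set[str]],
) -> str:
    """Pick the candidate prompt P with the best retrieval recall."""
    return max(candidates, key=lambda p: score_prompt(p, tasks, gold_skills, llm, retrieve))
```

In the spirit of the LLMs-as-Optimizers setup, the candidate list itself could be regenerated each round by showing the model the best-scoring prompts so far.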

sonichi commented 7 months ago

@skzhang1 @JieyuZ2 FYI

JieyuZ2 commented 7 months ago

one related work: https://arxiv.org/abs/2308.00304

solarapparition commented 7 months ago

Possibly also relevant: Chain of Code

jackgerrits commented 3 months ago

@rickyloynd-microsoft moved to be a sub issue of user defined functions