Open solarapparition opened 9 months ago

I really like the concept of the "skill recipe" in AutoGen, but I think it can be taken much further. One of the key takeaways from Voyager was that it was possible to continually scale up an agent's capabilities by letting it access a permanent library of previously generated capabilities, along with the ability to compose them into new ones; that's particularly relevant in AutoGen, where the individual capabilities of agents in a group can have a multiplicative effect on the intelligence of the agent group as a whole, depending on the role of the agent.

As a first step, simply having a straightforward, programmatic way to automate the creation, saving, and loading of recipes would go a long way towards developing a full "skill library" feature down the line.
That is quite interesting. A simple example would be a basic API script that could be reused to fetch data from a particular source. We could have a human validate the skill as working, and then AutoGen could store it. I suppose this would have to be configurable.
Maybe we could (optionally?) store each code-generated skill as a Docker container that AutoGen could launch later on.
These skills could maybe even be shared among members of the community.
Throwing a few ideas out there.
Possible implementation?
Log successful and unsuccessful tasks. Unsuccessful tasks get stored along with what was intended, what happened when the attempt was made, and what steps had to be taken to get a working solution. That way you can fine-tune your systems on that data to leverage better models.
Successful tasks get logged with what they do, what the use case was, when they might be needed again, etc. That log then gets chunked at the log level, vector-stored, and associated with an agent for use in future tasks. When a call occurs, a chain could get the embeddings for your query or the context you're passing to the system, compare that context to the vectorized library, and retrieve a list of X potential solutions (my understanding breaks down at which statistical method to use, like k-means, or what's better). Run the results through a decision bot to determine if any are relevant (potentially comparable to another doc source, like how Gorilla does an API lookup). Whatever the most relevant advice is gets passed along.
That way the context it's building on can potentially be more situation-specific, allowing the system to self-improve by gathering experience, reviewing it, and refining outputs. You could do a locally available integration or something less stateful (like having an input you can pass to an agent so it can be associated with a resource like a live vector DB that other agents might be simultaneously hooking into and leveraging).
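A minimal sketch of that log-and-retrieve loop, assuming Chroma as the vector store; the task entry, metadata fields, and the decision-bot step are all placeholders, not a proposed AutoGen API:

```python
import chromadb

client = chromadb.Client()
log = client.create_collection(name="task_log")

# Log a completed task: the document is the natural-language description
# ("what it does, what the use case was, when it might be needed again").
log.add(
    ids=["task-001"],
    documents=["Fetch daily price data from the ExampleFinance API and cache it as CSV."],
    metadatas=[{"outcome": "success", "solution": "fetch_prices.py"}],
)

def find_candidate_solutions(context: str, k: int = 1) -> list[dict]:
    """Embed the current context and retrieve the k nearest logged tasks."""
    results = log.query(query_texts=[context], n_results=k)
    return [
        {"description": doc, **meta}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]

def decision_bot(context: str, candidates: list[dict]) -> dict | None:
    """Placeholder for an LLM call that judges which candidate (if any) is relevant."""
    return candidates[0] if candidates else None

candidates = find_candidate_solutions("get stock prices for a report")
advice = decision_bot("get stock prices for a report", candidates)
```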
I'd like to do something similar in my feature request ticket for multi-LLM management, if anyone wants to hop on a call and see if we can figure it out: using the best model for a situation, or having a pool of models hosted on a local machine that unloads/loads models for each call, and potentially swapping between models (like this agent can use these two models) so it can grade them on a task and prioritize the one with better performance.
Yeah, good point about keeping track of success/failures; knowing what didn't work/had low success rate is important too, and I suspect having a system like that in place would automate a good portion of iterative prompt development.
I do think that for scaling in the medium-to-long run, it's important for task recipes to be able to reference other (known-to-work) recipes, essentially the prompt equivalent of importing from other code modules. That way, agents could work their way up the abstraction ladder for tasks that are mostly self-contained (i.e., ones whose recipes an agent can reliably adapt to its own work context with no feedback/reflection), without humans having to chain the prompts.
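To make the "recipes importing recipes" idea concrete, here's a hypothetical sketch (none of this is existing AutoGen API): a recipe carries references to other known-to-work recipes, and rendering one inlines its dependencies bottom-up, the way module imports pull in their own imports.

```python
from dataclasses import dataclass, field

@dataclass
class Recipe:
    name: str
    instructions: str
    uses: list[str] = field(default_factory=list)  # names of known-to-work recipes

def render(recipe_name: str, library: dict[str, Recipe], seen: set[str] | None = None) -> str:
    """Inline a recipe and its dependencies, bottom-up, into one prompt."""
    if seen is None:
        seen = set()
    if recipe_name in seen:          # guard against circular references
        return ""
    seen.add(recipe_name)
    recipe = library[recipe_name]
    deps = "\n\n".join(render(dep, library, seen) for dep in recipe.uses)
    return f"{deps}\n\n## {recipe.name}\n{recipe.instructions}".strip()

library = {
    "fetch_data": Recipe("fetch_data", "Call the API and save the JSON response."),
    "weekly_report": Recipe(
        "weekly_report",
        "Summarize this week's data into a report.",
        uses=["fetch_data"],
    ),
}
print(render("weekly_report", library))  # fetch_data's steps appear first
```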
Thanks for the discussion. It's worth attention. @rickyloynd-microsoft @gagb Please check this thread if you haven't. I added it to the roadmap.
Yes, Voyager's approach is very interesting. My team (Padawan) looked into it at some depth, and I created an overview of it. Voyager itself has many limitations, but its skill library, based on a vector DB, inspired my approach to our `TeachableAgent`, and I plan to enhance `TeachableAgent` with the ability to learn more like Voyager does, by teaching itself even without the benefit of human input.
Yes, that release was very exciting to me. It occurs to me that with `TeachableAgent`, it might be possible to solve the problem of "how to compose recipes of previously done tasks into higher-level recipes" by recursion: decompose the higher-order task (say `Task A`) first into sequential subtasks (say `a1` and `a2`), then pass the subtasks and their context to a new instance of the same `TeachableAgent`, maybe wrapped in a mapped function. So for `a1`, if the `TeachableAgent` has already learned how to solve it, then this new instance should just be able to execute it based on the learned knowledge, then return the results back to the parent agent, which can then go on to the other subtasks.
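Here's a rough sketch of that recursion in plain Python; `recall`, `decompose`, `execute`, and `combine` are hypothetical stand-ins for the agent's memory lookup, LLM-driven decomposition, direct execution, and result merging (not real `TeachableAgent` APIs):

```python
# Toy memory: learnings the agent has already been taught.
MEMORY = {"a1": "how to do a1", "a2": "how to do a2"}

def recall(task: str) -> str | None:
    return MEMORY.get(task)                    # stand-in for a vector-DB lookup

def decompose(task: str) -> list[str]:
    return ["a1", "a2"]                        # stand-in for LLM decomposition of Task A

def execute(task: str, context: str | None) -> str:
    return f"result({task})"                   # stand-in for actually doing the work

def combine(task: str, results: list[str]) -> str:
    return " + ".join(results)                 # parent agent merges subtask results

def solve(task: str, depth: int = 0, max_depth: int = 3) -> str:
    learned = recall(task)
    if learned is not None:
        return execute(task, context=learned)  # known subtask: just run it
    if depth >= max_depth:
        return execute(task, context=None)     # stop decomposing; attempt directly
    # each subtask goes to a fresh instance of the same agent ("mapped function")
    subtasks = decompose(task)
    results = [solve(sub, depth + 1, max_depth) for sub in subtasks]
    return combine(task, results)

print(solve("Task A"))   # -> "result(a1) + result(a2)"
```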
What I'm hoping with this idea is that after learning, the agent's recall at the level of executing `Task A` would be how to break it down into `a1` and `a2`, rather than how to perform `a1` and `a2` specifically, but that the instances of it working at the level of `a1` and `a2` would be able to "zoom in" to the learnings specifically relevant to those subtasks without having to worry about the higher-level concerns.
If all this works out, then in theory you can move up the abstraction ladder for specific classes of tasks via a teaching curriculum, starting from the most basic, concrete building blocks. That could really expand the scope of tasks agents are capable of reliably executing.
I like your analysis of the problem! Robust learning of compositional skills has been a long-standing challenge for RL agents. Voyager shows how an LLM-based agent can learn a hierarchy of compositional skills in one particular setting, and hopefully we can accomplish this in a much more general agent. The current version of `TeachableAgent` can retrieve and use multiple memories to solve a single task, and a user could teach it a rising curriculum of dependent, compositional tasks `t1`, `t2`, `t3`, etc. But more research is needed before `TeachableAgent` can reliably execute the subtasks `t1` and `t2` when instructed to perform the higher-level task `t3`.
Hi @solarapparition, I am interested in your idea! What I am worried about is that most tasks in Voyager are decomposable. For example, in Minecraft, the subtask skill of mining wood logs could be part of mining diamonds. If we expect the agent to maintain a permanent library, I am not sure what kinds of scenarios/tasks would benefit from this permanent library and be able to find appropriate skills to use. Could you give a specific example here?
🔥🎇🔥🧨 Awesome topic! 🔥🎇🔥🧨
Here are a few notes, ideas, and thoughts on exactly this topic, meticulously compiled (with the occasional copy & paste hiccup for good measure 😎) from different papers that I have authored over the last few months as part of my research...
A skills library represents a significant evolutionary leap in sourcing and applying specialized knowledge, promising not only enhanced user experience and cost savings but also fostering an environment of innovation and forward-thinking. By compiling a structured reservoir of specialized skills, it offers a unique blend of breadth and depth in knowledge application.
This library can seamlessly integrate with various models, including large language models (LLMs) and agents with dynamic skill sets, potentially creating a hybrid model that leverages the strengths of both. Its application spans numerous scenarios, from medical diagnosis and legal research to education and customer service, showcasing its versatility and capacity to drive efficiency and quality.
However, the journey doesn’t stop here. The skills library must continuously adapt to emerging technologies and maintain relevance in a rapidly changing landscape. Balancing foundational and specialized skills, ensuring cost efficiency, and exploring collaborative environments such as group chats with diverse expert agents are crucial steps toward realizing the full potential of a skills library.
In essence, a skills library is not just a tool but a catalyst for innovation, efficiency, and precision, heralding a new era in the application of artificial intelligence.
First and foremost, a skills library is more than just a collection of capabilities; it's a reservoir of specialized knowledge, a structured compilation, and a testament to the evolution of artificial intelligence.
The landscape of artificial intelligence (AI) is vast and intricate, showcasing diverse approaches such as agents with dynamic skill libraries and fine-tuned Large Language Models (LLMs). Both strategies offer unique advantages and challenges, requiring a nuanced understanding of their capabilities and potential integrations.
Agents with dynamic skill libraries encapsulate specialized knowledge and abilities, providing a modular and focused approach to problem-solving.
LLMs are at the forefront of generative AI, capable of generating coherent and contextually relevant responses across a wide range of topics.
A hybrid approach integrates dynamic skill libraries with fine-tuned LLMs, aiming to leverage their strengths and mitigate weaknesses.
Example application scenarios:

- Medical Diagnosis and Patient Care
- Legal Research and Case Preparation
- Customized Education and Learning Plans
- Supply Chain Optimization and Management
- Sustainable Urban Planning and Development
- Interactive Customer Service and Support
- Research and Development Innovation
The skills library also stands as a beacon of efficiency while being a valuable knowledge reservoir. Centralizing and streamlining the retrieval process means that over time, we utilize fewer tokens on gathering and executing skills. Could this model pave the way for a more sustainable and cost-effective future for our operations?
Picture a virtual environment where agents with diverse skills converge. Each agent, a specialist in its domain, retrieves from the skills library precisely the skills it excels at. This modularity ensures that every inquiry is directed to the best-suited agent, delivering not only efficiency but also unparalleled quality.
Dividing broader skills, like Python, into sub-skills (e.g., Python for Data Analysis, Python for Web Development) promises specialized attention. This division leverages the depth of each agent's specialized knowledge. It is paramount to ensure that this specialization doesn't lead to overspecialization, where the agents may lack versatility. Continuous communication and feedback loops between users and stakeholders can help to effectively communicate the benefits of this division and adjust it as needed.
Every skill stands on a foundation, and the hierarchy of skills is evident. From data retrieval to specialized domain expertise, these layers build upon each other. But how do these layers interplay?
Each skill, whether foundational or specialized, has certain prerequisites: specialized domain expertise, for instance, builds on more foundational layers like data retrieval.
Here are a few additional thoughts on domain skills and then a few other ideas...
Specialized Domain Skills refer to the expertise and knowledge that are specific to a particular area or field. These skills are usually acquired through extensive training and experience in the respective domain. In the context of AutoGen and AI, these skills could include specific programming languages, understanding of certain algorithms, or knowledge of specific industries or sectors.
The concept of Specialized Domain Skills is closely connected to the broader topic of AI and machine learning. It also ties into the discussion of skill libraries, as these specialized skills could be part of the library that an AI agent can access and utilize. Furthermore, it relates to the topic of continuous learning and adaptability, as these skills need to be regularly updated to keep up with advancements in the respective domain.
From a user perspective, Specialized Domain Skills in an AI agent can greatly enhance the user experience by providing expert assistance in a specific field. From a developer perspective, these skills can be challenging to implement and maintain due to the constant advancements in various domains.
The benefits of implementing Specialized Domain Skills in AI agents include providing expert assistance in specific domains, enhancing user experience by providing specialized knowledge, and leading to more accurate and efficient problem-solving. However, these skills require regular updates to keep up with advancements in the respective domain, can be challenging to implement due to the complexity of certain domains, and may lead to overspecialization, limiting the versatility of the AI agent.
The main objective of implementing Specialized Domain Skills in AI agents is to enhance their capabilities and user experience. This involves identifying the key domains that the AI agents should specialize in, developing a system for regular updates and maintenance of these skills, and ensuring the skills are implemented in a way that enhances user experience. Hidden objectives include establishing AI agents as experts in specific domains and enhancing the reputation and credibility of the AI agents.
However, there are several constraints and risks to consider. These include limited resources for the development and maintenance of these skills, rapid advancements in various domains requiring frequent updates, the risk of overspecialization limiting the versatility of the AI agents, and uncertainty regarding the acceptance and effectiveness of these skills among users. The sustainability of Specialized Domain Skills depends on the regular updates and maintenance of these skills to keep up with advancements in the respective domains.
Potential improvements include developing a system for regular updates and maintenance of these skills and implementing user feedback to continuously improve these skills. Customer priorities include the accuracy and efficiency of the AI agents and expert assistance in specific domains.
In conclusion, the implementation of Specialized Domain Skills in AI agents is a complex but crucial aspect of AI development. It requires a careful balance between specialization and versatility, and a commitment to continuous learning and adaptability. The key to success lies in regular updates and maintenance, user feedback, and a focus on enhancing user experience.
Reflecting on this work, there are opportunities for refinement and improvement. These include deepening domain understanding, enhancing user perspective, improving clarity and conciseness, incorporating more diverse perspectives, providing regular updates on progress, more proactive problem-solving, and placing more emphasis on sustainability.
However, the topic of Specialized Domain Skills in AI is complex and multifaceted, and there are many factors to consider, which can lead to different interpretations. These areas of interpretation include identifying relevant domains, striking a balance between specialization and versatility, enhancing user experience, allocating resources, adhering to ethical considerations, and overcoming technological constraints. These areas of interpretation highlight the importance of clear communication, stakeholder consultation, and user feedback in the process of implementing Specialized Domain Skills in AI agents.
With continuous improvement and refinement, the implementation of Specialized Domain Skills in AI agents can be effectively achieved to enhance their capabilities and user experience.
Continuing our exciting exploration of the AutoGen project, I'd like to share a few more thoughts and ideas that could potentially spark further innovation and development....
In conclusion, the concept of cloning user skills presents a promising avenue for enhancing the capabilities and user experience of AI agents. By enabling AI agents to learn from and mimic user behavior, we can create a personalized and customized user experience, potentially leading to increased user satisfaction and engagement. However, this process must be handled with care to ensure that it is effective, accurate, and respectful of user privacy and ethical guidelines. With careful implementation and continuous refinement, the cloning of user skills can be effectively achieved to unlock the full potential of AI agents.
I'm looking forward to your feedback and the ideas that I have suggested with my comments here.
NEVER STOP HACKING! :sunglasses: :computer: :fire:
> Hi @solarapparition, I am interested in your idea! What I am worried about is that most tasks in Voyager are decomposable. For example, in Minecraft, the subtask skill of mining wood logs could be part of mining diamonds. If we expect the agent to maintain a permanent library, I am not sure what kinds of scenarios/tasks would benefit from this permanent library and be able to find appropriate skills to use. Could you give a specific example here?
So I think some task environments are more conducive to allowing an agent to (de)compose tasks. To me there are a couple of necessary (though likely not sufficient) conditions:

1. There is a fairly limited set of basic operations that tasks in the environment decompose into.
2. The environment is stable over the task's time horizon: if task `T` needs subtasks `t1`, `t2`, and `t3` in that order, it doesn't need to account for huge changes in the environment by the time it gets to `t3`. I think this is particularly important for complex tasks that need many layers of decomposition. For Minecraft, the world does change, but only gradually and not meaningfully in the time horizon that Voyager's tasks are executed at.

Many other environments don't satisfy these requirements, but there should be at least some useful ones that do. One that I'm particularly interested in is autonomously interacting with a website through browser automation such as Selenium. For condition 1, on most webpages there is a fairly limited number of things a user needs to know how to do (clicking, scrolling, typing in text, etc.), and for condition 2, you don't expect the global structure of websites to change rapidly. I can imagine that once you teach an agent a few basic workflows on a website, it'll be able to combine those workflows for more complex tasks (see the sketch below).
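As a concrete illustration of those two conditions, here's a small Selenium sketch: a couple of primitive skills plus one composed workflow. The selectors and URL are hypothetical.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Primitive skills: the limited set of basic page operations (condition 1).
def click(driver, css: str) -> None:
    driver.find_element(By.CSS_SELECTOR, css).click()

def type_text(driver, css: str, text: str) -> None:
    field = driver.find_element(By.CSS_SELECTOR, css)
    field.clear()
    field.send_keys(text)

# A composed workflow: stable page structure (condition 2) makes it reusable.
def search(driver, query: str) -> None:
    type_text(driver, "input[name='q']", query)
    click(driver, "button[type='submit']")

driver = webdriver.Chrome()
driver.get("https://example.com")   # hypothetical site with a search form
search(driver, "autogen skill library")
driver.quit()
```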
With all that said, I don't know if this recursive teachable agent is really a separate thing from `TeachableAgent` itself; it should actually be pretty straightforward to just configure a `TeachableAgent` with a mapped function that wraps another instance of itself. The main challenge is probably dealing with context decay as the agent goes deeper into the recursion, but I want to try it out for an autonomous browsing agent I'm working on.
> I don't know if this recursive teachable agent is really a separate thing from `TeachableAgent` itself
Yes, that's a key question. We've just created new issues #538 and #540, which will hopefully address most of the fundamental concerns raised above. But that will take some months. Meanwhile, does anyone volunteer to write a partial, shorter-term solution that would allow Python functions to be stored and retrieved from a skill library implemented as a vector DB?
For the short-term solution, isn't that just a matter of configuring a Retrieve agent with a custom text splitter that chunks Python files into function blocks? Then you could just save all the functions into a file that you point the retriever to, and build off of that. That seems like something that could be done without having to modify Autogen itself, except maybe updating `TEXT_FORMATS` in `retrieve_utils.py`, but even that's not strictly necessary.
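For the splitter itself, a minimal version needs nothing beyond the standard library; for example, chunking a file into top-level function and class blocks with `ast` (Python 3.8+ for `ast.get_source_segment`):

```python
import ast

def split_into_function_blocks(path: str) -> list[str]:
    """Chunk a Python file into top-level function/class source blocks."""
    with open(path) as f:
        source = f.read()
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
```

Each chunk could then be written out, or embedded directly, for the Retrieve agent to index.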
> For the short-term solution, isn't that just a matter of configuring a Retrieve agent with a custom text splitter that chunks Python files into function blocks? Then you could just save all the functions into a file that you point the retriever to, and build off of that. That seems like something that could be done without having to modify Autogen itself, except maybe updating `TEXT_FORMATS` in `retrieve_utils.py`, but even that's not strictly necessary.
Yes, that's a reasonable approach to try and test.
So regarding this, #538, and #540: I've been mulling over the discussions at OpenAI_Agent_Swarm as well as my own earlier experiment, and I'm more and more convinced that the solution to these challenges isn't a single Autogen agent but a composite one consisting of groups of smaller agents (though obviously that composite agent could be overlaid with the usual Autogen interface so that it still behaves like a single agent on the surface). It feels like the problem of task decomposition is complex enough that it requires multiple mini-agent groups collaborating in a dynamic workflow, with agent groups being able to handle decisions at various levels of granularity.
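A library-agnostic sketch of that shape, where `AgentGroup` and the routing logic are purely illustrative (not Autogen APIs): the composite exposes a single reply interface on the surface while routing work through inner groups at different levels of granularity.

```python
class AgentGroup:
    def __init__(self, name: str, members: list[str]):
        self.name = name
        self.members = members        # smaller agents collaborating on one level

    def run(self, task: str) -> str:
        # stand-in for an inner group chat resolving one level of granularity
        return f"[{self.name} resolved: {task}]"

class CompositeAgent:
    """Looks like one agent from the outside; a workflow of groups inside."""
    def __init__(self, groups: list[AgentGroup]):
        self.groups = groups

    def generate_reply(self, message: str) -> str:
        # dynamic workflow: each group refines the previous group's output
        result = message
        for group in self.groups:
            result = group.run(result)
        return result

composite = CompositeAgent([
    AgentGroup("decomposers", members=["planner", "critic"]),
    AgentGroup("executors", members=["coder", "runner"]),
])
print(composite.generate_reply("build a weekly report pipeline"))
```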
I'm experimenting with some approaches on that end and can post if something fruitful comes out of it.
Makes sense and sounds exciting. Looking forward to your experiments.
A new relevant paper: https://arxiv.org/abs/2311.07635
> A new relevant paper: https://arxiv.org/abs/2311.07635
So this got me thinking.
(Forgive the wall of text here—I don't have anything concrete, just throwing out an idea and hoping it sparks some inspiration.)
This paper, LLMs as Optimizers, Eureka, and a few other APE papers make me feel that when it comes to problem solving, the biggest advantage that LLMs have over humans is that they can generate many potential improvements on an existing (faulty) solution and semi-brute-force their way to at least a locally optimal solution... at least for problems where there is a feedback metric that provides enough signal for further optimization attempts. I think in Eureka Jim Fan called it "gradient-free optimization", but I feel like it's something that has broader applicability than just the context he was referring to.
The application to code-based skills is pretty obvious (run an optimization loop until the code works to some standard), but actually I wonder if it can be used to optimize the transformation from the "current task" to the "skill retrieval query for the current task": say we have task `T` that requires retrieving skills `{s_1, s_2, ..., s_n}` from the skills vector database. Obviously pushing the description of `T` directly in as the query is shaky at best, especially if what you're querying is code. I think best practice in RAG right now is to do some sort of prompt-based transformation of `T` into queries that are likely to retrieve relevant chunks for solving `T`, say using a prompt `P`. BUT (and someone correct me if I'm wrong) I think `P` is usually created via manual prompt engineering, e.g. `P = "write out questions that are needed to answer this question"` and the like. So what if one does some optimization on that prompt? We could start with a set of tasks `{T_j}` that are representative of the class of tasks the agent specializes in, evaluate the retrieval results against some pre-constructed skill library with known "optimal" retrieval results for those tasks, and in theory end up with some sort of optimized `P` for whatever class of tasks you're interested in. The usual pitfalls of optimization (overfitting, poor evaluation metric, etc.) are all there, but those feel like they're better understood than the dark arts of writing a really good retrieval query prompt, especially since the optimal prompt might differ dramatically based on the task type and available skill library.
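A toy version of that loop, with `llm` and `retrieve` as placeholders and recall against the known-good retrievals as the (assumed) scoring metric:

```python
def llm(prompt: str) -> str:
    return prompt[-80:]        # placeholder for a chat-completion call

def retrieve(query: str, k: int = 5) -> list[str]:
    return []                  # placeholder for a skills vector-DB lookup

def score(p: str, tasks: list[str], optimal: dict[str, set[str]]) -> float:
    """Fraction of the known-optimal skills that P's generated queries retrieve."""
    hits = total = 0
    for t in tasks:
        query = llm(p + "\n\nTask: " + t)      # transform task -> retrieval query
        retrieved = set(retrieve(query))
        hits += len(retrieved & optimal[t])
        total += len(optimal[t])
    return hits / total if total else 0.0

def optimize_prompt(seed: str, tasks, optimal, rounds: int = 20) -> str:
    best, best_score = seed, score(seed, tasks, optimal)
    for _ in range(rounds):
        # the LLM proposes a mutation of the current best prompt ("gradient-free" step)
        candidate = llm("Rewrite this retrieval-query prompt to improve it:\n" + best)
        s = score(candidate, tasks, optimal)
        if s > best_score:
            best, best_score = candidate, s
    return best
```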
@skzhang1 @JieyuZ2 FYI
one related work: https://arxiv.org/abs/2308.00304
Possibly also relevant: Chain of Code
@rickyloynd-microsoft moved to be a sub issue of user defined functions