Write user code example

qingyun-wu commented 1 year ago

### Tasks
- [x] User code for the continual summarization task (Scenario 2)
- [ ] Clarify the learning settings (learning_constraints, learning_objectives, learning_results, and learning_data) in the continual summarization task.
- [ ] User code for Scenario 1: Use historical interaction log to improve an agent's performance with the accumulation of interaction sessions.
- [ ] Discuss whether "learning" is the right word to use. Alternatives include "digesting" and "observing".
- [ ] Questions to discuss: what are the objectives? For example, "improving user experience" could be a high-level objective, but what are good quantitative measures for it? What is a good measurement of "improvement"? Potential metrics include the number of turns to finish a task and the cost of finishing a task.
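
To make the metrics question concrete, here is a minimal sketch of how the two candidate metrics (turns per task and cost per task) could be computed and compared across sessions. The log format (a list of message dicts with a `role` key and an optional `cost` key) and both function names are assumptions for illustration only, not part of FLAML.

```python
from statistics import mean


def session_metrics(messages):
    """Return (number of assistant turns, total cost) for one session.

    `messages` is a hypothetical log format: a list of dicts with a
    "role" key and an optional numeric "cost" key.
    """
    turns = sum(1 for m in messages if m["role"] == "assistant")
    cost = sum(m.get("cost", 0.0) for m in messages)
    return turns, cost


def improvement(earlier_sessions, later_sessions):
    """Average reduction in turns and cost from earlier to later sessions.

    Positive values mean the later sessions needed fewer turns or less cost.
    """
    e_turns, e_cost = zip(*(session_metrics(s) for s in earlier_sessions))
    l_turns, l_cost = zip(*(session_metrics(s) for s in later_sessions))
    return mean(e_turns) - mean(l_turns), mean(e_cost) - mean(l_cost)
```

Whether fewer turns actually reflects a better user experience is still the open question; the sketch only shows that both metrics are cheap to track once sessions are logged.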

qingyun-wu commented 1 year ago

Proposal: introduce a LearnableAssistantAgent class (inheriting from AssistantAgent) to serve as an assistant agent with learning capabilities.

class LearnableAssistantAgent(AssistantAgent):

    def __init__(self, name, system_message=None,
        learning_constraints=None,
        learning_objectives=None,
        learning_results=None,
        learning_data=None,
        ):
        """
        learning_constraints (dict): a dict of learning constraints
            learning_trigger:
            (other resource):
        learning_objectives (Callable/str): objectives for learning
        learning_results (dict): learning results
        learning_data (list/DataFrame/Numpy): data to learn from
        """
        super().__init__(
            name=name,
            system_message=system_message)
        # give every learning attribute a default so later reads never fail
        self._learning_constraints = {}
        self._learning_objectives = None
        self._learning_results = None
        self._learning_data = None
        self.setup_learning(
            learning_constraints=learning_constraints,
            learning_objectives=learning_objectives,
            learning_results=learning_results,
            learning_data=learning_data
        )

    # def _setup_learning_from_nlp(self, content):
    #     LLM(f"Extract xxx from {content}")

    def setup_learning(self,
        learning_constraints=None,
        learning_objectives=None,
        learning_results=None,
        learning_data=None
        ):
        """
        This function is used to set up learning-related parameters.
        """
        if learning_constraints is not None:
            self._learning_constraints = learning_constraints
        if learning_objectives is not None:
            self._learning_objectives = learning_objectives
        if learning_results is not None:
            self._learning_results = learning_results
        if learning_data is not None:
            self._learning_data = learning_data

    def _learn(self):
        """
        performs learning
        """
        # do something
        pass

    def receive(self, message, sender):
        # handle learning-related setup carried in the message;
        # the keys mirror the parameters of setup_learning
        self.setup_learning(
            learning_constraints=message.get("learning_constraints", None),
            learning_objectives=message.get("learning_objectives", None),
            learning_results=message.get("learning_results", None),
            learning_data=message.get("learning_data", None),
        )
        # Other options floated in the draft (enable_learning, conditions_to_enable_learn,
        # conversation_log_as_data4learning, clear_old_data) are not accepted by
        # setup_learning and are left out here.
        # learn only if the constraints enable the learning trigger
        constraints = getattr(self, "_learning_constraints", None) or {}
        if constraints.get("learning_trigger", None):
            self._learn()
        # NOTE: but what if we want to learn async? We would need multi-process/multi-worker.
        # do something with the message
        # ...
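
On the async NOTE in receive: one possible direction, sketched below under assumptions, is to hand each learning step to a background worker so that receive can return immediately. `BackgroundLearner` and its methods are hypothetical names, not an existing FLAML API. A thread pool suits learning steps that mostly wait on I/O (for example an LLM API call); CPU-bound learning would call for processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundLearner:
    """Hypothetical helper: runs a learning function off the message path."""

    def __init__(self, learn_fn):
        self._learn_fn = learn_fn
        # A single worker keeps learning steps in submission order and
        # avoids concurrent writes to the learning results.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def submit(self, data):
        """Schedule one learning step and return immediately."""
        future = self._executor.submit(self._learn_fn, data)
        self._pending.append(future)
        return future

    def wait(self):
        """Block until all scheduled steps finish; return their results in order."""
        results = [f.result() for f in self._pending]
        self._pending.clear()
        return results

    def shutdown(self):
        """Finish outstanding work and release the worker thread."""
        self._executor.shutdown(wait=True)
```

In this sketch, receive would call `submit(...)` instead of `self._learn()`, and whoever needs the latest learning results would call `wait()` first.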

Scenarios that need continual learning

Scenario 1. Use historical interaction log to improve the performance of an agent with the accumulation of interaction sessions.

import pickle

from flaml.autogen.agent import LearnableAssistantAgent, UserProxyAgent

# learning_constraints, learning_objectives, and learning_results are assumed to be defined by the user
learning_setting = {
    "learning_constraints": learning_constraints,
    "learning_objectives": learning_objectives,
    "learning_results": learning_results,
    "learning_data": "log",  # the user wants to learn from the conversation log
}
assistant = LearnableAssistantAgent(name="assistant", **learning_setting)
user = UserProxyAgent(name="user")
assistant.receive(user.generate_init_prompt({"role": "user", "content": "Plot a rocket."}), user)
assistant.receive(user.generate_init_prompt({"role": "user", "content": "Plot a helicopter."}), user) # we expect the performance of the assistant to improve with the accumulation of experiences

# serialize the agent
with open("assistant.pkl", "wb") as f:
    pickle.dump(assistant, f)
# load the agent
with open("assistant.pkl", "rb") as f:
    old_assistant = pickle.load(f)

# Here we create a new assistant to carry out the next task.
# Motivation for creating a new assistant instead of reusing the old one:
# we would like to reuse only certain parts of the old assistant's
# conversations (selected with a filter).
learning_setting.update({
    "learning_data": old_assistant.get_conversations(),
})
assistant = LearnableAssistantAgent(name="assistant", **learning_setting)
user = UserProxyAgent(name="user")
assistant.receive(user.generate_init_prompt({"role": "user", "content": "Plot a boat."}), user) # we expect this assistant could learn from the provided conversation
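
The "certain filter" mentioned in the comments above is left open in this proposal. As one hedged illustration, a keyword filter over past sessions could look like the sketch below. The conversation format (a list of sessions, each a list of `{"role": ..., "content": ...}` dicts) and the function `filter_conversations` are assumptions; the return type of `get_conversations()` is not specified in this thread.

```python
def filter_conversations(conversations, keyword, max_sessions=None):
    """Keep sessions in which at least one message mentions `keyword`.

    `conversations` is assumed to be a list of sessions, each a list of
    {"role": ..., "content": ...} dicts (hypothetical format).
    """
    needle = keyword.lower()
    kept = [
        session
        for session in conversations
        if any(needle in m.get("content", "").lower() for m in session)
    ]
    # Optionally keep only the most recent matching sessions.
    if max_sessions is not None:
        kept = kept[max(len(kept) - max_sessions, 0):]
    return kept
```

The filtered list could then be passed as `learning_data` in place of the full output of `old_assistant.get_conversations()`.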

Scenario 2. Summarize data that don't fit in the context window at once.

from flaml.autogen.agent import LearningAgent, TeachingAgent
import feedparser
import pickle

cs_learner = LearningAgent(name="cs_learner")
teacher = TeachingAgent(name="teacher")
learning_setting = {
    "learning_constraints": {"learning_trigger": True, "cpu": 1},
    "learning_objectives": "Continuously summarize the latest research trends based on all the learning data given.",
}

cs_feed = feedparser.parse("http://export.arxiv.org/rss/cs")
# Loop through each entry (article)
for entry in cs_feed.entries:
    cs_data_today = [entry.summary]
    learning_setting.update({"learning_data": cs_data_today})
    cs_learner.receive(learning_setting, teacher)  # the learner does its best, within the learning constraints, toward the given objective
    learning_results = cs_learner.get_learning_results()
    print(f"These are the learning results so far: {learning_results}")

# serialize the agent
with open("cs_learner.pkl", "wb") as f:
    pickle.dump(cs_learner, f)
# load the agent
with open("cs_learner.pkl", "rb") as f:
    cs_learner = pickle.load(f)

stats_learner = LearningAgent(name="stats_learner")
learning_setting = {
    "learning_constraints": {"learning_trigger": True, "cpu": 1},
    "learning_objectives": "Continuously summarize the latest research trends based on all the learning data given.",
    "learning_results": cs_learner.get_learning_results()[-1],  # start from the latest CS learning results
    "learning_func": my_LLM_func,  # user-supplied learning function, assumed to be defined elsewhere
}

stats_feed = feedparser.parse("http://export.arxiv.org/rss/stat")
for entry in stats_feed.entries:
    stats_data_today = [entry.summary]
    learning_setting.update({"learning_data": stats_data_today})
    stats_learner.receive(learning_setting, teacher)
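
The loops above send one entry per receive call. If several entries fit in the context window together, they could be grouped first. The sketch below is only an illustration of that idea: `batch_by_budget` is a hypothetical helper, and character count is a crude stand-in for a real token count.

```python
def batch_by_budget(texts, max_chars):
    """Group texts into consecutive batches whose total length stays within max_chars.

    A single text longer than max_chars gets a batch of its own rather
    than being split. Character count is a rough proxy for tokens.
    """
    batches, current, size = [], [], 0
    for text in texts:
        # Start a new batch when adding this text would exceed the budget.
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches
```

Each batch would then be passed as `learning_data` in one receive call, for example `batch_by_budget([e.summary for e in stats_feed.entries], max_chars=4000)`, where 4000 is an arbitrary placeholder.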

weilinear commented 1 year ago

Quick questions regarding Scenario 2.

Will the learning_results template/structure be a hyper-parameter or a learnable parameter? For example, I'm wondering whether the learning results on research trends will change during the learning process: seeing a few samples may produce a very fine-grained trend (e.g. a particular method name), and the trend may become more abstract given more data (e.g. a particular field name).
Does the assistant need to know if only partial data (and what % of data) is given?
Is there any parameter to balance learning from the current batch of data against forgetting the previously given data?

Thanks! (Context: I was discussing our use case with @sonichi in https://github.com/microsoft/FLAML/issues/1063)

qingyun-wu commented 1 year ago

@weilinear, thank you for your interest and questions. Please find my responses to your questions below:

The learning results may change during the learning process. Generally speaking, the nature of learning_results is likely to be dependent on the specific learning algorithm employed. It could potentially be a hyper-parameter or a learnable parameter.
In many cases, it is impossible for the assistant to know if only partial data is given. For example, one can always choose to feed new data to the assistant whenever new data is available. Thus, I think we'd better design the assistant in a way such that it does not need to know if only partial data (and what % of data) is given.
One potential way to strike such a balance: in learning_constraints, we could consider including a learning_rate parameter.

weilinear commented 1 year ago

Thanks for the explanation. A few follow-up questions:

What is considered a learning algorithm in this context?
How will learning_rate work here, and what assumptions could we make about a higher or lower learning rate? Will it be some measurement of how fast learning_results changes?

qingyun-wu commented 1 year ago

> Thanks for the explanation. A few follow-up questions:
>
> What is considered a learning algorithm in this context?
>
> How will learning_rate work here, and what assumptions could we make about a higher or lower learning rate? Will it be some measurement of how fast learning_results changes?

Hi @weilinear, the learning algorithm is a function that takes the learning data and the learning objective (and optionally the previous learning results) as inputs and outputs new learning results. In the context of a summarization task, a large language model could be used as the learning function.

The learning rate mentioned in my previous response is just one example of how the user could balance old learning_results against learning from new data. How it works depends on the specific learning function. To avoid confusion, I have removed learning_rate from the code example and kept only a field named learning_func.
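
To illustrate the shape of such a learning function, here is a minimal sketch under assumptions. `keep_recent_sentences` is a toy stand-in for an LLM call, and `continual_learn` and the `LearningFunc` alias are hypothetical names, not FLAML APIs. The balance between old results and new data lives inside the function itself, here as a `max_sentences` cap.

```python
from typing import Callable, List, Optional

# Signature described above: (data, objective, previous results) -> new results.
LearningFunc = Callable[[List[str], str, Optional[str]], str]


def keep_recent_sentences(data, objective, previous=None, max_sentences=3):
    """Toy learning function: append the first sentence of each new text to
    the previous summary and keep only the latest `max_sentences` items.

    `objective` is unused here; an LLM-backed function would put it in the prompt.
    """
    sentences = previous.split(" | ") if previous else []
    sentences += [text.split(".")[0].strip() for text in data if text.strip()]
    return " | ".join(sentences[-max_sentences:])


def continual_learn(batches, objective, learning_func, initial=None):
    """Fold learning_func over batches, feeding each result into the next call."""
    results = initial
    history = []
    for batch in batches:
        results = learning_func(batch, objective, results)
        history.append(results)
    return history
```

Because the function only ever sees the current batch plus the previous results, it does not need to know whether more data will arrive later.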

Thank you!

qingyun-wu commented 1 year ago

Hi @weilinear,

Thank you for your interest. Could you please take a look at this PR: https://github.com/microsoft/FLAML/pull/1098, especially the continual summarization example code in this file: https://github.com/microsoft/FLAML/blob/6891db656de7b8de4ecbaad892abafee8411d3a8/test/autogen/test_continual_summarization.py ?

Thank you!

qingyun-wu commented 1 year ago

Hi @weilinear, an example use case of continual summarization is demonstrated briefly in this notebook: https://github.com/microsoft/FLAML/blob/67a23b167ecad631882fc9c781a1178dc4b5cf50/notebook/autogen_agent_continual_summarization.ipynb

Could you take a look at this demo and share your comments/suggestions? I'd like to have a chat with you if you have time.

weilinear commented 1 year ago

Thanks @qingyun-wu. I'll take a look at the demo and leave some comments. I'm pretty much booked this week, so let's aim to chat sometime late next week.
