RAG example from blog w/ open-source models outputs 'TERMINATE' or the reference document

matsuobasho commented 8 months ago

Describe the issue

I'm running the first example from the RAG tutorial here, using open-source models via LMStudio.

Here is what I get when I run with Mistral 7B Instruct v0.1 (quantized):

Trying to create collection.
Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2
doc_ids:  [['doc_1', 'doc_0']]
Adding doc_id doc_1 to context.
Adding doc_id doc_0 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
You must give as short an answer as possible.

User's question is: What is autogen?

Context is: 
## Contributors Wall
<a href="https://github.com/microsoft/autogen/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=microsoft/autogen" />
</a>

# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.
...

TERMINATE

Llama 2 Chat 7B (quantized) seems to just output the document itself:

Trying to create collection.
Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2
doc_ids:  [['doc_1', 'doc_0']]
Adding doc_id doc_1 to context.
Adding doc_id doc_0 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
You must give as short an answer as possible.

User's question is: What is autogen?

Context is: 
## Contributors Wall
<a href="https://github.com/microsoft/autogen/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=microsoft/autogen" />
</a>

# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.
...
[Your Name]
AutoGen Contributor Team

I know that different LMs will perform differently with RAG, but I would like to understand whether there is anything to tune, or other open-source models to try, to get this to work.

Steps to reproduce

1. Install LM Studio.
2. Download TheBloke's Llama 2 Chat 7B Q2_K GGUF and Mistral 7B Instruct v0.1 GGUF.
3. Select one of the models and start the server in LM Studio.
4. Execute the code below in Python:


import autogen
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

config_list = [{"base_url": "http://localhost:1234/v1", "api_key": "NULL"}]

llm_config = {"timeout": 600, "seed": 20, "config_list": config_list, "temperature": 0}

assistant = RetrieveAssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config=llm_config,
)

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
    },
)

assistant.reset()
ragproxyagent.initiate_chat(assistant, problem="What is autogen?")



### Screenshots and logs

_No response_

### Additional Information

pyautogen 0.2.2
Windows 10
Python 3.11.7

rickyloynd-microsoft commented 8 months ago

@thinkall fyi

tidymonkey81 commented 8 months ago

I modified the default assistant prompt to the following (with good results for Mixtral). It just reminds the model not to end with TERMINATE unless there is a useful output. I guess this isn't necessary for GPT-4:

"You are a helpful assistant that can use available functions when needed to solve problems. At each point, do your best to determine if the user's request has been addressed. The user will run the code and respond with any error messages, or a code complete status. IF THE REQUEST HAS NOT BEEN ADDRESSED, RESPOND WITH CODE TO ADDRESS IT. IF A FAILURE OCCURRED (e.g., due to a missing library) AND SOME ADDITIONAL CODE WAS WRITTEN (e.g. code to install the library), ENSURE THAT THE ORIGINAL CODE TO ADDRESS THE TASK STILL GETS EXECUTED. If the request HAS been addressed, respond with a summary of the result. The summary must be written as a coherent helpful response to the user request e.g. 'Sure, here is result to your request ' or 'The tallest mountain in Africa is ..' etc. The summary MUST end with the word TERMINATE. Only end with the word TERMINATE if the users request has been addressed. If the user request is pleasantry or greeting, you should respond with a pleasantry or greeting and TERMINATE. DO NOT INCLUDE THE WORD TERMINATE UNLESS THE USER'S REQUEST HAS BEEN ADDRESSED."

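The same rule can also be checked in code rather than only requested in the prompt. Below is a minimal sketch, assuming replies arrive as dicts with a `content` key (the shape AutoGen hands to an agent's `is_termination_msg` callback). The function name `is_useful_termination` and the 20-character threshold are hypothetical choices and were not tested in this thread:

```python
from typing import Any, Dict

TERMINATE = "TERMINATE"
MIN_ANSWER_CHARS = 20  # hypothetical threshold for "a useful output"


def is_useful_termination(message: Dict[str, Any]) -> bool:
    """Return True only if the reply ends with TERMINATE and also carries an answer."""
    content = (message.get("content") or "").strip()
    if not content.endswith(TERMINATE):
        return False
    # Text that precedes the trailing TERMINATE keyword.
    answer = content[: -len(TERMINATE)].strip()
    return len(answer) >= MIN_ANSWER_CHARS


print(is_useful_termination({"content": "TERMINATE"}))
print(is_useful_termination({"content": "AutoGen is a multi-agent framework. TERMINATE"}))
```

A predicate like this could be passed as `is_termination_msg` when constructing an agent, so that a bare `TERMINATE` is not treated as a finished answer.
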
sonichi commented 8 months ago

Interesting! Would you like to write a blogpost about your findings?

matsuobasho commented 8 months ago

@tidymonkey81 thanks for this detailed prompt. However, it doesn't seem to make a difference: I tried it with Llama 13B and Mistral 7B and got the same results.

tidymonkey81 commented 8 months ago

@matsuobasho I've added some coaxing statements to the default assistant prompt. I can only verify that it works with Mixtral, specifically TheBloke_dolphin-2.7-mixtral-8x7b-GPTQ, as this is what I was tinkering with last night. I happened to be trying out the new AutoGen Studio interface as well (very nice it is too!) and was able to run some basic workflows: it generated and ran simple code, fixing its errors and installing packages along the way. I think outdated package knowledge in the training data hindered progress on some workflows (it couldn't get past an issue with yfinance, for example). I'll also need to revisit what it was doing with skills, as I couldn't see how the agents were calling them through the UI or console (it only wrote fresh code, but that could be an error in my usage). I previously tried the same thing in notebook setups using Mistral 7B, with the same prompt-coaxing technique, without success. Mixtral was more successful, and I feel there's potential for improvement with that model, for me at least (I spent a couple of hours).

@sonichi I don't have a platform for that but I'd be happy to run through again and document what I was doing and post it here. Thanks for adding me!

matsuobasho commented 8 months ago

@tidymonkey81 I'm not seeing the model you reference in LM Studio:

Also, what results do you get with that QA task about autogen on that model?

matsuobasho commented 8 months ago

From the AutoGen paper, it seems the example I provided should be handled with ease, so the fact that it fails with Llama 2 is underwhelming. I look forward to comments, feedback, and potential solutions.

I can try to investigate on my own and report what I find if more experienced Autogen users can provide some pointers on where to start.

matsuobasho commented 7 months ago

Any recommendations on this? For all the attention it's getting, I've given up on AutoGen experimentation for the time being because these results don't inspire much confidence in it.

ChristianWeyer commented 7 months ago

Any ideas @sonichi (or @tidymonkey81) how to help @matsuobasho ?

sonichi commented 7 months ago

@matsuobasho I only see the beginning of the chat, which looks normal to me. Could you share the part of the conversation that looks wrong?

matsuobasho commented 7 months ago

This is the second example from what I shared originally. It appears to just reproduce the document without any additional output.

doc_ids:  [['doc_1', 'doc_0']]
Adding doc_id doc_1 to context.
Adding doc_id doc_0 to context.
ragproxyagent (to assistant):

You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
You must give as short an answer as possible.

User's question is: What is autogen?

Context is: 
## Contributors Wall
<a href="https://github.com/microsoft/autogen/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=microsoft/autogen" />
</a>

# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.

Microsoft, Windows, Microsoft Azure, and/or other Microsoft products and services referenced in the documentation
may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries.
The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.
Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel, or otherwise.

[![PyPI version](https://badge.fury.io/py/pyautogen.svg)](https://badge.fury.io/py/pyautogen)
[![Build](https://github.com/microsoft/autogen/actions/workflows/python-package.yml/badge.svg)](https://github.com/microsoft/autogen/actions/workflows/python-package.yml)
![Python Version](https://img.shields.io/badge/3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue)
[![Downloads](https://static.pepy.tech/badge/pyautogen/week)](https://pepy.tech/project/pyautogen)
[![](https://img.shields.io/discord/1153072414184452236?logo=discord&style=flat)](https://discord.gg/pAbnFJrkgZ)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40pyautogen)](https://twitter.com/pyautogen)

# AutoGen

<!-- <p align="center">
    <img src="https://github.com/microsoft/autogen/blob/main/website/static/img/flaml.svg"  width=200>
    <br>
</p> -->
:fire: Nov 24: pyautogen [v0.2](https://github.com/microsoft/autogen/releases/tag/v0.2.0) is released with many updates and new features compared to v0.1.1. It switches to using openai-python v1. Please read the [migration guide](https://microsoft.github.io/autogen/docs/Installation#python).

:fire: Nov 11: OpenAI's Assistants are available in AutoGen and interoperatable with other AutoGen agents! Checkout our [blogpost](https://microsoft.github.io/autogen/blog/2023/11/13/OAI-assistants) for details and examples.

:fire: Nov 8: AutoGen is selected into [Open100: Top 100 Open Source achievements](https://www.benchcouncil.org/evaluation/opencs/annual.html) 35 days after spinoff.

:fire: Nov 6: AutoGen is mentioned by Satya Nadella in a [fireside chat](https://youtu.be/0pLBvgYtv6U) around 13:20.

:fire: Nov 1: AutoGen is the top trending repo on GitHub in October 2023.

:tada: Oct 03: AutoGen spins off from [FLAML](https://github.com/microsoft/FLAML) on Github and has a major paper update.

:tada: Aug 16: Paper about AutoGen on [arxiv](https://arxiv.org/abs/2308.08155). [📚 Cite paper](#related-papers).

:tada: Mar 29: AutoGen is first created in [FLAML](https://github.com/microsoft/FLAML/pull/968).

<!--
:fire: FLAML is highlighted in OpenAI's [cookbook](https://github.com/openai/openai-cookbook#related-resources-from-around-the-web).

:fire: [autogen](https://microsoft.github.io/autogen/) is released with support for ChatGPT and GPT-4, based on [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673).

:fire: FLAML supports Code-First AutoML & Tuning – Private Preview in [Microsoft Fabric Data Science](https://learn.microsoft.com/en-us/fabric/data-science/). -->

## What is AutoGen

AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

![AutoGen Overview](https://github.com/microsoft/autogen/blob/main/website/static/img/autogen_agentchat.png)

- AutoGen enables building next-gen LLM applications based on [multi-agent conversations](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses.
- It supports [diverse conversation patterns](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#supporting-diverse-conversation-patterns) for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy,
  the number of agents, and agent conversation topology.
- It provides a collection of working systems with different complexities. These systems span a [wide range of applications](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#diverse-applications-implemented-with-autogen) from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.
- AutoGen provides [enhanced LLM inference](https://microsoft.github.io/autogen/docs/Use-Cases/enhanced_inference#api-unification). It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.

AutoGen is powered by collaborative [research studies](https://microsoft.github.io/autogen/docs/Research) from Microsoft, Penn State University, and the University of Washington.

## Quickstart
The easiest way to start playing is
1. Click below to use the GitHub Codespace

    [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/autogen?quickstart=1)

 2. Copy OAI_CONFIG_LIST_sample to ./notebook folder, name to OAI_CONFIG_LIST, and set the correct configuration.
 3. Start playing with the notebooks!

## Using existing docker image
Install docker, save your oai key into an environment variable name OPENAI_API_KEY, and then run the following.

docker pull yuandongtian/autogen:latest
docker run -it -e OPENAI_API_KEY=$OPENAI_API_KEY -p 8081:8081 docker.io/yuandongtian/autogen:latest


Then open `http://localhost:8081/` in your browser to use AutoGen. The UI is from `./samples/apps/autogen-assistant`. See docker hub [link](https://hub.docker.com/r/yuandongtian/autogen) for more details.

## Installation

AutoGen requires **Python version >= 3.8, < 3.12**. It can be installed from pip:

```bash
pip install pyautogen
```

Minimal dependencies are installed without extra options. You can install extra options based on the feature you need.

Find more options in Installation.

For code execution, we strongly recommend installing the Python docker package and using docker.

For LLM inference configurations, check the FAQs.

Multi-Agent Conversation Framework

Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.

Features of this use case include:

Multi-agent conversations: AutoGen agents can communicate with each other to solve tasks. This allows for more complex and sophisticated applications than would be possible with a single LLM.
Customization: AutoGen agents can be customized to meet the specific needs of an application. This includes the ability to choose the LLMs to use, the types of human input to allow, and the tools to employ.
Human participation: AutoGen seamlessly allows human participation. This means that humans can provide input and feedback to the agents as needed.

For example,

from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
# Load LLM inference endpoints from an env variable or a file
# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints
# and OAI_CONFIG_LIST_sample
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
# You can also set config_list directly as a list, for example, config_list = [{'model': 'gpt-4', 'api_key': '<your OpenAI API key here>'},]
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})
user_proxy.initiate_chat(assistant, message="Plot a chart of NVDA and TESLA stock price change YTD.")
# This initiates an automated chat between the two agents to solve the task

This example can be run with

python test/twoagent.py

After the repo is cloned. The figure below shows an example conversation flow with AutoGen. Agent Chat Example

Alternatively, the sample code here allows a user to chat with an AutoGen agent in ChatGPT style. Please find more code examples for this feature.

Enhanced LLM Inferences

Autogen also helps maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4. It offers enhanced LLM inference with powerful functionalities like caching, error handling, multi-config inference and templating.

Documentation

You can find detailed documentation about AutoGen here.

In addition, you can find:

Research, blogposts around AutoGen, and Transparency FAQs
Discord
Contributing guide
Roadmap

Related Papers

AutoGen

@inproceedings{wu2023autogen,
      title={AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework},
      author={Qingyun Wu and Gagan Bansal and Jieyu Zhang and Yiran Wu and Beibin Li and Erkang Zhu and Li Jiang and Xiaoyun Zhang and Shaokun Zhang and Jiale Liu and Ahmed Hassan Awadallah and Ryen W White and Doug Burger and Chi Wang},
      year={2023},
      eprint={2308.08155},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

EcoOptiGen

@inproceedings{wang2023EcoOptiGen,
    title={Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference},
    author={Chi Wang and Susan Xueqing Liu and Ahmed H. Awadallah},
    year={2023},
    booktitle={AutoML'23},
}

MathChat

@inproceedings{wu2023empirical,
    title={An Empirical Study on Challenging Math Problem Solving with GPT-4},
    author={Yiran Wu and Feiran Jia and Shaokun Zhang and Hangyu Li and Erkang Zhu and Yue Wang and Yin Tat Lee and Richard Peng and Qingyun Wu and Chi Wang},
    year={2023},
    booktitle={ArXiv preprint arXiv:2306.01337},
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

If you are new to GitHub here is a detailed help source on getting involved with development on GitHub.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

assistant (to ragproxyagent):

Thank you for considering contributing to AutoGen! We're excited to have you on board and look forward to your contributions. Before we can accept your pull request, we need to ensure that you agree to our Contributor License Agreement (CLA). This is a legal requirement for all contributors to Microsoft-owned open source projects.

To proceed, please follow these steps:

Read and sign the CLA: Please review the CLA at https://cla.opensource.microsoft.com/ and sign it electronically. This will ensure that you grant us the necessary rights to use your contribution.
Provide a pull request: Once you have signed the CLA, please submit a pull request with your changes. The CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment).
Wait for approval: Our team will review your pull request and respond with any questions or comments. Once we have approved your changes, they will be merged into the codebase.

Thank you again for your contributions! We're excited to work together to make AutoGen an even better tool for the community.

sonichi commented 7 months ago

Here is how retrieve chat works: the message from ragproxyagent to assistant is the prompt containing your question and the retrieved document. The message from assistant to ragproxyagent is the answer generated from that prompt.

If you look at the conversation history, there are two messages: ragproxyagent to assistant, and assistant to ragproxyagent. The second message starts with "Thank you for considering contributing to AutoGen!..."
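
To make the two-message exchange concrete, here is a minimal sketch that models the history as a plain list and picks out the assistant's answer. The `history` structure, the `sender`/`recipient`/`content` field names, and the helper `last_reply_from` are illustrative assumptions, not AutoGen's internal representation, and the message contents are abbreviated by hand:

```python
from typing import Dict, List, Optional

# Illustrative history for the run above (contents abbreviated by hand).
history: List[Dict[str, str]] = [
    {
        "sender": "ragproxyagent",
        "recipient": "assistant",
        "content": "You're a retrieve augmented chatbot. ... Context is: ...",
    },
    {
        "sender": "assistant",
        "recipient": "ragproxyagent",
        "content": "Thank you for considering contributing to AutoGen! ...",
    },
]


def last_reply_from(sender: str, messages: List[Dict[str, str]]) -> Optional[str]:
    """Return the content of the most recent message sent by `sender`, if any."""
    for message in reversed(messages):
        if message["sender"] == sender:
            return message["content"]
    return None


answer = last_reply_from("assistant", history)
print(answer)
```

Whatever the assistant sends back counts as "the answer" in this flow, whether or not it actually addresses the question.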

matsuobasho commented 7 months ago

Not following - where is the answer in response to my question 'What is autogen?' I don't see that anywhere, just a rehash of the document.

In the tutorial using OpenAI, we see a legitimate answer:

--------------------------------------------------------------------------------
assistant (to ragproxyagent):

AutoGen is a framework that enables the development of large language model (LLM) applications using multiple agents that can converse with each other to solve tasks. The agents are customizable, conversable, and allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

--------------------------------------------------------------------------------

sonichi commented 7 months ago

What I see is:

assistant (to ragproxyagent):

Thank you for considering contributing to AutoGen! We're excited to have you on board and look forward to your contributions. Before we can accept your pull request, we need to ensure that you agree to our Contributor License Agreement (CLA). This is a legal requirement for all contributors to Microsoft-owned open source projects.

To proceed, please follow these steps:

Read and sign the CLA: Please review the CLA at https://cla.opensource.microsoft.com/ and sign it electronically. This will ensure that you grant us the necessary rights to use your contribution.
Provide a pull request: Once you have signed the CLA, please submit a pull request with your changes. The CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment).
Wait for approval: Our team will review your pull request and respond with any questions or comments. Once we have approved your changes, they will be merged into the codebase.
Thank you again for your contributions! We're excited to work together to make AutoGen an even better tool for the community.

This is produced by the model you are using given the retrieved doc.

matsuobasho commented 7 months ago

OK, got it. So the takeaway is that, at this stage of ML development, this is completely useless for simple RAG questions when using the open-source models I've tried (Mistral and quantized Llama 2). I know that's not AutoGen's fault per se, but I'm just pointing this out.

Let me know if I'm being too critical or pessimistic or missing something. Obviously I haven't tried with all open-source models, so there's always the possibility that it will work fine with others. Additionally, I suppose experimenting with different prompts may also help.

sonichi commented 7 months ago

I agree that "experimenting with different prompts may also help". The performance of agents is affected by the prompt + model combination. For example, the result from your example suggests that the current combination makes the agent forget about the initial question.
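
One prompt variation that may be worth trying is to restate the question after the retrieved context, so that it is the last thing the model reads. A minimal sketch follows. The template wording and the names `QUESTION_LAST_PROMPT` and `build_prompt` are hypothetical, and the `{input_question}` / `{input_context}` placeholders follow the format that `retrieve_config["customized_prompt"]` appears to expect; treat that as an assumption to verify against the installed pyautogen version:

```python
QUESTION_LAST_PROMPT = """You're a retrieve augmented chatbot. Answer the user's question using the context below.
If you can't answer the question with or without the current context, reply exactly `UPDATE CONTEXT`.
You must give as short an answer as possible.

Context is:
{input_context}

Now answer this question in one or two sentences. Do not continue or summarize the context.
User's question is: {input_question}
"""


def build_prompt(question: str, context: str) -> str:
    """Fill the template so that the question follows the retrieved context."""
    return QUESTION_LAST_PROMPT.format(input_question=question, input_context=context)


prompt = build_prompt("What is autogen?", "AutoGen is a framework that ...")
print(prompt)
```

If the placeholder assumption holds, the template string could be supplied as `"customized_prompt"` in `retrieve_config`. Whether this helps a given 7B model is untested here.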

matsuobasho commented 7 months ago

Thanks @sonichi, makes sense. Before I close this, I'm going to try to reproduce the 'TERMINATE' message I referenced originally.
