microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

Integrate open-source LLMs into autogen #46

Closed LeoLjl closed 6 months ago

LeoLjl commented 1 year ago

Currently, autogen.oai only supports OpenAI models. I am planning to integrate open-source LLMs into autogen. This would require the Hugging Face transformers library.

Some of the open-source LLMs I have in mind include LLaMA, Alpaca, Falcon, Vicuna, WizardLM, StarCoder, and Guanaco. Most of their inference code is available through the Hugging Face transformers library, so it can be unified into something like this:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "<name-of-model>"

# Load the tokenizer and model weights (trust_remote_code is needed for models
# such as Falcon that ship custom modeling code).
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Tokenize a prompt, generate a continuation, and decode it back to text.
inputs = tokenizer("The world is", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
### Tasks
- [ ] https://github.com/microsoft/FLAML/issues/1017
- [ ] https://github.com/microsoft/FLAML/issues/1037
- [ ] https://github.com/microsoft/autogen/issues/15
LeoLjl commented 1 year ago

The current goal is to support open-source LLMs in LLM-based agents with as little modification as possible.

The core code that needs modification is line 41 in autogen/agent/assistant_agent.py, which only supports making completions with OpenAI models:

        responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)

There are two ways to do so. One is to create a utility function that checks the model name and calls the corresponding inference API:

openai_models = ["gpt-4", "gpt-3"]
open_source_models = ["llama", "alpaca"]
google_models = ["bard"]

def create(model, **kwargs):
    if model in openai_models:
        # generate response using  "oai.ChatCompletion.create"
        pass
    elif model in open_source_models:
        # generate response using related Hugging Face APIs
        pass
    elif model in google_models:
        # generate response using related Google APIs
        pass

This is quick to implement and easy to use. However, it may introduce unexpected problems when developing other applications.

Another option is to inherit from the autogen.oai.Completion class and override its methods:

class OpensourceCompletion(autogen.oai.Completion):
    @classmethod
    def create(cls, **kwargs):
        # Make the completion using the related APIs.
        pass

This approach is not consistent with the folder name oai.
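
If we went this route, usage inside the agent would presumably mirror the existing call. A minimal sketch (OpensourceCompletion is hypothetical and assumed to accept the same arguments as oai.ChatCompletion):

# Hypothetical drop-in for the call in assistant_agent.py, assuming the subclass
# keeps the same create() signature as oai.ChatCompletion.
responses = OpensourceCompletion.create(
    messages=self._conversations[sender.name], **self._config
)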

I am researching how other libraries handle this situation to find potential solutions, e.g. LangChain and JARVIS.

sonichi commented 1 year ago

Naming consistency is not an issue because we can interpret it differently: oai offers an inference API compatible with OpenAI's. That doesn't prevent us from supporting non-OpenAI models through the same inference API.
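
For example, a rough sketch of what that could look like with the current inference API (the URL, model name, and dummy key below are placeholders for any server that speaks the OpenAI protocol):

from autogen import oai

# Sketch: the same OpenAI-compatible inference API, pointed at a local server that
# speaks the OpenAI protocol (placeholder URL, model name, and dummy key).
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "my-local-model",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        }
    ],
    messages=[{"role": "user", "content": "Hello"}],
)
print(response)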

sonichi commented 1 year ago

Related issue: microsoft/FLAML#1037 microsoft/autogen#15 microsoft/FLAML#1017

sonichi commented 1 year ago

related: https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md
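
For reference, a minimal sanity check against such a server might look like this (a sketch: it assumes FastChat's controller, a model worker, and the openai_api_server are already running on localhost:8000 as described in openai_api.md, and the model name is a placeholder for whatever the worker serves):

import openai

# Point the openai client at the local FastChat OpenAI-compatible server.
openai.api_key = "EMPTY"  # FastChat does not validate the key
openai.api_base = "http://localhost:8000/v1"

completion = openai.ChatCompletion.create(
    model="vicuna-7b-v1.3",  # placeholder: whichever model the worker serves
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
)
print(completion.choices[0].message.content)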

LeoLjl commented 1 year ago

FastChat is a useful tool as a local drop-in replacement for the OpenAI APIs, which greatly simplifies integration. ToDo:

sonichi commented 1 year ago

Thanks for creating a task list. Could you edit the "tasks" in the main body of the issue? There you can track sub issues. For example, the two items you listed here correspond to two existing issues microsoft/FLAML#1017 and microsoft/FLAML#1037. You can comment on those existing issues about the approach you'll take to solve them. Also add microsoft/autogen#15 to the task list.

ishaan-jaff commented 1 year ago

Hi @sonichi @LeoLjl, I'm the maintainer of LiteLLM (an abstraction to call 100+ LLMs). We allow you to create a proxy server to call 100+ LLMs, and I think it can solve your problem (I'd love your feedback if it does not).

Try it here: https://docs.litellm.ai/docs/proxy_server https://github.com/BerriAI/litellm

Using LiteLLM Proxy Server

import openai
openai.api_base = "http://0.0.0.0:8000/" # proxy url
print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))

Creating a proxy server

Ollama models

$ litellm --model ollama/llama2 --api_base http://localhost:11434

Hugging Face Models

$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model huggingface/mistralai/Mistral-7B-Instruct-v0.1

Anthropic

$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1

Palm

$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison
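
If such a proxy is running, pointing AutoGen at it is presumably just a config change. A sketch (the port and model name depend on how the proxy was started; the dummy key is a placeholder):

config_list = [
    {
        "model": "ollama/llama2",           # whichever model the proxy was started with
        "api_base": "http://0.0.0.0:8000",  # the proxy URL from above
        "api_key": "NULL",                  # dummy key; the proxy does not validate it
    }
]
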
yhyu13 commented 1 year ago

@LeoLjl There are projects like text-generation-webui that provide an OpenAI API adapter for local LLMs. And there are projects like one-api that serve as middleware for a lot of different remote LLM backends.

All of them simply redirect the OpenAI API without introducing new code or modifying existing code.

LeoLjl commented 1 year ago

Hi @yhyu13, thanks for the valuable information. This is really helpful. I will include it in the next docs update!

LeoLjl commented 1 year ago

Hi @ishaan-jaff, LiteLLM looks like just the solution we need. I will try it out and include it in the new docs update!

YaswanthDasamandam commented 1 year ago

Hi, can someone answer my question in #89? I am overriding the create method to generate the answers.

ishaan-jaff commented 1 year ago

@TheCompAce I can help you on our discord: https://discord.com/invite/wuPM9dRgDw

TheCompAce commented 1 year ago

@TheCompAce I can help you on our discord: https://discord.com/invite/wuPM9dRgDw

I was an idiot. To fix it, I had to set the API route to "/chat/completions" to match OpenAI's, so I used:

from flask import Flask, request, jsonify
from flask_swagger_ui import get_swaggerui_blueprint
from litellm import completion

app = Flask(__name__)

# Swagger UI setup
SWAGGER_URL = '/swagger'
API_URL = '/static/swagger.json'
swaggerui_blueprint = get_swaggerui_blueprint(
    SWAGGER_URL,
    API_URL,
    config={'app_name': "Litellm Server"}
)
app.register_blueprint(swaggerui_blueprint, url_prefix=SWAGGER_URL)

@app.route('/chat/completions', methods=['POST'])
def complete():
    data = request.json
    model = "huggingface/mistralai/Mistral-7B-Instruct-v0.1"
    messages = data.get('messages', [])

    response = completion(model=model, messages=messages, max_tokens=8192)
    return jsonify(response)

if __name__ == '__main__':
    app.run(debug=True)

And now I can use this config for AutoGen:

[
    {
        "model": "custom_liteLLM",
        "api_base": "http://127.0.0.1:5000"
    }
]
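
For reference, a minimal sketch of how a config like that is then consumed by AutoGen agents (the model name and api_base come from the config above; the dummy api_key, agent names, and prompt are illustrative):

import autogen

config_list = [
    {
        "model": "custom_liteLLM",
        "api_base": "http://127.0.0.1:5000",
        "api_key": "NULL",  # dummy key added for illustration; the local server ignores it
    }
]

# Two-agent setup: the assistant talks to the local server, the user proxy runs code.
assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)
user_proxy.initiate_chat(assistant, message="Write a Python function that reverses a string.")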

ishaan-jaff commented 1 year ago

made a PR to autogen: https://github.com/microsoft/autogen/pull/95 @LeoLjl

henrywithu commented 1 year ago

This looks quite promising. Do you have any examples of it? TBH GPT-4 tokens are pricey, and Code Llama could be a nice choice.

TheCompAce commented 1 year ago

I have the code to get it working above; it seemed to work at first, but then I started getting token-limit errors. Maybe I can work on it some more another time.

fxtoofaan commented 1 year ago

Can you please see if this API integration will work with TheBloke models, if possible, like this one: TheBloke/Mistral-7B-OpenOrca-GPTQ

https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GPTQ

Maybe a FastChat + vLLM server in the backend, with the OpenAI API serving the TheBloke/Mistral-7B-OpenOrca-GPTQ model and any prompt adjustments if needed (ChatML etc.).

An example Python script showing how to connect to this using autogen would also help.

LeoLjl commented 1 year ago

Hi @fxtoofaan, I think these will answer your question: https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs and https://github.com/microsoft/autogen/blob/osllm/notebook/open_source_language_model_example.ipynb

fxtoofaan commented 1 year ago

@LeoLjl thank you. Can you look into these various prompt templates: https://github.com/oobabooga/text-generation-webui/tree/main/instruction-templates

Because of the OpenOrca models I am interested in this prompt template: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/MPT-Chat.yaml

Can this type of prompt template be used in autogen? An example of this in a Python script would be useful. Thank you.

by the way, the model I am playing around with now is this: https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ

bigsk1 commented 1 year ago


I was able to get the oobabooga OpenAI API working last night but was getting 2048 token-limit errors and still need to troubleshoot. I was running text-generation-webui locally and running autogen locally on WSL2. I ran oobabooga with the OpenAI API flag and, in a notebook, used:

import autogen

config_list = [
    {
        "model": "gpt-4",
        "api_key": "sk-111111111111111111111111111111111111111111111111",
        "api_base": "http://192.168.XX.XX:5001/v1/chat/completions",
    },
]

using model TheBloke_Llama-2-13B-chat-GPTQ.

It was working just fine and behaved just like an OpenAI API call using GPT-4.

apoorv28goel commented 1 year ago


How is the performance? Can you share some examples?

bigsk1 commented 1 year ago

How is the performance? Can you share some examples?

Good. I'm on a 4090; below is an example to check if it's working:

import requests
import json

# Define the API endpoint and headers
api_base = "http://192.168.1.68:5001/v1"
api_endpoint = f"{api_base}/chat/completions"
headers = {
    "Authorization": "Bearer sk-111111111111111111111111111111111111111111111111", # Dummy key, replace if your oobabooga instance requires a specific key
    "Content-Type": "application/json"
}

# Define the payload (data to send)
payload = {
    "model": "TheBloke_EverythingLM-13B-16K-GPTQ",  # Replace with your specific model name if different
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2021?"},
        {"role": "assistant", "content": "The Atlanta Braves won the World Series in 2021."},
        {"role": "user", "content": "Where was it played?"}
    ]
}

try:
    # Make the API call
    response = requests.post(api_endpoint, headers=headers, json=payload, timeout=20)

    # Check the response
    if response.status_code == 200:
        print("API call successful.")
        print("Response:", json.dumps(response.json(), indent=4))
    else:
        print(f"API call failed. Status code: {response.status_code}")
        print("Error message:", response.text)

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
The output:

API call successful.
Response: {
    "id": "chatcmpl-1696808596275027712",
    "object": "chat.completions",
    "created": 1696808596,
    "model": "TheBloke_EverythingLM-13B-16K-GPTQ",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "It was played at various locations throughout the United States, including Houston, Texas.\n"
            }
        }
    ],
    "usage": {
        "prompt_tokens": 91,
        "completion_tokens": 18,
        "total_tokens": 109
    }
}

In the CLI you see: Output generated in 4.02 seconds (4.73 tokens/s, 19 tokens, context 91, seed 1022663743) 192.168.1.68 - - [08/Oct/2023 16:43:20] "POST /v1/chat/completions HTTP/1.1" 200 -

Start oobabooga with the flags --listen --extensions openai.

Also, here is a sample notebook where a local LLM was used: https://github.com/bigsk1/autogen/blob/main/notebook/agentchat_stream.ipynb

In the root directory I have OAI_CONFIG_LIST, and inside:

[
    {
        "model": "gpt-4",
        "api_key": "sk-111111111111111111111111111111111111111111111111",
        "api_base": "http://192.168.1.68:5001/v1/chat/completions"
    }
]
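
That file can then be loaded in a notebook with AutoGen's config loader, roughly like this (a sketch; the filename must match the one in the root directory):

import autogen

# Load the config list from the OAI_CONFIG_LIST file in the working directory.
config_list = autogen.config_list_from_json(env_or_file="OAI_CONFIG_LIST")
assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
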
TheAnachronism commented 1 year ago

When using OpenAI's models, one can easily provide a web link to a paper, for example, and the models can access it. I guess that is handled on OpenAI's side, since trying this with local LLMs completely fails. Is there a version of this for local LLMs, or does it still have to be built?

sonichi commented 1 year ago

For a new web link that's not in the training data of the OpenAI model, the agent needs to write code to visit the link. Check this example: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_web_info.ipynb

babycommando commented 1 year ago

Wrote a Medium article on doing this with oobabooga:

https://babycmd.medium.com/local-llms-and-autogen-an-uprising-of-local-powered-agents-d472f2c3d0e3

bomsn commented 11 months ago

I was able to make it work with Google's Vertex AI (Bard, chat-bison, etc.) by creating a FastAPI endpoint that matches OpenAI's /chat/completions endpoint and takes the same data:

chat = APIRouter(prefix="/vertexai/chat", tags=['Vertex AI Chat'])

@chat.post("/completions/", response_model=ResponseData, summary="Send a chat message to Vertex AI LLM and get a response.")
async def chat_completions_endpoint(request: RequestData, api_key: str = Depends(get_api_key)):
    """
    This function `chat_completions_endpoint` is an API endpoint that accepts POST requests at "/chat/completions/".
    It takes in a JSON request body and an API key from the Authorization header. The JSON request body should contain the following fields:
    - `model`: The name of the Vertex AI model to use for generating chat completions ( only "codechat-bison" and "chat-bison" are supported ).
    - `credentials`: The service account credentials for accessing Vertex AI.
    - `location`: The location of the Vertex AI service (default is 'us-central1').
    - `messages`: A list of chat messages. Each message is a dictionary with 'role' and 'content' fields.
    - `temperature`: The randomness of the generated responses (default is 0).
    - `max_tokens`: The maximum number of tokens in the generated response (default is 2048).

    The function sends a chat message to Vertex AI using the specified model and returns a JSON response containing the generated chat completion.
    The response includes the following fields:
    - `id`: A random ID for the chat completion.
    - `object`: The type of the object, which is "chat.completion".
    - `created`: The timestamp when the chat completion was created.
    - `model`: The name of the Vertex AI model used.
    - `choices`: A list containing the generated chat completion. Each choice is a dictionary with 'index', 'message', and 'finish_reason' fields.
    - `usage`: A dictionary containing the number of tokens used in the prompt, completion, and total.

    If the API key is invalid, the function raises an HTTP 403 error. If any other error occurs during request processing, it raises an HTTP 500 error.
    """
    return 'Match Open AI /chat/completions/ response'
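
The return statement above is just a placeholder; the handler's job is to assemble an OpenAI-style body from the Vertex AI reply, roughly along these lines (a standalone sketch based on the fields listed in the docstring; the reply text and token counts are hypothetical stand-ins for values taken from the Vertex AI response):

import time
import uuid

# Hypothetical values that would come from the Vertex AI response object.
reply_text = "Hello from chat-bison"
prompt_tokens, completion_tokens = 12, 8

response_payload = {
    "id": f"chatcmpl-{uuid.uuid4().hex}",
    "object": "chat.completion",
    "created": int(time.time()),
    "model": "chat-bison",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": reply_text},
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    },
}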

In my config file, I just added something like this:

    {
        "model": "chat-bison",
        "api_key": "my_api_key",
        "api_base": "http://website-or-localhost/vertexai",
        "location": "us-central1",
        "credentials": "Google cloud service account credentials here"
    }

Worked like a charm.

sonichi commented 11 months ago

Can folks interested in this issue review #831 ? Many thanks.

MilutinTokic commented 9 months ago

You guys here are crazy. I thought I was the only one in the world trying to defeat OpenAI ;) I am currently trying to connect autogen to HugChat, specifically to connect hugchat.cli to autogen so autogen can use it like the OpenAI API. Has anyone tried to achieve that, and if not, why not? The hugchat package doesn't use local LLMs; instead it connects to a really good Space on Hugging Face where you can choose multiple models, turn internet access on/off, and much more. I can't find anything about using autogen with hugchat.

bomsn commented 9 months ago

@MilutinTokic, you'll just have to be creative with it. Create an API in the middle that replicates the OpenAI response format, and use that to connect to any other LLM.

ekzhu commented 9 months ago

We are working on local model support: #1345

MilutinTokic commented 8 months ago

@bomsn I think it can be achieved with LangChain. It can be used with autogen and it can be connected to Hugging Face.

bomsn commented 8 months ago

@MilutinTokic yes, that's also an option, but I haven't tried it since I only wanted it to work for a single service without a lot of dependencies. But yeah, that could work.

MilutinTokic commented 8 months ago

@bomsn but that is the only way to use AutoGen for free, and that is my goal. I mean, I am working on a big project but I don't want to pay OpenAI. What are you working on? How do you use autogen? I have been figuring out how to use autogen for free for a few weeks, and this is all I have come up with.

bomsn commented 8 months ago

@MilutinTokic, to be honest, you'll have to spend money to work with LLMs one way or another. You're either going to pay OpenAI, Microsoft, Google, etc., rent a server to run an open-source LLM, or run it locally on your computer, which will require a decent GPU for not-so-good results. If you want decent results, you'll have to either pay for the existing APIs or, as said, use a very large open-source model, which will cost you much more than you'd pay OpenAI, Microsoft, or Google. However, if it's just for learning and nothing serious, you can run quantized open-source models from Hugging Face below 7B parameters on your computer if you have at least 12 GB of VRAM, and even then it will be quite slow.

MilutinTokic commented 8 months ago

@bomsn haha, ok, I know all that, but I have achieved a lot of things for free, so I will this time too, believe me. I am just starting; I know it won't be too fast or too reliable, but I don't need it to be, and I have my methods. I never paid for any digital/online product that I wanted, and I got it eventually. ;)

PyroGenesis commented 8 months ago

Personally, I use FastChat (as detailed in the AutoGen blog), and there are a few more alternatives mentioned in this thread already, like LiteLLM, LangChain, and oobabooga's text-generation-webui. You can even roll your own endpoint with FastAPI as @bomsn mentioned.

This is until the local model support PR #1345 is merged which would be pretty helpful.

Josephrp commented 8 months ago

I have a stack available to test various models for various roles; most require custom code and parameters, so I'll be using the custom model client to make tests and notebooks, I guess :-)

sonichi commented 8 months ago

@PyroGenesis Thanks for the comment. #1345 has been merged. Looking forward to your further feedback. Thanks @Josephrp for your feedback too. cc @olgavrou

hghalebi commented 8 months ago

@amirmohammadshakeri

daoxuliu commented 6 months ago

I use Ollama, and adding the corresponding 'model' and 'base_url' configurations in OAI_CONFIG_LIST enables local use of autogen, as follows:

{
    "model": "mistral",
    "api_key": "ollama",
    "base_url": "http://localhost:11434/v1"
}

But there is an issue: I can't invoke tool calls no matter what I do. Does anyone know what the problem is?

ekzhu commented 6 months ago

Some models don't support tool calls. But have you tried user defined functions? https://microsoft.github.io/autogen/docs/topics/code-execution/user-defined-functions

daoxuliu commented 6 months ago

Some models don't support tool calls. But have you tried user defined functions? https://microsoft.github.io/autogen/docs/topics/code-execution/user-defined-functions

Thanks for the comments. It seems like you are right.

ekzhu commented 6 months ago

We now support open-source and open-weight models: https://microsoft.github.io/autogen/docs/topics/non-openai-models/about-using-nonopenai-models