microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

[Bug]: Error running LLava Agent using LM Studio and Autogen #1234

Closed: shaunaa126 closed this issue 10 months ago

shaunaa126 commented 10 months ago

Describe the bug

I am trying to implement this notebook: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_lmm_llava.ipynb

The only difference is that I am hosting my BakLLava model with LM Studio and serving it behind an OpenAI-compatible API. When testing `llava_call`, I get the following error:

Error: Invalid reference to model version: http://0.0.0.0:8001/v1. Expected format: owner/name:version None

Steps to reproduce

1. Install required packages

```
pip3 install replicate
pip3 install pillow
pip3 install matplotlib
```

2. Import packages and set `LLAVA_MODE` to "local"

```python
# More details in the two setup options below.
import json
import os
import random
import time
from typing import Any, Callable, Dict, List, Optional, Tuple, Type, Union

import matplotlib.pyplot as plt
import requests
from PIL import Image
from termcolor import colored

import autogen
from autogen import Agent, AssistantAgent, ConversableAgent, UserProxyAgent
from autogen.agentchat.contrib.llava_agent import LLaVAAgent, llava_call

LLAVA_MODE = "local"  # Either "local" or "remote"
assert LLAVA_MODE in ["local", "remote"]
```

3. Execute the `llava_call` test
```python
from autogen.agentchat.contrib.llava_agent import llava_call

config_list = [
    {
        "model": "llava",
        "api_key": "NULL",
        "base_url": "http://0.0.0.0:8001/v1",
    }
]

rst = llava_call("Describe this AutoGen framework <img https://raw.githubusercontent.com/microsoft/autogen/main/website/static/img/autogen_agentchat.png> with bullet points.",
    llm_config={
        "config_list": config_list,
        "temperature": 0
    }
)

print(rst)
```

4. The following error comes up

Error: Invalid reference to model version: http://0.0.0.0:8001/v1. Expected format: owner/name:version None

Expected Behavior

It should be able to execute the llava_call successfully and respond with something like the following:

The AutoGen framework is a tool for creating and managing conversational agents. It allows for the creation of multiple-agent conversations, enabling complex interactions between different agents. The framework is designed to be flexible and scalable, allowing for the addition of new agents and conversations as needed.

The framework consists of three main components:

  1. Agents: These are the individual conversational entities that can be created and managed within the framework. Each agent has its own unique set of conversational capabilities and can engage in conversations with other agents.

  2. Conversations: These are the interactions between agents, which can be managed and directed by the framework. Conversations can be structured and organized to facilitate efficient communication between agents.

  3. Flexibility: The framework is designed to be flexible, allowing for the addition of new agents and conversations as needed. This flexibility enables the framework to adapt to changing requirements and facilitate the development of more complex conversational systems.

Screenshots and logs

No response

Additional Information

Autogen Version: 0.2.6
Operating System: Windows 11 Pro
Python Version: 3.11.6

To confirm that the LM Studio/OpenAI API hosting of BakLLava works, the following code runs fine using AutoGen's Enhanced Inference.

```python
# AutoGen Enhanced Inference, API Unification
from autogen import OpenAIWrapper
import base64

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

# OpenAI endpoint
client = OpenAIWrapper(cache_seed="None", model="llava", base_url="http://0.0.0.0:8001/v1", api_key="NULL") 

# ChatCompletion
response = client.create(messages=[{"role": "user", "content": [
        {
          "type": "text",
          "text": "Whats in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ], 
  model="llava")
# extract the response text
print(client.extract_text_or_completion_object(response))
```
sonichi commented 10 months ago

@BeibinLi fyi

BeibinLi commented 10 months ago

@shaunaa126 You should use MultimodalConversableAgent instead of LLaVAAgent, because your endpoint speaks the OpenAI API format. The LLaVA agents are designed for the original LLaVA client.

Check the GPT-4V notebook for reference. The usage should be the same (except the config_list).
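For reference, here is a minimal sketch of that swap, not a verified implementation: it reuses the LM Studio config_list from the steps above and the `<img ...>` message tag shown in the GPT-4V notebook, with the model name, URL, and key taken as placeholders from the user's setup.

```python
# Sketch only: MultimodalConversableAgent against an OpenAI-compatible endpoint
# (e.g. LM Studio). The config_list mirrors the one used with llava_call above.
import autogen
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

config_list = [
    {
        "model": "llava",                      # whatever name the local server exposes
        "api_key": "NULL",                     # LM Studio does not validate the key
        "base_url": "http://0.0.0.0:8001/v1",  # OpenAI-compatible endpoint
    }
]

image_agent = MultimodalConversableAgent(
    name="image-explainer",
    llm_config={"config_list": config_list, "temperature": 0, "max_tokens": 300},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)

# Same <img ...> tag syntax as in the GPT-4V notebook; the image can be a URL.
user_proxy.initiate_chat(
    image_agent,
    message="Describe this AutoGen diagram. "
    "<img https://raw.githubusercontent.com/microsoft/autogen/main/website/static/img/autogen_agentchat.png>",
)
```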

shaunaa126 commented 10 months ago

@BeibinLi thanks for your response, your solution worked. I was able to use MultimodalConversableAgent to make a call to my OpenAI API endpoint.

Do you know how to send a local image instead of using an image URL? The following doesn't work:

```python
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img src={image_path}>")
```
BeibinLi commented 10 months ago

> @BeibinLi thanks for your response, your solution worked. I was able to use MultimodalConversableAgent to make a call to my OpenAI API endpoint.
>
> Do you know how to send a local image instead of using an image URL? The following doesn't work:
>
> user_proxy.initiate_chat(assistant, message=f"Describe this image. <img src={image_path}>")

Can you send me the error message regarding the local image issue? I don't have LM Studio to reproduce your error. Another thing to check is to use an absolute path instead of a relative path in the message. For instance, you can try:

```python
image_path = os.path.abspath(image_path)
print(image_path)
assert os.path.exists(image_path)
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img src={image_path}>")
```
shaunaa126 commented 10 months ago

I tried the above and this is the error message I received, which is what I have been receiving with the prior code:

```
c:\Users\Downloads\Github\image.jpg
Warning! Unable to load image from src=c:\Users\Downloads\Github\image.jpg, because [Errno 22] Invalid argument: 'src=c:\\Users\\Downloads\\Github\\image.jpg'
user_proxy (to image-explainer):
```

I am not sure if `src` is supported. Are there any examples in Autogen of sending a local image instead of an HTTP URL in the chat? I don't think this error is specific to LM Studio. Here is my full code example:

```python
import base64
import os

import autogen
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

image_path = os.path.abspath(image_path)
print(image_path)
assert os.path.exists(image_path)

# Load LLM inference endpoints
config_list = [
    {
        "model": "llava",
        "api_key": "None",
        "base_url": "http://0.0.0.0:8001/v1",
    }
]

assistant = MultimodalConversableAgent(
    name="image-explainer",
    max_consecutive_auto_reply=10,
    llm_config={"config_list": config_list, "temperature": 0.5, "max_tokens": 300},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    system_message="A human admin.",
    human_input_mode="NEVER",  # Try between ALWAYS or NEVER
    max_consecutive_auto_reply=0,
)

# This initiates an automated chat between the two agents to solve the task
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img src={image_path}>")
```
BeibinLi commented 10 months ago

It is not an HTML image tag, so there is no `src=` attribute. Just use the image path directly.

For instance:

```python
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img {image_path}>")
user_proxy.initiate_chat(assistant, message="Describe this image. <img C:/User/temp/Downloads/test.jpg>")
user_proxy.initiate_chat(assistant, message="Describe this image. <img xyz/test.jpg>")
```
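Putting the pieces together, a minimal sketch of the local-image call under the same assumptions as above (the `assistant` and `user_proxy` agents defined as in the previous snippet, and a hypothetical local file `image.jpg`):

```python
import os

# Hypothetical local file; resolve to an absolute path so the agent can load it.
image_path = os.path.abspath("image.jpg")
assert os.path.exists(image_path)

# No src= attribute: the tag is just "<img <path-or-url>>".
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img {image_path}>")
```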

shaunaa126 commented 10 months ago

@BeibinLi thank you, that worked. I am able to use LM Studio to host my multimodal LLM and run inference with MultimodalConversableAgent using AutoGen. Great stuff. Appreciate your time.