Closed: shaunaa126 closed this issue 10 months ago.
@BeibinLi fyi
@shaunaa126 You should use MultimodalConversableAgent instead of LLaVAAgent, because your endpoint serves an OpenAI-compatible API. The LLaVA agents are designed for the original LLaVA client.
Check the GPT-4V notebook for reference; the usage should be the same (except for the config_list).
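For reference, a minimal sketch of wiring MultimodalConversableAgent to an OpenAI-compatible endpoint (the model name, api_key, and base_url below are placeholders, not your exact setup):
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# Placeholder config for an OpenAI-compatible multimodal endpoint (e.g. LM Studio)
config_list = [
    {
        "model": "llava",                      # whatever model name your server exposes
        "api_key": "None",
        "base_url": "http://localhost:8001/v1",
    }
]

assistant = MultimodalConversableAgent(
    name="image-explainer",
    llm_config={"config_list": config_list, "temperature": 0.5, "max_tokens": 300},
)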
@BeibinLi thanks for your response, your solution worked. I was able to use MultimodalConversableAgent to make a call to my OpenAI API endpoint.
Do you know how to send an image locally instead of using an image URL? The following doesn't work:
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img src={image_path}>")
Can you send me the error message regarding the local image issue? I don't have LM Studio to reproduce your error. Another thing to check is to use an "absolute" path instead of a relative path in the message. For instance, you can try:
image_path = os.path.abspath(image_path)
print(image_path)
assert os.path.exists(image_path)
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img src={image_path}>")
I tried the above and this is the error message I received, which is what I have been receiving with the prior code:
c:\Users\Downloads\Github\image.jpg
Warning! Unable to load image from src=c:\Users\Downloads\Github\image.jpg, because [Errno 22] Invalid argument: 'src=c:\\Users\\Downloads\\Github\\image.jpg'
user_proxy (to image-explainer):
I am not sure if src= is supported. Are there any examples in Autogen of sending a local image instead of an HTTP URL in the chat? I don't think this error is specific to LM Studio. Here is my full code example:
import os
import base64

import autogen
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

image_path = os.path.abspath(image_path)
print(image_path)
assert os.path.exists(image_path)

# Load LLM inference endpoints
config_list = [
    {
        "model": "llava",
        "api_key": "None",
        "base_url": "http://0.0.0.0:8001/v1",
    }
]

assistant = MultimodalConversableAgent(
    name="image-explainer",
    max_consecutive_auto_reply=10,
    llm_config={"config_list": config_list, "temperature": 0.5, "max_tokens": 300},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    system_message="A human admin.",
    human_input_mode="NEVER",  # Try between ALWAYS or NEVER
    max_consecutive_auto_reply=0,
)

# This initiates an automated chat between the two agents to solve the task
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img src={image_path}>")
It is not an HTML image tag. Just use the image path directly, without src=.
For instance:
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img {image_path}>")
user_proxy.initiate_chat(assistant, message="Describe this image. <img C:/User/temp/Downloads/test.jpg>")
user_proxy.initiate_chat(assistant, message="Describe this image. <img xyz/test.jpg>")
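Putting that together with the absolute-path suggestion above, a working call for a local file would look roughly like this (image.jpg is a placeholder file name):
import os

image_path = os.path.abspath("image.jpg")  # resolve the placeholder local file to an absolute path
user_proxy.initiate_chat(assistant, message=f"Describe this image. <img {image_path}>")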
@BeibinLi thank you, that worked. I am able to use LM Studio to host my multimodal LLM and run inference with MultimodalConversableAgent using Autogen. Great stuff. Appreciate your time.
Describe the bug
I am trying to implement this notebook: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_lmm_llava.ipynb
The only difference is that I am hosting my BakLLava model using LM Studio and running it behind an OpenAI-compatible API. When testing the llava_call I get the following error.
Steps to reproduce
Set LLAVA_MODE to "local":
import matplotlib.pyplot as plt
import requests
from PIL import Image
from termcolor import colored

import autogen
from autogen import Agent, AssistantAgent, ConversableAgent, UserProxyAgent
from autogen.agentchat.contrib.llava_agent import LLaVAAgent, llava_call

LLAVA_MODE = "local"  # Either "local" or "remote"
assert LLAVA_MODE in ["local", "remote"]
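For context, the llava_call being tested follows the notebook's pattern, roughly like the sketch below (the image URL is a placeholder, and the llm_config values are assumptions based on the notebook):
rst = llava_call(
    "Describe this image: <img https://example.com/sample.jpg>",
    llm_config={
        "config_list": config_list,  # assumed: the OpenAI-compatible LM Studio endpoint config
        "temperature": 0.5,
        "max_new_tokens": 500,
    },
)
print(rst)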
Expected Behavior
It should be able to execute the llava_call successfully and respond with something like the following:
Screenshots and logs
No response
Additional Information
Autogen Version: 0.2.6
Operating System: Windows 11 Pro
Python Version: 3.11.6
To prove that the LM Studio/OpenAI API hosting of BakLLava works, the following code works fine using Autogen's Enhanced Inference.
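(For illustration only, not the exact snippet from the report: a minimal Enhanced Inference check against the same OpenAI-compatible endpoint might look like the sketch below, reusing the placeholder config_list from above.)
from autogen import OpenAIWrapper

client = OpenAIWrapper(config_list=config_list)  # placeholder: same LM Studio endpoint config as above
response = client.create(messages=[{"role": "user", "content": "Say hello."}])
print(client.extract_text_or_completion_object(response))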