jonogon / jonogon-mono

Take Action on Issues you care about
MIT License

Multiple images are not uploading #62

abrarsami97 opened this issue 2 months ago (status: Open)

abrarsami97 commented 2 months ago
[screenshot attached]
errhythm commented 2 months ago

I tried it and was able to upload multiple images. Can you check again?

anusonawane commented 2 months ago

BadRequestError: Context Length Exceeded for Multiple Images in Base64 Conversion

When attempting to process multiple images with the following code, an error occurs indicating that the context length exceeds the model's maximum token limit. The specific error is:

```
BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 144641 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```
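For context on why two screenshots are enough to overflow a 128k-token window: base64 expands every 3 bytes of image data into 4 characters, and each of those characters costs tokenizer budget. A rough back-of-the-envelope sketch (`estimate_prompt_tokens` and the 4-characters-per-token ratio are illustrative assumptions, not the real tokenizer):

```python
import base64
import math


def estimate_prompt_tokens(image_bytes: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a base64-encoded image embedded in a prompt.

    base64 expands 3 input bytes into 4 output characters; the
    chars-per-token ratio here is an assumption, not the actual tokenizer.
    """
    base64_chars = math.ceil(image_bytes / 3) * 4
    return math.ceil(base64_chars / chars_per_token)


# A single 400 KB PNG already produces over half a million base64 characters.
payload = base64.b64encode(b"\x00" * 400_000)
print(len(payload))                      # 533336 base64 characters
print(estimate_prompt_tokens(400_000))   # 133334 tokens at 4 chars/token
```

So a single mid-sized PNG inlined this way can by itself exceed the 128,000-token limit, which matches the 144,641-token figure in the error.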

Code:

```python
import base64


# Function to convert an image file to base64
def convert_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Data with image paths
data = [
    {
        "title": "Step 1: Open the Application",
        "content": [{"type": "image", "path": "step1_image.png"}],
    },
    {
        "title": "Step 2: Navigate to the Settings",
        "content": [{"type": "image", "path": "step2_image.png"}],
    },
]


# Function to construct a message with base64 images
def construct_message(data):
    message_parts = []
    for item in data:
        title = item["title"]
        content_parts = []
        for content in item["content"]:
            if content["type"] == "image":
                image_base64 = convert_image_to_base64(content["path"])
                content_parts.append(f'<img src="data:image/png;base64,{image_base64}">')
        message_parts.append(f"{title}\n" + "\n".join(content_parts))
    return "\n\n".join(message_parts)


# Construct the message with base64 image data
message = construct_message(data)
```
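One way to stay under the context limit is to stop concatenating every image into a single prompt and instead send one message per step. This is a sketch only (`construct_messages_per_step` is a hypothetical helper, not part of the code above), assuming the rest of the setup is unchanged:

```python
import base64


def convert_image_to_base64(image_path):
    """Read an image file and return its base64-encoded contents."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def construct_messages_per_step(data):
    """Build one message per step so each chat turn carries only that
    step's images, instead of one giant prompt holding all of them."""
    messages = []
    for item in data:
        parts = [item["title"]]
        for content in item["content"]:
            if content["type"] == "image":
                image_base64 = convert_image_to_base64(content["path"])
                parts.append(f'<img src="data:image/png;base64,{image_base64}">')
        messages.append("\n".join(parts))
    return messages


# Each message can then be sent in its own chat turn, e.g.:
# for msg in construct_messages_per_step(data):
#     user_proxy.initiate_chat(group_chat_manager, message=msg)
```

Each individual message then only has to fit one step's images inside the context window, which also keeps the model's attention on a single screenshot at a time.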

The agents are as follows:

```python
agent1 = MultimodalConversableAgent(
    name="example-agent-1",
    max_consecutive_auto_reply=10,
    llm_config={"config_list": config_list_example, "temperature": 0.5, "max_tokens": 300},
    system_message="You are an assistant who helps interpret and analyze image-based content.",
)
agent2 = AssistantAgent(
    name="example-agent-2",
    max_consecutive_auto_reply=10,
    system_message="You are a support specialist who provides feedback on content interpretation.",
)

user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="Please describe the image for me.",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
    code_execution_config={"use_docker": False},
)

# Set max_round to 5
groupchat = autogen.GroupChat(agents=[agent2, agent1], messages=[], max_round=5)

vision_capability = VisionCapability(
    lmm_config={"config_list": config_list_example, "temperature": 0.5, "max_tokens": 900}
)
group_chat_manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gpt4_llm_config)
vision_capability.add_to_agent(group_chat_manager)

rst = user_proxy.initiate_chat(
    group_chat_manager,
    message=message,
)
```