Open daniel-counto opened 1 month ago
yes wondering how images are being inserted in Autogen
@BeibinLi for awareness!
The charge for reading images does not depend on how the images are inserted, but on how large the images are. Moreover, AutoGen uses the same format as the OpenAI vanilla API, see this line of code.
Even if there are only 400 images in the document, because of the multi-agent design, the chat history may contain more than one image. This is particularly true for group chats, because each agent has the history of all other agents. For instance, if there are 10 agent interactions in the group chat, the number of images we have in the prompt is: 1 + 2 + 3 + ... + 10 = 55 times.
For more details, please read here.
Describe the issue
I created three agents to read document images, which are black and white financial documents, and are not very huge in terms of resolution (around 1k x 2k or smaller). The model I used for all of them is GPT-4o.
The flow it creates is mostly linear, i.e. from agent 1 -> agent 2 -> final agent to summarize as output. However, for only 400 images I uploaded, it already costs me like USD200 +, and the context tokens used are about 28+ million tokens!
I wonder if this is because Autogen inserts image bits into the prompt itself? If so, shouldn't the best way is to upload the images to some place and then just insert the image path link to the prompts?
Steps to reproduce
Step1 - The agents are constructed as follows:
Step -2
Step-3
execute the above multiagent model, with about 500 images. Each is a standard invoice image.
Screenshots and logs
Additional Information
the right way to send image for OpenAI api is not sending string but this method:
please make the changes.