Status: Open. isaacbmiller opened this issue 2 weeks ago.
This is how I did it in fewshot:
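```python
import json
from typing import Any

from pydantic import BaseModel

# map_images and gpt_format_image are helpers from fewshot (not shown here).


def format_input_simple(pydantic_object: BaseModel, img_formatter=None) -> dict[str, Any]:
    if img_formatter is None:
        img_formatter = gpt_format_image
    image_map = {}

    def replace_image_with_id(obj: Any) -> Any:
        # Swap each image for a placeholder ID and remember its base64 payload
        image_id = f"[image {len(image_map) + 1}]"
        image_map[image_id] = obj.base64()
        return image_id

    # Serialize the input object to JSON, with every image replaced by its ID
    dict_obj = map_images(pydantic_object, replace_image_with_id)
    processed = json.dumps(dict_obj)
    content = [{"type": "text", "text": processed}]
    # Append the (ID, image) pairs after the JSON text
    for image_id, image in image_map.items():
        content.append({"type": "text", "text": image_id + ":"})
        content.append(img_formatter(image))
    return {"role": "user", "content": content}
```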
Basically, when I turn the input object into JSON, I replace every image with an ID. Then, at the end of the message, I send the list of (ID, img) pairs. It works reasonably well.
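The placeholder approach described above can be sketched without the fewshot helpers. This is a minimal, self-contained sketch under stated assumptions: `Image` is a hypothetical dataclass that already holds a base64 data URL, inputs are plain dicts and lists rather than a pydantic model, and `replace_images` / `build_user_message` are illustrative names, not fewshot or DSPy APIs.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Image:
    # Hypothetical stand-in for an image type (assumption, not the fewshot/DSPy class);
    # holds a URL or data URL such as "data:image/png;base64,..."
    data_url: str


def replace_images(value: Any, image_map: dict[str, Image]) -> Any:
    """Recursively swap every Image for a placeholder ID like "[image 1]"."""
    if isinstance(value, Image):
        image_id = f"[image {len(image_map) + 1}]"
        image_map[image_id] = value
        return image_id
    if isinstance(value, dict):
        return {k: replace_images(v, image_map) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_images(v, image_map) for v in value]
    return value


def build_user_message(inputs: dict[str, Any]) -> dict[str, Any]:
    """Serialize inputs to JSON, then append the (ID, image) pairs at the end."""
    image_map: dict[str, Image] = {}
    processed = json.dumps(replace_images(inputs, image_map))
    content: list[dict[str, Any]] = [{"type": "text", "text": processed}]
    for image_id, image in image_map.items():
        content.append({"type": "text", "text": image_id + ":"})
        content.append({"type": "image_url", "image_url": {"url": image.data_url}})
    return {"role": "user", "content": content}


msg = build_user_message(
    {
        "question": "Which image shows a cat?",
        "labeled_images": {
            "a": Image("data:image/png;base64,AAA"),
            "b": Image("data:image/png;base64,BBB"),
        },
    }
)
```

Because IDs are assigned in traversal order, the JSON text and the trailing image list stay aligned no matter how deeply the images are nested inside lists or dicts.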
Currently, you can only pass one image per field in a signature.
For example, this works:
```python
import dspy


class ImageSignature(dspy.Signature):
    image1: dspy.Image = dspy.InputField()
    image2: dspy.Image = dspy.InputField()
```
But more complex types involving images won't:
```python
from typing import Dict, List

import dspy


class ImageSignature(dspy.Signature):
    images: List[dspy.Image] = dspy.InputField()


class ImageSignature(dspy.Signature):
    labeled_images: Dict[str, dspy.Image] = dspy.InputField()
```
This is due to how images are compiled into OpenAI-compatible messages: inside chat_adapter.py, we build one large list of content blocks, giving fields that hold an image special treatment as image_url blocks:
```json
{
  "content": [
    {"type": "text", "text": "..."},
    {"type": "image_url", "image_url": {"url": "..."}}
  ]
}
```
Here url is either an actual URL or the base64 data.
I do some fairly naive parsing inside ChatAdapter, and there is definitely a more elegant solution here. #1763 addresses the List case, but I want a more general solution.
cc @okhat
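One possible direction for a more general solution is to walk each field's value recursively and emit an image_url block wherever an image occurs, so List[dspy.Image], Dict[str, dspy.Image], and deeper nesting all go through one code path. This is a minimal sketch under stated assumptions: `Image` is a hypothetical stand-in for dspy.Image that holds only a URL, and `field_to_blocks` and its `name: value` text layout are illustrative, not how ChatAdapter actually formats fields.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Image:
    # Hypothetical image type holding a URL or base64 data URL (assumption)
    url: str


def _walk(value: Any, out: list[Any]) -> None:
    """Append text fragments (str) and Image objects to `out` in reading order."""
    if isinstance(value, Image):
        out.append(value)
    elif isinstance(value, dict):
        out.append("{")
        for i, (key, item) in enumerate(value.items()):
            if i:
                out.append(", ")
            out.append(json.dumps(str(key)) + ": ")
            _walk(item, out)
        out.append("}")
    elif isinstance(value, (list, tuple)):
        out.append("[")
        for i, item in enumerate(value):
            if i:
                out.append(", ")
            _walk(item, out)
        out.append("]")
    else:
        # Non-image leaves are rendered as JSON literals
        out.append(json.dumps(value))


def field_to_blocks(name: str, value: Any) -> list[dict[str, Any]]:
    """Render one input field as OpenAI-style content blocks, with images inline."""
    parts: list[Any] = [f"{name}: "]
    _walk(value, parts)
    blocks: list[dict[str, Any]] = []
    text = ""
    for part in parts:
        if isinstance(part, Image):
            # Flush pending text so the image lands exactly at its position
            if text:
                blocks.append({"type": "text", "text": text})
                text = ""
            blocks.append({"type": "image_url", "image_url": {"url": part.url}})
        else:
            text += part
    if text:
        blocks.append({"type": "text", "text": text})
    return blocks


blocks = field_to_blocks(
    "images", [Image("https://example.com/a.png"), Image("https://example.com/b.png")]
)
```

Merging adjacent text fragments keeps the block count small: a list of two images becomes text, image, text, image, text, with the surrounding brackets and separators carried in the text blocks.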
Hey, I was trying to perform VQA with an LLM, using DSPy for optimized prompting, but I'm not able to pass a base64 image to the LLM via DSPy. Could you let me know how you did it? I tried dspy.Image, but I get an error saying "No module called dspy.Image". Thanks!
@rzr2kor Are you on the latest version of DSPy? pip install -U dspy
Then at the end of the message I send the list of (ID, img) pairs.
@thomasahle Did you find that this worked better than interweaving the {"type": "image_url", "image_url": ...} blocks into your actual text content, or was it just a design decision?
With complex image types, it seems like we could unlock MIPROv2 with few-shot awareness enabled, since DescribeProgram / DescribeModule could then be modified to receive a program_example that contains images.
Then at the end of the message I send the list of (ID, img) pairs.
@thomasahle Did you find that this worked better than interweaving the {"type": "image_url", "image_url": ...} blocks into your actual text content, or was it just a design decision?
I couldn't put it in "the actual text content", since that was just one big JSON string.
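Since the placeholders live inside one JSON string, images cannot be dropped into it directly. One possible workaround, sketched here as an assumption rather than what fewshot does, is to split the serialized JSON at each quoted placeholder and emit an image_url block at that position, which interleaves images without changing how the object is serialized. `interleave_images` is a hypothetical helper name.

```python
import json
import re
from typing import Any


def interleave_images(json_text: str, image_urls: dict[str, str]) -> list[dict[str, Any]]:
    """Split a JSON string at quoted placeholder IDs and put image blocks in their place.

    `image_urls` maps placeholder IDs such as "[image 1]" to image URLs or data URLs.
    """
    if not image_urls:
        return [{"type": "text", "text": json_text}]
    # Match each placeholder as it appears inside the JSON text, quotes included.
    quoted = {json.dumps(image_id): image_id for image_id in image_urls}
    pattern = "|".join(re.escape(q) for q in sorted(quoted, key=len, reverse=True))
    blocks: list[dict[str, Any]] = []
    # A capturing group makes re.split keep the matched placeholders in the result.
    for piece in re.split(f"({pattern})", json_text):
        if piece in quoted:
            url = image_urls[quoted[piece]]
            blocks.append({"type": "image_url", "image_url": {"url": url}})
        elif piece:
            blocks.append({"type": "text", "text": piece})
    return blocks


text = json.dumps({"question": "Same animal?", "images": ["[image 1]", "[image 2]"]})
blocks = interleave_images(
    text,
    {"[image 1]": "https://example.com/a.png", "[image 2]": "https://example.com/b.png"},
)
```

The trade-off is that a genuine string value equal to a placeholder ID would also be replaced, so the IDs need to be chosen so they cannot collide with real input text.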