guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

image raises subscript exception #865

Open robmck-ms opened 4 months ago

robmck-ms commented 4 months ago

The bug

Using the multi-modal code from the README results in TypeError: 'GoogleAIChatEngine' object is not subscriptable:

Traceback (most recent call last):
  File "/Users/robmck/git/me/ai-experiments/design-critique/guidance-image-test.py", line 19, in <module>
    lm += gen("answer")
  File "/Users/robmck/git/me/ai-experiments/guidance/guidance/models/_model.py", line 1159, in __add__
    out = lm._run_stateless(value)
  File "/Users/robmck/git/me/ai-experiments/guidance/guidance/models/_model.py", line 1364, in _run_stateless
    for chunk in gen_obj:
  File "/Users/robmck/git/me/ai-experiments/guidance/guidance/models/_model.py", line 760, in __call__
    logits = self.get_logits(token_ids, forced_bytes, current_temp)
  File "/Users/robmck/git/me/ai-experiments/guidance/guidance/models/_grammarless.py", line 338, in get_logits
    raise new_bytes
  File "/Users/robmck/git/me/ai-experiments/guidance/guidance/models/_grammarless.py", line 165, in _start_generator_stream
    for chunk in generator:
  File "/Users/robmck/git/me/ai-experiments/guidance/guidance/models/_googleai.py", line 211, in _start_generator
    mime_type="image/jpeg", data=self[raw_parts[i + 1]]
TypeError: 'GoogleAIChatEngine' object is not subscriptable

Looking through the code, image() saves the binary image to Model._variables. GoogleAIChatEngine seems to expect that data to be in self[image_id], but neither GoogleAIChatEngine nor any of its parent classes defines __getitem__(). Perhaps it was written in an earlier factoring of the code in which the engine and model objects were one and the same?
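The failure mode can be reproduced without guidance at all. The sketch below uses hypothetical stand-in classes (`Model`, `Engine`, `add_image`, and `start_generator` are simplified assumptions, not guidance's actual implementation): the model stores the image bytes under an id, but the engine tries to index itself with that id, and Python raises the same kind of `TypeError` because the engine class defines no `__getitem__`.

```python
class Engine:
    """Hypothetical stand-in for GoogleAIChatEngine: it defines no __getitem__."""

    def start_generator(self, image_id):
        # Mirrors the failing line `data=self[raw_parts[i + 1]]`
        return self[image_id]


class Model:
    """Hypothetical stand-in for guidance's Model: it owns the variable store."""

    def __init__(self, engine):
        self.engine = engine
        self._variables = {}

    def add_image(self, image_id, data):
        # The image bytes are stored on the model, not on the engine
        self._variables[image_id] = data


model = Model(Engine())
model.add_image("img0", b"\xff\xd8\xff")  # first bytes of a JPEG header
try:
    model.engine.start_generator("img0")
except TypeError as exc:
    print(exc)  # 'Engine' object is not subscriptable
```

The data the engine needs does exist; it simply lives on a different object than the one being indexed.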

To Reproduce

from guidance import (
    models,
    user,
    assistant,
    gen,
    image,
)
import os

google_key = os.environ.get("GEMINI_API_KEY")
gemini = models.GoogleAIChat(
    "gemini-pro-vision",
    api_key=google_key,
)
with user():
    lm = gemini + "What is this a picture of?" + image("chairs.jpg")

with assistant():
    lm += gen("answer")


Harsha-Nori commented 4 months ago

Yes, I think our image support is broken right now. You're exactly right that it was written when models and engines were identical :). @nking-1 is looking into how best to re-enable this and also bring image support to more models.

robmck-ms commented 4 months ago

Thanks!

In the meantime, I hacked it to work by plumbing Model._variables down to the _generator function used for images (and added OpenAI support via the same hack). It's not very elegant, but it unblocks me for now so I can play with images.
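A minimal sketch of the general idea behind that workaround, not the code in the linked branch: the model passes its variable store to the engine's generator, and the engine looks image ids up in that mapping instead of in `self`. The names `Engine`, `Model`, `start_generator`, and `run`, and the convention that the prompt parts alternate text and image ids, are illustrative assumptions.

```python
from typing import Iterator, List, Mapping, Tuple


class Engine:
    """Hypothetical engine that receives image data from the model."""

    def start_generator(
        self, parts: List[str], variables: Mapping[str, bytes]
    ) -> Iterator[Tuple[str, object]]:
        # Assumed layout: parts alternate [text, image_id, text, ...]
        for i, part in enumerate(parts):
            if i % 2 == 0:
                yield ("text", part)
            else:
                # Look the id up in the model's variables, not in self
                yield ("image/jpeg", variables[part])


class Model:
    """Hypothetical model that owns the variable store."""

    def __init__(self, engine: Engine):
        self.engine = engine
        self._variables = {}

    def run(self, parts: List[str]):
        # Plumb the model's variables down to the engine's generator
        return list(self.engine.start_generator(parts, self._variables))


model = Model(Engine())
model._variables["img0"] = b"\xff\xd8\xff"
print(model.run(["What is this a picture of?", "img0", ""]))
```

Passing the mapping explicitly keeps the engine stateless with respect to prompt variables, which fits the split between model and engine objects described earlier in the thread.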

Here it is if you're curious: https://github.com/robmck-ms/guidance/tree/hack_image_support

kklemon commented 3 months ago

Any update on this?