LLava always speaks of 2 images

Describe the issue

Issue: I have LLaVA running via Ollama and a Python script that sends screenshots to it. It is meant to give my blind mother a description of what is on screen. Whenever I run the script, the model talks about two images, both similar to the screenshot but with some discrepancies. Is this a hallucination?

Python code:

import base64
from io import BytesIO

import keyboard
import pyttsx3
import requests
from PIL import ImageGrab

def capture_screenshot():
    # Grab the current screen contents as a PIL image.
    return ImageGrab.grab()

def describe_image(image):
    # Encode the screenshot as a base64 JPEG string for the "images" field.
    buffer = BytesIO()
    # JPEG cannot store an alpha channel, so convert to RGB first.
    image.convert("RGB").save(buffer, format="JPEG")
    img_str = base64.b64encode(buffer.getvalue()).decode("utf-8")
    payload = {
        "model": "llava",
        "prompt": "This is a screenshot from a Windows PC. Your job is to describe the contents of the screenshot for the user. The description is for a visually impaired or blind person. The description should be like that of another person telling the blind person about what they see in front of them. You can ignore any Windows elements if they are not the main focus.",
        "images": [img_str],  # exactly one image is sent per request
        "stream": False
    }
    r = requests.post('http://localhost:11434/api/generate', json=payload)
    r.raise_for_status()  # fail loudly on HTTP errors
    return r.json()["response"]

def narrate_description(description):
    # Read the description aloud with the default text-to-speech voice.
    engine = pyttsx3.init()
    engine.say(description)
    engine.runAndWait()

def main():
    print("Press the 'home' key to capture a screenshot and get its description.")
    while True:
        # Block until the key is pressed instead of polling in a busy loop.
        keyboard.wait('home')
        print("Home key pressed, capturing screenshot...")
        screenshot = capture_screenshot()
        print("Screenshot captured, describing the image...")
        description = describe_image(screenshot)
        narrate_description(description)
        print("Press the 'home' key to capture another screenshot.")

if __name__ == "__main__":
    main()
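
One way to rule out the script as the source of the second image is to inspect the request body before it is sent. The sketch below is not part of the original script: `build_payload` and `count_valid_jpegs` are hypothetical helper names, and it assumes the same `model`, `prompt`, `images`, and `stream` fields used above. It needs only the standard library and makes no network call.

```python
import base64

JPEG_MAGIC = b"\xff\xd8"  # every JPEG file starts with these two bytes


def build_payload(jpeg_bytes, prompt, model="llava"):
    """Build a request body holding exactly one base64-encoded image.

    Hypothetical helper: mirrors the fields used in the script above.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(jpeg_bytes).decode("utf-8")],
        "stream": False,
    }


def count_valid_jpegs(payload):
    """Return how many entries in payload["images"] decode to JPEG data."""
    count = 0
    for encoded in payload.get("images", []):
        raw = base64.b64decode(encoded, validate=True)
        if raw.startswith(JPEG_MAGIC):
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder bytes stand in for a real screenshot buffer.
    fake_jpeg = JPEG_MAGIC + b"placeholder image data"
    payload = build_payload(fake_jpeg, "Describe this screenshot.")
    print("images in payload:", len(payload["images"]))
    print("valid JPEG entries:", count_valid_jpegs(payload))
```

If this check reports a single valid image for a real screenshot, the request itself carries only one picture, which would suggest the two-image description comes from the model side rather than from the script.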

haotian-liu / LLaVA

LLava always speaks of 2 images #1591