Add experimental support for vision input to ChatGPT

This update adds experimental support for vision input to ChatGPT. A new class, ChatGPTProcessorWithVisionBase, inherits from ChatGPTProcessor and handles image input. An example subclass, ChatGPTProcessorWithVisionScreenShot, shows how to supply a screenshot captured with pyautogui.

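import io
import pyautogui  # pip install pyautogui
# Assumed import path; adjust it if the base class lives in another module.
from aiavatar.processors.chatgpt import ChatGPTProcessorWithVisionBase

class ChatGPTProcessorWithVisionScreenShot(ChatGPTProcessorWithVisionBase):
    async def get_image(self) -> bytes:
        # Capture a region given as (left, top, width, height).
        image = pyautogui.screenshot(region=(0, 0, 1280, 720))
        # Encode the capture as PNG into an in-memory buffer.
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        # Also write a copy to disk so the image sent to ChatGPT can be inspected.
        image.save("image_to_chatgpt.png")
        return buffered.getvalue()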
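
The bytes returned by get_image presumably have to be encoded before they can be attached to a chat message. The following is a minimal sketch, assuming the image is sent as an image_url content part carrying a base64 data URL, as the OpenAI Chat Completions API accepts. The helper name build_image_part is hypothetical and is not part of aiavatarkit.

```python
import base64


def build_image_part(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an image_url content part (hypothetical helper)."""
    # Base64 output is plain ASCII, so it can be embedded in a data URL.
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    }
```

A part built this way could sit next to a text part in the content list of a user message.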

To use this feature, instantiate ChatGPTProcessorWithVisionScreenShot instead of ChatGPTProcessor and set it on the AIAvatar. To limit the performance cost, only the latest image is sent to ChatGPT. Function calling is used to decide whether an image needs to be retrieved, which adds roughly 0.5 to 1 second to the processing time.
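
The "latest image only" rule can be illustrated with a small sketch. It assumes the conversation history is a list of chat messages whose content is either a string or a list of typed parts. The function keep_latest_image is hypothetical and may differ from how this PR actually prunes images.

```python
from typing import Any


def keep_latest_image(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages in which only the newest image part remains."""
    result = []
    image_kept = False
    # Walk from newest to oldest so the first image encountered is the latest.
    for message in reversed(messages):
        content = message.get("content")
        if isinstance(content, list):
            new_parts = []
            for part in reversed(content):
                if part.get("type") == "image_url":
                    if image_kept:
                        continue  # drop every older image
                    image_kept = True
                new_parts.append(part)
            new_parts.reverse()  # restore the original part order
            # Build a new dict so the caller's message is not mutated.
            message = {**message, "content": new_parts}
        result.append(message)
    result.reverse()  # restore chronological order
    return result
```

Text parts and plain-string messages pass through unchanged, so the conversation stays intact while the payload carries at most one image.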

uezo / aiavatarkit

Add experimental support for vision input to ChatGPT #51