This update introduces experimental support for vision input to ChatGPT. A new class, ChatGPTProcessorWithVisionBase, has been added to handle image inputs, inheriting from ChatGPTProcessor. An example implementation, ChatGPTProcessorWithVisionScreenShot, demonstrates how to capture screenshots using pyautogui.
To utilize this new feature, you can instantiate ChatGPTProcessorWithVisionScreenShot instead of ChatGPTProcessor and set it in the AIAvatar. Only the latest image will be sent to ChatGPT to avoid performance issues. The system uses function calling to determine if image retrieval is necessary, which adds approximately 500 milliseconds to 1 second to the processing time.
This update introduces experimental support for vision input to ChatGPT. A new class, ChatGPTProcessorWithVisionBase, has been added to handle image inputs, inheriting from ChatGPTProcessor. An example implementation, ChatGPTProcessorWithVisionScreenShot, demonstrates how to capture screenshots using pyautogui.
To utilize this new feature, you can instantiate ChatGPTProcessorWithVisionScreenShot instead of ChatGPTProcessor and set it in the AIAvatar. Only the latest image will be sent to ChatGPT to avoid performance issues. The system uses function calling to determine if image retrieval is necessary, which adds approximately 500 milliseconds to 1 second to the processing time.