Open meganrogge opened 1 month ago
this is interesting - we'd effectively have to split the gif/video into frames and include each frame as an image, since OpenAI doesn't have pure video (mp4, mov) support and turns gifs into still images.
see https://cookbook.openai.com/examples/gpt_with_vision_for_video_understanding
I believe they mitigate the problem of sending a BUNCH of images by capping each query to 500 tokens. I think in our extension we can definitely play around with this and maybe set some hard caps.
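To make the "hard caps" idea concrete, here's a rough sketch of how we might limit the number of frames sent per request. Everything here is an assumption for illustration: the helper names (`sample_frame_indices`, `build_image_parts`), the default cap of 10 frames, and the assumption that frames arrive as JPEG bytes. The only thing taken from OpenAI's side is the `image_url` content-part shape with a base64 data URL. Actual frame extraction from the gif/video is left out.

```python
import base64


def sample_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Pick at most `max_frames` evenly spaced frame indices (hypothetical helper)."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]


def build_image_parts(frames: list[bytes], max_frames: int = 10) -> list[dict]:
    """Turn sampled JPEG frames into image content parts for a chat message.

    `max_frames` is an assumed hard cap, not a value from the cookbook.
    """
    parts = []
    for i in sample_frame_indices(len(frames), max_frames):
        encoded = base64.b64encode(frames[i]).decode("ascii")
        parts.append(
            {
                "type": "image_url",
                "image_url": {"url": "data:image/jpeg;base64," + encoded},
            }
        )
    return parts
```

The resulting list could be appended to the user message's content next to the text prompt, with the response length capped separately via `max_tokens` on the request. Sampling evenly keeps coverage of the whole clip instead of only the first few frames.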
It will be great when we have image support in core. It would be nice to also have gif/video support. For example, a dev could attach a gif to the release notes and generate alt text for it, or a user could attach a screen recording and ask for debugging help.