Open PaarthShah opened 1 year ago
As far as I can tell, we're limited by the library we use for API communication, which does not yet support vision. I'm very interested, though, and will check what we can do as soon as the library adds support.
I think we might consider dropping the third-party SDK and switching to the official Node package from OpenAI, which seems to support vision: https://github.com/openai/openai-node#readme
Going for the official Node library seems like the best option for long-term sustainability and rapid adoption of new features.
Yes, but we would then be responsible for handling context, which is fine if someone is willing to write the code.
https://platform.openai.com/docs/guides/vision
It seems like uploading base64-encoded images may be a viable general strategy for passing images through the API.
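To make the base64 idea concrete, here is a minimal sketch of the message shape described in the vision guide, where an image is embedded as a data URL inside an `image_url` content part. The helper name `buildBase64ImageMessage` and its parameters are hypothetical, and how the bot obtains the image bytes (downloading and, for encrypted rooms, decrypting the media) is assumed to happen elsewhere.

```typescript
import { Buffer } from "node:buffer";

// One user message with mixed text and image parts, following the
// content-part layout shown in the vision guide.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionMessage {
  role: "user";
  content: ContentPart[];
}

// Hypothetical helper (not part of any SDK): wraps raw image bytes in a
// base64 data URL and pairs them with a text prompt.
function buildBase64ImageMessage(
  prompt: string,
  imageBytes: Uint8Array,
  mimeType: string,
): VisionMessage {
  const base64 = Buffer.from(imageBytes).toString("base64");
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64}` } },
    ],
  };
}
```

The returned object would go into the `messages` array of a chat completion request. Since the bytes are sent inline, this approach should not depend on the image being reachable from outside the homeserver, at the cost of larger request payloads.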
Alternatively, for speed, and from unencrypted rooms, it may instead be possible or desirable to pass an image URL by transforming the image's `mxc` URL to an `https` URL via the `image_url` key.
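A rough sketch of that URL approach follows. It assumes the homeserver still serves media without authentication at the `/_matrix/media/v3/download/{serverName}/{mediaId}` path, which may not hold on servers that enforce authenticated media. The helper names `mxcToHttps` and `imageUrlPart` and the example homeserver URL are hypothetical.

```typescript
// Hypothetical helper: converts mxc://<server-name>/<media-id> into an https
// download URL on the given homeserver. Returns null for anything that is
// not a well-formed mxc URI.
function mxcToHttps(mxcUrl: string, homeserverBaseUrl: string): string | null {
  const match = /^mxc:\/\/([^/?#]+)\/([A-Za-z0-9_-]+)$/.exec(mxcUrl);
  if (match === null) {
    return null;
  }
  const serverName = match[1];
  const mediaId = match[2];
  // Drop trailing slashes so the joined path has exactly one separator.
  const base = homeserverBaseUrl.replace(/\/+$/, "");
  // Assumption: unauthenticated media download endpoint is available.
  return `${base}/_matrix/media/v3/download/${serverName}/${mediaId}`;
}

// Builds the image part that carries the URL under the image_url key.
function imageUrlPart(url: string): { type: "image_url"; image_url: { url: string } } {
  return { type: "image_url", image_url: { url } };
}

// Usage with placeholder values:
const exampleUrl = mxcToHttps("mxc://example.org/abc123", "https://matrix.example.org/");
const examplePart = exampleUrl === null ? null : imageUrlPart(exampleUrl);
```

This only works if the API's servers can fetch the resulting URL, so it would not apply to encrypted rooms, where the stored media is ciphertext.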