jackMort / ChatGPT.nvim

ChatGPT Neovim Plugin: Effortless Natural Language Generation with OpenAI's ChatGPT API
Apache License 2.0

FR: allow multimodal input / vision / images #429

Closed: thiswillbeyourgithub closed this issue 2 months ago

thiswillbeyourgithub commented 2 months ago

It would be simple to detect paths or URLs to images in the prompt text and replace them with image inputs in the API request.

I could then, for example, add a shortcut that saves an image from my clipboard to /tmp and automatically adds its path to the prompt.
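To illustrate the clipboard shortcut idea, here is a minimal Python sketch, not plugin code. It assumes the image bytes have already been read from the clipboard by some external tool (that part is platform specific and left out). The function name `append_image_path` and the `chatgpt-nvim-` file prefix are made up for the example.

```python
import tempfile
from typing import Tuple


def append_image_path(prompt: str, image_bytes: bytes, suffix: str = ".png") -> Tuple[str, str]:
    """Save image bytes to a temp file and append the file path to the prompt.

    Hypothetical helper: reading the clipboard itself is assumed to happen elsewhere.
    Returns the new prompt and the path of the saved file.
    """
    # delete=False keeps the file around so it can be read later when the request is built.
    with tempfile.NamedTemporaryFile(
        delete=False, suffix=suffix, prefix="chatgpt-nvim-", dir=tempfile.gettempdir()
    ) as handle:
        handle.write(image_bytes)
        path = handle.name
    return f"{prompt.rstrip()} {path}", path
```

The temp directory is taken from `tempfile.gettempdir()`, which is usually /tmp on Linux, so the prompt ends up in the same "question followed by path" shape as the ollama example.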

See the kind of workflow implemented in ollama:

What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
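A workflow like the ollama one above could be sketched roughly as follows. This is a Python illustration under assumptions, not the plugin's implementation: image tokens are recognised by file extension only, local files are inlined as base64 data URLs, and the output follows the text / `image_url` content-part layout used by OpenAI's chat completions API for vision input. The names `IMAGE_TOKEN`, `to_image_url`, and `build_content` are hypothetical.

```python
import base64
import mimetypes
import re
from pathlib import Path
from typing import Dict, List, Optional

# Assumption: an image reference is a whitespace-delimited token ending in a common image extension.
IMAGE_TOKEN = re.compile(r"(\S+\.(?:png|jpe?g|gif|webp))(?=\s|$)", re.IGNORECASE)


def to_image_url(token: str) -> Optional[str]:
    """Return a URL for an image_url part, or None if the token cannot be used."""
    if token.startswith(("http://", "https://")):
        return token
    path = Path(token).expanduser()
    if not path.is_file():
        return None  # unknown paths stay in the text untouched
    mime = mimetypes.guess_type(path.name)[0] or "image/png"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_content(prompt: str) -> List[Dict]:
    """Split a prompt into text parts and image parts, in their original order."""
    parts: List[Dict] = []
    text_start = 0
    for match in IMAGE_TOKEN.finditer(prompt):
        url = to_image_url(match.group(1))
        if url is None:
            continue
        text = prompt[text_start:match.start()].strip()
        if text:
            parts.append({"type": "text", "text": text})
        parts.append({"type": "image_url", "image_url": {"url": url}})
        text_start = match.end()
    tail = prompt[text_start:].strip()
    if tail:
        parts.append({"type": "text", "text": tail})
    return parts
```

The returned list would go in the `content` field of a user message. Tokens that look like image paths but do not exist on disk are left in the text, so a typo in a path does not silently drop part of the prompt.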

Somewhat related to:

Edit: Oh I see that there's already partial support there: https://github.com/jackMort/ChatGPT.nvim/pull/332

It should be :

thiswillbeyourgithub commented 2 months ago

For anyone interested, I added a patch file and a demo showcasing the vision feature in this PR.