Add GPT-Vision and image support to the ai sdk

vercel / ai

Build AI-powered applications with React, Svelte, Vue, and Solid

https://sdk.vercel.ai/docs


Add GPT-Vision and image support to the ai sdk #712

Closed: nicolaerusan closed this issue 10 months ago

nicolaerusan commented 10 months ago

Feature Description

GPT vision just came out. We're building a chat experience that should take advantage of these features, and ideally we could use this SDK for it.

https://platform.openai.com/docs/guides/vision

Related to this, would be great to get assistants in the SDK too, but that could be a separate issue :) https://platform.openai.com/docs/assistants/overview

Use Case

Be able to select 3-4 images and send a message including those images

Additional context

No response

Lurrobert commented 10 months ago

Is anyone on it? It shouldn't be that hard, just a modification of the types. The output stream is the same.

Edit: for those searching for a solution, try `// @ts-ignore`.

dosstx commented 10 months ago

I'm definitely interested in this, as well. I do use a combination of the AI SDK and the OpenAI SDks and it works well, but things are changing fast and I think it's better to have a community work on a single tool (AI SDK?) to make everything easier to work with. Hope Vercel can allocate more resources to this as this is definitely the future of app development.

MaxLeiter commented 10 months ago

We’re actively investigating supporting vision and the other new features.

> it shouldn't be that hard, just a modification of the types

I’d rather introduce a breaking change when we have more to offer than just a type change. There are likely more involved changes we can make to improve the developer experience.

MaxLeiter commented 10 months ago

Thanks to #725 by @lgrammel, you can now send images in the data field in react/use-chat in order to interact with the vision API. See this example: https://github.com/vercel/ai/blob/main/examples/next-openai/app/api/chat-with-vision/route.ts. Docs will be coming soon.
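For readers who want a feel for the approach before opening the linked example, here is a minimal sketch of what the server side might do with image URLs received in the data field: fold them into the last user message using the OpenAI vision content-part shape. It is not the code from the linked route. The helper name `withImages`, the `imageUrls` parameter, and the local types are hypothetical and exist only for illustration.

```typescript
// Content-part shapes used by the OpenAI vision chat format.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };

// Plain chat message as typically sent by the client.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Message whose content may be plain text or a list of parts.
type VisionMessage = {
  role: ChatMessage["role"];
  content: string | Array<TextPart | ImagePart>;
};

// Hypothetical helper: attach image URLs to the last message only,
// leaving all earlier messages as plain text.
function withImages(messages: ChatMessage[], imageUrls: string[]): VisionMessage[] {
  if (messages.length === 0 || imageUrls.length === 0) return messages;

  const earlier = messages.slice(0, -1);
  const last = messages[messages.length - 1];
  const imageParts: ImagePart[] = imageUrls.map((url) => ({
    type: "image_url",
    image_url: { url },
  }));

  return [
    ...earlier,
    {
      role: last.role,
      content: [{ type: "text", text: last.content }, ...imageParts],
    },
  ];
}
```

The returned array is shaped so that it could be passed as the `messages` field of a vision-capable chat completion request. Selecting 3-4 images, as in the original use case, just means passing 3-4 URLs.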

marcusschiesser commented 9 months ago

@nicolaerusan, great job; here's an example for this feature: https://github.com/marcusschiesser/ai-chatbot - would be great if you could also add images to the messages; see https://github.com/vercel/ai/issues/768

marclelamy commented 7 months ago

Hey @MaxLeiter, do you know if keeping the image as part of the returned message is being worked on, as @marcusschiesser asked above? It would be super helpful to keep the image in the message array, especially since more and more multimodal models are going to come out in the coming months. @lgrammel

marclelamy commented 7 months ago

Technically once the completion of the new message is done, we could do something like this inside the onFinish callback. Only issue is the wrong typing...


```ts
setMessages([
    ...messages,
    {
        ...messages[messages.length - 1],
        content: [
            { "type": "text", "text": "What’s in this image?" },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    },
])
```
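One way around the typing problem mentioned above, sketched under assumptions: keep a locally widened content type and normalize it before rendering, so UI code can handle both plain-string messages and messages that carry image parts. The names `RichContent` and `splitContent` are hypothetical and are not part of the SDK.

```typescript
// Content-part shapes matching the snippet above.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };

// Hypothetical widened content type: plain text or a list of parts.
type RichContent = string | Array<TextPart | ImagePart>;

// Hypothetical helper: split either content shape into the text to display
// and the image URLs to render alongside it.
function splitContent(content: RichContent): { text: string; imageUrls: string[] } {
  if (typeof content === "string") {
    return { text: content, imageUrls: [] };
  }

  const text = content
    .filter((part): part is TextPart => part.type === "text")
    .map((part) => part.text)
    .join("\n");

  const imageUrls = content
    .filter((part): part is ImagePart => part.type === "image_url")
    .map((part) => part.image_url.url);

  return { text, imageUrls };
}
```

A message component could call `splitContent(message.content)` and render the text followed by one image element per URL, without resorting to `// @ts-ignore` at each use site.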

marcusschiesser commented 7 months ago

@marclelamy the code generated by create-llama now supports displaying sent images in the messages

taylor-lindores-reeves commented 2 months ago

Does Vercel offer OpenAI vision integration?

lgrammel commented 2 months ago

@taylor-lindores-reeves check out multimodal prompts: https://sdk.vercel.ai/docs/foundations/prompts#multi-modal-messages
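As a rough sketch of what a multi-modal message looks like, assuming the part shapes `{ type: 'text', text }` and `{ type: 'image', image }` described in the linked docs: the local types and the helper `userMessageWithImages` below are hypothetical stand-ins for illustration, not SDK exports, and the linked page remains the authoritative reference for the exact accepted image types.

```typescript
// Local stand-ins for the message part shapes (assumed, see the linked docs).
type SdkTextPart = { type: "text"; text: string };
type SdkImagePart = { type: "image"; image: URL | Uint8Array };
type UserMessage = {
  role: "user";
  content: Array<SdkTextPart | SdkImagePart>;
};

// Hypothetical helper: build one user message that carries a text prompt
// followed by any number of images (for example, the 3-4 from the use case).
function userMessageWithImages(text: string, images: Array<URL | Uint8Array>): UserMessage {
  const imageParts = images.map(
    (image): SdkImagePart => ({ type: "image", image })
  );
  return {
    role: "user",
    content: [{ type: "text", text }, ...imageParts],
  };
}

const message = userMessageWithImages("What is in these images?", [
  new URL("https://example.com/a.jpg"),
  new URL("https://example.com/b.jpg"),
]);
```

A message built this way would go into the `messages` array of a model call, alongside any earlier plain-text messages.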