microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: Add support for base64 images for GPT-4-Vision when available in Azure SDK #4272

Closed: deepinderdeol closed this issue 1 month ago

deepinderdeol commented 9 months ago

This GPT-4 Vision sample works with the sample image provided in the sample code: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/KernelSyntaxExamples/Example68_GPTVision.cs

However, using a local image file as ImageContent results in an exception.

I tried following the instructions on the OpenAI site for using the base64-encoded format, but haven't been successful: https://platform.openai.com/docs/guides/vision

dmytrostruk commented 9 months ago

@deepinderdeol Thank you for reporting this issue. At the moment, the Azure .NET SDK for OpenAI only allows an image to be passed as a Uri object that links to a remotely hosted image. As soon as the base64-encoded format is supported, we will update the Semantic Kernel SDK as well. Thanks again!

Alerinos commented 9 months ago

@dmytrostruk Can't we use the OpenAI API, which already has this implemented? The longer I use SK, the more I get the impression that most of the features don't work or are not yet implemented.

dmytrostruk commented 9 months ago

@Alerinos There are a couple of ways to use OpenAI functionality: use existing SDKs, or implement our own logic to perform requests. Each approach has its advantages and disadvantages. The main advantage of using the Azure .NET SDK is that we can reuse a lot of functionality that is already implemented, tested and available, rather than implementing our own from scratch. This allows us to focus on core Semantic Kernel functionality.

But if you want to use SK with OpenAI features that are not available yet, it's still possible to implement a custom connector, add all the logic needed to call the OpenAI API, and inject it into the Kernel instance. Here is an example: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/KernelSyntaxExamples/Example16_CustomLLM.cs

jorisdg commented 8 months ago

+1 on this. It looks like this is part of the Azure SDK's 1.0 release: https://github.com/Azure/azure-sdk-for-net/blob/Azure.AI.OpenAI_1.0.0-beta.12/sdk/openai/Azure.AI.OpenAI/README.md#chat-with-images-using-gpt-4-vision-preview

The raw image is passed as an image URL, but that URL is a data URL such as data:image/png;base64,...
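Here is a minimal sketch of building such a data URL from raw image bytes using only the .NET base class library. The helper name ToDataUrl is hypothetical. Passing the result to ChatMessageImageContentItem through new Uri(...) is an assumption based on the comment above, not something verified here.

```csharp
using System;

// Hypothetical helper: encodes raw image bytes as a data URL of the form
// "data:<mime type>;base64,<payload>".
static string ToDataUrl(byte[] imageBytes, string mimeType) =>
    $"data:{mimeType};base64,{Convert.ToBase64String(imageBytes)}";

// Assumed usage with a local file (placeholder path, as in the SDK README):
// string dataUrl = ToDataUrl(System.IO.File.ReadAllBytes("<path to a local image file>"), "image/png");
// var imageItem = new ChatMessageImageContentItem(new Uri(dataUrl));
```

Keep payload size in mind. Base64 inflates the data by roughly a third, and some .NET versions reject System.Uri strings longer than about 64K characters. That limit may be related to the size cap reported in the next comment.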

arafattehsin commented 8 months ago

The capability is there, but it only accepts images of up to 64 KB, nothing larger.

artemkoloskov commented 6 months ago

Can we get an update on this? It is important functionality, specifically for security reasons. The only workaround seems to be hosting the images somewhere the model can reach, which should be avoidable when the data is sensitive. Azure-hosted models are ideal for use cases where data should stay behind firewalls as much as possible, so sending the image to the model directly as data is a necessity.

It seems that Semantic Kernel is ready to support this, but the Azure AI SDK is not.

iyhammad commented 6 months ago

Supporting base64 images is also very important for test/development scenarios.

dersia commented 6 months ago

I have created a new PR in the Azure SDK for .NET repo that will allow us to finally close this issue: https://github.com/Azure/azure-sdk-for-net/pull/43093

jessejiang0214 commented 5 months ago

Any update on this? I'm wondering whether it's possible to insert an image into the prompt, since the image has to be at that position.

arafattehsin commented 5 months ago

It has become one of the long-pending, low-hanging fruits.

SimonLuckenuik commented 4 months ago

Binary data plus a MIME type seems to be supported now: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/README.md#chat-with-images-using-gpt-4-turbo

I assume that adding a condition on whether a Uri is present, and using it to pick the proper constructor overload, would do the trick? (reference)

```csharp
using System;
using System.IO;
using Azure.AI.OpenAI;

const string rawImageUri = "<URI to your image>";
// Local file whose bytes are sent directly instead of through a hosted URL.
using Stream jpegImageStream = File.OpenRead("<path to a local image file>");

ChatCompletionsOptions chatCompletionsOptions = new()
{
    DeploymentName = "gpt-4-turbo",
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant that describes images."),
        new ChatRequestUserMessage(
            new ChatMessageTextContentItem("Hi! Please describe these images"),
            // Remotely hosted image: Uri overload.
            new ChatMessageImageContentItem(new Uri(rawImageUri)),
            // Local image: Stream + MIME type overload ("image/jpeg" is the registered JPEG type).
            new ChatMessageImageContentItem(jpegImageStream, "image/jpeg", ChatMessageImageDetailLevel.Low)),
    },
};
```

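The Stream overload of ChatMessageImageContentItem used above needs a MIME type string alongside the image bytes. Below is a minimal sketch of a hypothetical helper, GetImageMimeType (not part of Semantic Kernel or Azure.AI.OpenAI), that derives this string from a local file's extension. It assumes only the formats listed in OpenAI's vision guide (PNG, JPEG, WEBP and GIF) need to be handled.

```csharp
using System;
using System.IO;

// Hypothetical helper: maps a local image file's extension to the MIME type
// string passed next to the image stream. Matching is case-insensitive.
static string GetImageMimeType(string path) =>
    Path.GetExtension(path).ToLowerInvariant() switch
    {
        ".png" => "image/png",
        ".jpg" or ".jpeg" => "image/jpeg",
        ".webp" => "image/webp",
        ".gif" => "image/gif",
        // Anything else is rejected up front instead of guessed.
        var ext => throw new NotSupportedException($"Unsupported image extension '{ext}'."),
    };
```

The helper throws on unknown extensions rather than guessing, so unsupported formats surface before any request is sent. It returns "image/jpeg", the registered MIME type for JPEG, for both .jpg and .jpeg files.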
matthewbolanos commented 3 months ago

@RogerBarreto, is this something you're tracking as part of the graduation of the content types (in particular ImageContent)?

artemkoloskov commented 3 months ago

It works with v1.14.1. Thanks a lot to everyone involved!

SimonLuckenuik commented 3 months ago

Working in v1.14.1 for me as well!