microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: Consider ImageContent and AudioContent inherit from BinaryContent #5262

Open SergeyMenshykh opened 7 months ago

SergeyMenshykh commented 7 months ago

@matthewbolanos please provide details of the issue that would be fixed by the change.

Krzysztof318 commented 7 months ago

@SergeyMenshykh I don't think this is a good idea. BinaryContent.Data is required, but ImageContent.Data and AudioContent.Data are optional, so casting either of them to BinaryContent would give inconsistent behavior. If we go this way, we should instead create a DataKernelContent class with Uri and BinaryData parameters and inherit from it.
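
To make the concern concrete, here is a minimal sketch of the proposed shape, written in Python for brevity rather than C#. Only the class names come from the comment above; the constructor signatures, the `has_data` helper, and the example URL are assumptions for illustration, not the Semantic Kernel API.

```python
from typing import Optional


class DataKernelContent:
    """Hypothetical shared base: a URI, raw bytes, or both may be set."""

    def __init__(self, uri: Optional[str] = None, data: Optional[bytes] = None):
        self.uri = uri
        self.data = data

    def has_data(self) -> bool:
        # True only when raw bytes were supplied.
        return self.data is not None


class BinaryContent(DataKernelContent):
    """Keeps the stricter contract described above: data is required."""

    def __init__(self, data: bytes, uri: Optional[str] = None):
        if data is None:
            raise ValueError("BinaryContent requires data")
        super().__init__(uri=uri, data=data)


class ImageContent(DataKernelContent):
    """Data stays optional: an image may be referenced by URI only."""


class AudioContent(DataKernelContent):
    """Data stays optional: audio may be referenced by URI only."""


# A URI-only image is valid here, but it is not a BinaryContent,
# so nothing can assume its data is populated.
image = ImageContent(uri="https://example.com/cat.png")
blob = BinaryContent(data=b"\x00\x01")
```

With this layout the three types are siblings, so a URI-only ImageContent never has to satisfy BinaryContent's required-data contract.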

matthewbolanos commented 7 months ago

This feature should be worked on in conjunction with https://github.com/microsoft/semantic-kernel/issues/5263 (which should address @Krzysztof318's concerns around BinaryContent.Data being required). Because BinaryContent could/should be hydrated with either a URL or a byte array, it should likely be renamed to FileContent.

The main scenario that we'd like to support is the following...

Context

Scenario

matthewbolanos commented 5 months ago

As a test of the new hierarchy, the following POC should be possible:

  1. Ask an agent built with the Assistant API with Code Interpreter to create 1) an image, 2) an audio file, and 3) a Word file
  2. Get the collection of files from the ChatMessageContent (today they come in via the AnnotationContent, so there needs to be a way to easily turn this into BinaryContent/FileContent; perhaps AnnotationContent is merged with BinaryContent/FileContent and the data like Quote can be stored in a metadata field)
  3. Save all the files to disk using the parent class (e.g., BinaryContent or FileContent)
  4. Type cast the image to ImageContent to perform a text-to-image task
  5. Type cast the audio to AudioContent to perform an audio-to-text task
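
The steps above can be sketched roughly as follows, in Python for brevity rather than C#. Everything here is an assumption for illustration: `FileContent` stands in for whichever parent name is chosen, `make_content` is a hypothetical stand-in for converting an annotation into content (with the quote kept in a metadata field), and the byte payloads are fake. The agent call in step 1 and the model calls in steps 4 and 5 are omitted.

```python
import tempfile
from pathlib import Path
from typing import Optional


class FileContent:
    """Hypothetical parent class (BinaryContent or FileContent in the thread)."""

    def __init__(self, data: bytes, mime_type: str, file_name: str,
                 metadata: Optional[dict] = None):
        self.data = data
        self.mime_type = mime_type
        self.file_name = file_name
        self.metadata = metadata or {}

    def save(self, directory: Path) -> Path:
        # Step 3: saving needs only the parent class.
        path = directory / self.file_name
        path.write_bytes(self.data)
        return path


class ImageContent(FileContent):
    pass


class AudioContent(FileContent):
    pass


def make_content(data: bytes, mime_type: str, file_name: str,
                 quote: Optional[str] = None) -> FileContent:
    """Step 2 (hypothetical): pick the most specific type from the MIME type."""
    if mime_type.startswith("image/"):
        cls = ImageContent
    elif mime_type.startswith("audio/"):
        cls = AudioContent
    else:
        cls = FileContent
    metadata = {"quote": quote} if quote is not None else {}
    return cls(data, mime_type, file_name, metadata)


DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
files = [
    make_content(b"fake-png", "image/png", "chart.png"),
    make_content(b"fake-wav", "audio/wav", "speech.wav"),
    make_content(b"fake-docx", DOCX, "report.docx", quote="report.docx"),
]

with tempfile.TemporaryDirectory() as tmp:
    saved = [f.save(Path(tmp)) for f in files]
    saved_names = sorted(p.name for p in saved)
    all_exist = all(p.exists() for p in saved)

# Steps 4 and 5: narrow to the specific types before any modality-specific call.
images = [f for f in files if isinstance(f, ImageContent)]
audio = [f for f in files if isinstance(f, AudioContent)]
```

The `isinstance` checks play the role of the type casts in steps 4 and 5; in C# the equivalent would be a pattern match or cast on the parent type, which only succeeds if the object was created as the specific subtype in step 2.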