microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

Java: Investigate File/Image/Audio content #5733

Closed: johnoliver closed this issue 6 months ago

johnoliver commented 6 months ago

Determine what it means to support file, image, and audio content within Java, and how to integrate them into the existing code.

dsgrieve commented 6 months ago

File content is used by the Assistants API and the Fine-Tuning API.

dsgrieve commented 6 months ago

Audio content goes both ways: speech generation (text to audio) and transcription (audio to text). .NET has an ITextToAudioService API, and the OpenAIAsyncClient has generateSpeechFromText methods, so a text-to-audio service would be a pretty straightforward implementation.
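To make the shape of such a service concrete, here is a minimal sketch of what a Java text-to-audio service could look like. Everything in it is an assumption for illustration: `TextToAudioService`, `AudioContent`, `SpeechBackend`, and `BackendTextToAudioService` are hypothetical names, not existing Semantic Kernel or Azure SDK types, and `"tts-1"` and the `audio/mpeg` MIME type are placeholders. The `SpeechBackend` lambda stands in for the real client call (for example a wrapper around generateSpeechFromText), so the sketch runs without any Azure dependency.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;

public class TextToAudioSketch {

    /** Hypothetical value type: generated audio bytes plus their MIME type. */
    public static final class AudioContent {
        private final byte[] data;
        private final String mimeType;

        public AudioContent(byte[] data, String mimeType) {
            this.data = Objects.requireNonNull(data).clone();
            this.mimeType = Objects.requireNonNull(mimeType);
        }

        public byte[] getData() { return data.clone(); }
        public String getMimeType() { return mimeType; }
    }

    /** Hypothetical Java counterpart of the .NET text-to-audio service. */
    public interface TextToAudioService {
        CompletableFuture<AudioContent> getAudioContentAsync(String text);
    }

    /** Hypothetical stand-in for the client call that turns text into raw audio bytes. */
    @FunctionalInterface
    public interface SpeechBackend {
        CompletableFuture<byte[]> generateSpeech(String modelId, String text);
    }

    /** Adapts a SpeechBackend to the service interface. */
    public static final class BackendTextToAudioService implements TextToAudioService {
        private final SpeechBackend backend;
        private final String modelId;

        public BackendTextToAudioService(SpeechBackend backend, String modelId) {
            this.backend = Objects.requireNonNull(backend);
            this.modelId = Objects.requireNonNull(modelId);
        }

        @Override
        public CompletableFuture<AudioContent> getAudioContentAsync(String text) {
            if (text == null || text.isEmpty()) {
                // Report invalid input through the future instead of throwing synchronously.
                CompletableFuture<AudioContent> failed = new CompletableFuture<>();
                failed.completeExceptionally(new IllegalArgumentException("text must not be empty"));
                return failed;
            }
            // The MIME type is an assumption; a real service would derive it from the requested format.
            return backend.generateSpeech(modelId, text)
                    .thenApply(bytes -> new AudioContent(bytes, "audio/mpeg"));
        }
    }

    public static void main(String[] args) {
        // Fake backend: echoes the text as bytes so the example runs offline.
        SpeechBackend fake = (model, text) ->
                CompletableFuture.completedFuture(text.getBytes(StandardCharsets.UTF_8));
        TextToAudioService service = new BackendTextToAudioService(fake, "tts-1");
        AudioContent audio = service.getAudioContentAsync("hello").join();
        System.out.println(audio.getMimeType() + ", " + audio.getData().length + " bytes");
        // prints "audio/mpeg, 5 bytes"
    }
}
```

Keeping the client behind a small functional interface is only one possible design; it makes the service easy to test with a fake, and the real implementation would presumably swap the lambda for a call into OpenAIAsyncClient.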

Likewise for image generation: OpenAIAsyncClient has a getImageGenerations API to generate images from a prompt, and .NET has an ITextToImageService API. An image generation service would also be a pretty straightforward implementation.
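A similar sketch for image generation, again under stated assumptions: `TextToImageService`, `ImageBackend`, and `BackendTextToImageService` are hypothetical names invented for this example, the `"WIDTHxHEIGHT"` size string is an assumed convention, and the fake backend returns a made-up URL. The `ImageBackend` lambda stands in for a wrapper around the client's getImageGenerations call, so no Azure dependency is needed to run it.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;

public class TextToImageSketch {

    /** Hypothetical Java counterpart of the .NET text-to-image service. */
    public interface TextToImageService {
        /** Completes with a reference (for example a URL) to the generated image. */
        CompletableFuture<String> generateImageAsync(String description, int width, int height);
    }

    /** Hypothetical stand-in for the client call that generates images from a prompt. */
    @FunctionalInterface
    public interface ImageBackend {
        CompletableFuture<List<String>> generateImages(String prompt, String size);
    }

    /** Adapts an ImageBackend to the service interface. */
    public static final class BackendTextToImageService implements TextToImageService {
        private final ImageBackend backend;

        public BackendTextToImageService(ImageBackend backend) {
            this.backend = Objects.requireNonNull(backend);
        }

        @Override
        public CompletableFuture<String> generateImageAsync(String description, int width, int height) {
            if (description == null || description.isEmpty() || width <= 0 || height <= 0) {
                // Report invalid input through the future instead of throwing synchronously.
                CompletableFuture<String> failed = new CompletableFuture<>();
                failed.completeExceptionally(
                        new IllegalArgumentException("description and dimensions must be valid"));
                return failed;
            }
            // Assumed size convention: "WIDTHxHEIGHT", e.g. "512x512".
            String size = width + "x" + height;
            return backend.generateImages(description, size).thenApply(urls -> {
                if (urls.isEmpty()) {
                    throw new IllegalStateException("backend returned no images");
                }
                return urls.get(0); // only the first image is surfaced in this sketch
            });
        }
    }

    public static void main(String[] args) {
        // Fake backend: returns a made-up URL so the example runs offline.
        ImageBackend fake = (prompt, size) -> CompletableFuture.completedFuture(
                Collections.singletonList("https://example.invalid/" + size + ".png"));
        TextToImageService service = new BackendTextToImageService(fake);
        System.out.println(service.generateImageAsync("a red bicycle", 512, 512).join());
        // prints "https://example.invalid/512x512.png"
    }
}
```

Whether the service should return a URL string, raw bytes, or a richer content type is exactly the kind of question this issue is meant to settle; the string return type here is just the simplest option to illustrate the flow.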

These APIs are in com.azure:azure-ai-openai:1.0.0-beta.7