Java: Investigate File/Image/Audio content

johnoliver commented 6 months ago

Determine what it means to support these within Java and how to integrate them into the existing code.

dsgrieve commented 6 months ago

File is used by the Assistants API and the Fine-Tuning API.

An assistant can use tools - code-interpreter, retrieval, function calling. The code-interpreter and retrieval tools can make use of file content.
There is a com.azure.ai.openai.assistants package in the Java SDK for Azure AI OpenAI. The Assistants[Async]Client has the API for the File operations.
Fine-tuning is done by uploading a training file with the purpose 'fine-tune', so there isn't a separate API for fine-tuning. The Assistants API (linked above) has the API for file upload. com.azure.ai.openai.assistants.models.FileDetail is used to tell the model what the file is for.
We could provide a File content API as .NET does, but we would need to either use com.azure.ai.openai.assistants or implement against the REST API. There is a lot to the Assistants API (Threads, Messages, Runs), so leveraging com.azure.ai.openai.assistants seems the most sensible course of action.
We will need to provide implementations for the code-interpreter and retrieval 'tools'.
Basically, there isn't much point in handling file content unless we first implement Assistants. Currently, .NET has an experimental Assistants model. I imagine that we would have an Assistants AI service.

dsgrieve commented 6 months ago

Audio content goes both ways: speech generation (text to audio) and transcription (audio to text). .NET has an ITextToAudioService API. The OpenAIAsyncClient has generateSpeechFromText methods. A text-to-audio service built on them would be a pretty straightforward implementation.
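
As a rough illustration of the shape such a service could take, here is a minimal sketch. Everything in it is an assumption made for illustration: TextToAudioService, AudioContent and DelegatingTextToAudioService are hypothetical names, not existing Semantic Kernel or Azure SDK types, and CompletableFuture stands in for whatever async type the Java kernel would actually use. The call into OpenAIAsyncClient.generateSpeechFromText is deliberately left behind a plain function so the sketch does not guess at SDK signatures.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical Java counterpart to .NET's ITextToAudioService; not an existing API.
interface TextToAudioService {
    CompletableFuture<AudioContent> getAudioContentAsync(String text);
}

// Hypothetical holder for generated audio bytes and their MIME type.
final class AudioContent {
    private final byte[] data;
    private final String mimeType;

    AudioContent(byte[] data, String mimeType) {
        // Defensive copy so later changes to the caller's array do not leak in.
        this.data = Objects.requireNonNull(data, "data").clone();
        this.mimeType = Objects.requireNonNull(mimeType, "mimeType");
    }

    byte[] getData() {
        return data.clone();
    }

    String getMimeType() {
        return mimeType;
    }
}

// Wraps any "text in, audio bytes out" function. In a real connector that function
// would presumably delegate to the OpenAI client; that wiring is an assumption
// and is intentionally not shown here.
final class DelegatingTextToAudioService implements TextToAudioService {
    private final Function<String, CompletableFuture<byte[]>> backend;
    private final String mimeType;

    DelegatingTextToAudioService(Function<String, CompletableFuture<byte[]>> backend,
                                 String mimeType) {
        this.backend = Objects.requireNonNull(backend, "backend");
        this.mimeType = Objects.requireNonNull(mimeType, "mimeType");
    }

    @Override
    public CompletableFuture<AudioContent> getAudioContentAsync(String text) {
        // Rejects empty input synchronously, before the backend is called.
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("text must not be null or empty");
        }
        return backend.apply(text).thenApply(bytes -> new AudioContent(bytes, mimeType));
    }
}
```

Keeping the backend as a function means the service can be exercised with a fake in tests, and the SDK dependency stays in one place.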

Likewise for image generation, OpenAIAsyncClient has a getImageGenerations API to generate images from a prompt. .NET has an ITextToImageService API. This would also be a pretty straightforward implementation of an image generation service.
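
A comparable sketch for image generation might look like the following. Again, the names are assumptions for illustration only: TextToImageService and DelegatingTextToImageService are hypothetical, the method shape is only loosely modeled on the .NET interface, and CompletableFuture is a stand-in for the kernel's real async type. The backend function represents the call that would go through getImageGenerations; its exact options are not reproduced here.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

// Hypothetical Java counterpart to .NET's ITextToImageService; not an existing API.
interface TextToImageService {
    // Completes with a reference to the generated image (for example a URL).
    CompletableFuture<String> generateImageAsync(String description, int width, int height);
}

// Wraps a "(prompt, size) -> image reference" function. A real connector would
// presumably build its request for the OpenAI client inside that function;
// that part is an assumption and is not shown.
final class DelegatingTextToImageService implements TextToImageService {
    private final BiFunction<String, String, CompletableFuture<String>> backend;

    DelegatingTextToImageService(BiFunction<String, String, CompletableFuture<String>> backend) {
        this.backend = Objects.requireNonNull(backend, "backend");
    }

    // Formats dimensions as "WIDTHxHEIGHT" and rejects non-positive values.
    static String sizeLabel(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive");
        }
        return width + "x" + height;
    }

    @Override
    public CompletableFuture<String> generateImageAsync(String description, int width, int height) {
        Objects.requireNonNull(description, "description");
        return backend.apply(description, sizeLabel(width, height));
    }
}
```

Which sizes a given model accepts is left to the backend; the sketch only checks that the dimensions are positive.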

These APIs are in com.azure:azure-ai-openai:1.0.0-beta.7

microsoft / semantic-kernel

Java: Investigate File/Image/Audio content #5733