langchain4j / langchain4j

Java version of LangChain
https://docs.langchain4j.dev
Apache License 2.0

[FEATURE] Support audio, video, rich format support for multimodal models #1463

Open glaforge opened 4 months ago

glaforge commented 4 months ago

Models like Gemini accept text and images as input, but also other formats such as audio, video, and PDF files. The goal of this ticket is to add support for audio, video, and rich-format files, starting with Gemini for experimentation.

SandraAhlgrimm commented 4 months ago

Hi Guillaume, that would be great. I am working on audio support for Azure OpenAI. It would be great if we could share the same audio model classes so users can switch between providers easily. I'll add you as a reviewer as soon as I have something to review (hopefully in a few hours). If you're faster, I am happy to use and enhance your implementation.

glaforge commented 4 months ago

Oh, I just saw your comment, @SandraAhlgrimm. I added Audio/AudioContent and Video/VideoContent classes based on the same structure as Image and ImageContent. They are in #1464.
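For readers skimming the thread, here is a rough sketch of what "the same structure as Image and ImageContent" could look like for audio. It is an illustration only, not the code from #1464: the class shapes, the `fromUrl`/`fromBytes` factories, the `Content.type()` method, and the `describe` helper are all assumptions made for this example. The idea is a data holder that carries either a URL or base64 data plus a MIME type, wrapped by a content class that can sit next to text in a message. A Video/VideoContent pair would follow the same pattern.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Objects;

// Hypothetical data holder, NOT the actual langchain4j class:
// exactly one of url / base64Data is set, plus a MIME type.
final class Audio {
    private final URI url;
    private final String base64Data;
    private final String mimeType;

    private Audio(URI url, String base64Data, String mimeType) {
        if ((url == null) == (base64Data == null)) {
            throw new IllegalArgumentException("Provide either a URL or base64 data");
        }
        this.url = url;
        this.base64Data = base64Data;
        this.mimeType = Objects.requireNonNull(mimeType, "mimeType");
    }

    // Assumed factory: reference audio hosted elsewhere.
    static Audio fromUrl(String url, String mimeType) {
        return new Audio(URI.create(url), null, mimeType);
    }

    // Assumed factory: inline raw bytes as base64.
    static Audio fromBytes(byte[] bytes, String mimeType) {
        return new Audio(null, Base64.getEncoder().encodeToString(bytes), mimeType);
    }

    URI url() { return url; }
    String base64Data() { return base64Data; }
    String mimeType() { return mimeType; }
}

// Hypothetical common interface for the parts of a user message.
interface Content {
    String type();
}

final class TextContent implements Content {
    private final String text;
    TextContent(String text) { this.text = Objects.requireNonNull(text, "text"); }
    String text() { return text; }
    @Override public String type() { return "TEXT"; }
}

// Wraps Audio the way an image content class would wrap an image.
final class AudioContent implements Content {
    private final Audio audio;
    AudioContent(Audio audio) { this.audio = Objects.requireNonNull(audio, "audio"); }
    Audio audio() { return audio; }
    @Override public String type() { return "AUDIO"; }
}

final class MultimodalSketch {
    // Lists the content types of a message in order, comma-separated.
    static String describe(List<Content> contents) {
        StringBuilder sb = new StringBuilder();
        for (Content c : contents) {
            if (sb.length() > 0) sb.append(',');
            sb.append(c.type());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Audio clip = Audio.fromBytes(new byte[] {1, 2, 3}, "audio/wav");
        List<Content> message = Arrays.asList(
                new TextContent("Transcribe this clip"),
                new AudioContent(clip));
        System.out.println(describe(message));
    }
}
```

Keeping the data holder provider-neutral is what would let a Gemini integration and an Azure OpenAI integration consume the same message, which is the interchangeability asked for earlier in the thread.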

langchain4j commented 1 week ago

Implemented in https://github.com/langchain4j/langchain4j/pull/1464