[FEATURE] Support audio, video, rich format support for multimodal models

langchain4j / langchain4j

Java version of LangChain

https://docs.langchain4j.dev

Apache License 2.0

[FEATURE] Support audio, video, rich format support for multimodal models #1463

Open glaforge opened 1 month ago

glaforge commented 1 month ago

Models like Gemini accept text and images as input, but also other formats such as audio, video, and PDF files. The goal of this ticket is to add support for audio, video, and rich-format files, starting with Gemini for experimentation.

SandraAhlgrimm commented 1 month ago

Hi Guillaume, that would be great. I am working on audio support for Azure OpenAI. It would be great if we could use the same Audio models so users can switch between providers easily. I'll add you as soon as I have something to review (hopefully in a few hours). If you're faster, I am happy to use/enhance your implementation.

glaforge commented 1 month ago

Oh, I just saw your comment, @SandraAhlgrimm. I added Audio/AudioContent and Video/VideoContent classes based on the same structure as Image and ImageContent. They are in #1464.
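
For readers following along, here is a rough sketch of what "the same structure as Image and ImageContent" could look like for audio and video. It is only an illustration, not the code from #1464: `MultimodalSketch`, the shared `Media` base class, the `fromUrl`/`fromBytes` factories, the `type()` method, and the example.com URL are all assumptions made for this example. The only idea taken from the thread is the pairing of a data holder (`Audio`, `Video`) with a content wrapper (`AudioContent`, `VideoContent`).

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch only: these are NOT the langchain4j classes or the code from #1464.
final class MultimodalSketch {

    /** Anything that can be placed in a multimodal user message. */
    interface Content {
        String type();
    }

    /** Shared shape (an assumption): a URL or inline base64 data, plus a MIME type. */
    abstract static class Media {
        final URI url;            // null when the data is inline
        final String base64Data;  // null when the data is referenced by URL
        final String mimeType;

        Media(URI url, String base64Data, String mimeType) {
            // Exactly one of url / base64Data must be set.
            if ((url == null) == (base64Data == null)) {
                throw new IllegalArgumentException("exactly one of url or base64Data is required");
            }
            this.url = url;
            this.base64Data = base64Data;
            this.mimeType = Objects.requireNonNull(mimeType, "mimeType");
        }
    }

    static final class Audio extends Media {
        private Audio(URI url, String base64Data, String mimeType) {
            super(url, base64Data, mimeType);
        }
        static Audio fromUrl(String url, String mimeType) {
            return new Audio(URI.create(url), null, mimeType);
        }
        static Audio fromBytes(byte[] bytes, String mimeType) {
            return new Audio(null, Base64.getEncoder().encodeToString(bytes), mimeType);
        }
    }

    static final class Video extends Media {
        private Video(URI url, String base64Data, String mimeType) {
            super(url, base64Data, mimeType);
        }
        static Video fromUrl(String url, String mimeType) {
            return new Video(URI.create(url), null, mimeType);
        }
    }

    /** Wrapper that lets an Audio sit next to text and images in a message. */
    static final class AudioContent implements Content {
        final Audio audio;
        AudioContent(Audio audio) { this.audio = Objects.requireNonNull(audio, "audio"); }
        @Override public String type() { return "AUDIO"; }
    }

    /** Same idea for video. */
    static final class VideoContent implements Content {
        final Video video;
        VideoContent(Video video) { this.video = Objects.requireNonNull(video, "video"); }
        @Override public String type() { return "VIDEO"; }
    }

    public static void main(String[] args) {
        byte[] fakeWav = "abc".getBytes(StandardCharsets.UTF_8); // stand-in for real audio bytes
        List<Content> contents = Arrays.asList(
                new AudioContent(Audio.fromBytes(fakeWav, "audio/wav")),
                new VideoContent(Video.fromUrl("https://example.com/clip.mp4", "video/mp4")));
        for (Content c : contents) {
            System.out.println(c.type());
        }
    }
}
```

Keeping the data holder separate from the content wrapper means a provider integration (Gemini, Azure OpenAI, or another) would only need to map each `Content` type to its own request format, which is what would let users switch providers without changing how they build messages.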