langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Multimodal Embedding #7866

Open taowang1993 opened 2 months ago

taowang1993 commented 2 months ago

Self Checks

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Currently, Dify supports only text embedding.

But I need to show users images from my documents, such as diagrams and graphs.

This feature would be very useful in the education, medical, legal, and finance domains.

Major vector DBs already support multimodal RAG.

https://weaviate.io/blog/multimodal-models

https://milvus.io/docs/multimodal_rag_with_milvus.md

https://jina.ai/news/jina-clip-v1-a-truly-multimodal-embeddings-model-for-text-and-image/
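
To make the request concrete, here is a minimal sketch of the idea behind multimodal retrieval: text chunks and images are embedded into one shared vector space, so a text query can return an image. This is not Dify code and not the API of any library linked above. The `Chunk` class, the `search` helper, the file paths, and the hand-written vectors are all hypothetical stand-ins for what a real multimodal embedding model would produce.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    kind: str            # "text" or "image"
    ref: str             # text snippet or image path (hypothetical values)
    vector: List[float]  # embedding in the shared text/image space


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: List[Chunk], query_vector: List[float], top_k: int = 2) -> List[Chunk]:
    """Return the top_k chunks most similar to the query, regardless of modality."""
    ranked = sorted(index, key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return ranked[:top_k]


# Toy vectors written by hand; a real system would get these from a
# multimodal embedding model (assumption, no model is called here).
index = [
    Chunk("text", "Quarterly revenue grew 12%.", [0.9, 0.1, 0.0]),
    Chunk("image", "figures/revenue_chart.png", [0.8, 0.2, 0.1]),
    Chunk("image", "figures/org_diagram.png", [0.0, 0.1, 0.9]),
]

# Pretend this is the embedding of the text query "revenue growth".
query_vector = [1.0, 0.0, 0.0]
results = search(index, query_vector)

for chunk in results:
    print(chunk.kind, chunk.ref)
```

With these toy vectors the text query ranks the revenue text chunk first and the revenue chart image second, which is the behavior I would like from the knowledge base: the chart can be shown to the user next to the answer.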

2. Additional context or comments

No response

3. Can you help us with this feature?

friedinando commented 2 months ago

+1

monotykamary commented 2 weeks ago

https://docs.voyageai.com/docs/multimodal-embeddings I think this is the near future for multimodal RAG, especially now that OCR in Open WebUI and Claude's visual PDF support are seeing heavier use.