Open taowang1993 opened 2 months ago
Currently, Dify supports only text embedding.
But I need to show users images from my documents, such as diagrams and graphs.
This feature would be very useful in the education, medical, legal, and finance domains.
Major vector DBs already support multimodal RAG.
https://weaviate.io/blog/multimodal-models
https://milvus.io/docs/multimodal_rag_with_milvus.md
https://jina.ai/news/jina-clip-v1-a-truly-multimodal-embeddings-model-for-text-and-image/
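To make the request concrete, here is a rough sketch of what I mean by retrieving images alongside text. It is not Dify's API and not the API of any of the vector DBs linked above. The `Chunk` class, the `retrieve` function, the file path, and the hard-coded vectors are all hypothetical stand-ins, and it assumes a multimodal embedding model has already mapped both text and images into one shared vector space.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    kind: str            # "text" or "image"
    content: str         # passage text, or a path/URL to the image
    vector: List[float]  # toy embedding; a real one would come from a multimodal model


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector: List[float], chunks: List[Chunk], top_k: int = 2) -> List[Chunk]:
    """Return the top_k chunks, text or image, ranked by similarity to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:top_k]


# Hypothetical index: text passages and images stored side by side.
index = [
    Chunk("text", "Quarterly revenue grew in all regions.", [0.9, 0.1, 0.0]),
    Chunk("image", "figures/revenue_by_region.png", [0.8, 0.2, 0.1]),
    Chunk("text", "The onboarding checklist has five steps.", [0.0, 0.1, 0.9]),
]

# Stand-in for the embedding of a user question about revenue.
query_vector = [1.0, 0.0, 0.0]

hits = retrieve(query_vector, index)
for hit in hits:
    # Image hits could be rendered to the user; text hits go into the prompt.
    print(hit.kind, hit.content)
```

The point is only that image chunks are ranked in the same search as text chunks, so a matching diagram can be shown to the user next to the answer.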
+1
https://docs.voyageai.com/docs/multimodal-embeddings I think this is the near future for multimodal RAG, especially now that OCR for Open WebUI and Claude's Visual PDFs are seeing heavier use.