langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Multimodal Embedding #7866

Open taowang1993 opened 4 weeks ago

taowang1993 commented 4 weeks ago

Self Checks

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Currently, Dify supports only text embedding.

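However, I need to show users images from my documents, such as diagrams and graphs.

Multimodal embedding would be very useful in domains such as education, medicine, law, and finance.

Major vector databases such as Weaviate and Milvus already support multimodal RAG, and multimodal embedding models such as jina-clip-v1 are available: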

https://weaviate.io/blog/multimodal-models

https://milvus.io/docs/multimodal_rag_with_milvus.md

https://jina.ai/news/jina-clip-v1-a-truly-multimodal-embeddings-model-for-text-and-image/
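
To illustrate the idea, here is a minimal sketch, assuming a multimodal model has already embedded both text chunks and images into one shared vector space. The index entries, file names, and vectors are hypothetical toy values, not output from a real model and not part of Dify's API. A text query is compared against every entry, so an image can be returned alongside text chunks.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical index: text chunks and images live in the same embedding space.
# The vectors are made-up toy values standing in for real model output.
INDEX = [
    {"type": "text", "ref": "chunk-1", "vector": [0.9, 0.1, 0.0]},
    {"type": "image", "ref": "diagram.png", "vector": [0.1, 0.9, 0.1]},
    {"type": "image", "ref": "graph.png", "vector": [0.0, 0.2, 0.9]},
]


def search(query_vector, top_k=2):
    """Return the top_k index entries most similar to the query vector."""
    ranked = sorted(
        INDEX,
        key=lambda item: cosine(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy stand-in for an embedded text query such as "show me the diagram".
query = [0.2, 0.8, 0.1]
results = search(query)
for item in results:
    print(item["type"], item["ref"])
```

In a real setup, the toy vectors would come from a multimodal embedding model and the linear scan would be replaced by a vector database query.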

2. Additional context or comments

No response

3. Can you help us with this feature?

friedinando commented 2 days ago

+1