Multimodal Embedding - Githubissues

taowang1993 commented 2 months ago

[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Currently, Dify supports only text embedding.

But I need to show users images from my documents such as diagrams and graphs.

This feature would be very useful in domains such as education, medicine, law, and finance.

Major vector DBs already support multimodal RAG.


friedinando commented 2 months ago

+1

monotykamary commented 2 weeks ago

https://docs.voyageai.com/docs/multimodal-embeddings I think this is the near future for multimodal RAG, especially now that OCR in Open WebUI and Claude's visual PDF support are seeing heavier use.

langgenius / dify