MayankChaturvedi opened this issue 1 month ago
I would love to contribute to this issue @MayankChaturvedi
I love the idea!
So, to make it even more clear: if that is the workflow (a multimodal query, then retrieval, then generation), I think we should go forward. Also, it would be great to keep this simple; we would like to see a very simple notebook that does what it needs to, without overcomplicating it.
PS: I have added this issue to the main issue #43
Hello @MayankChaturvedi, as proposed by @ariG23498 in issue #55, I would love to contribute to this work as well. Let me know if you want to divide up or collaborate on any subtasks.
Hey @ariG23498, I was redirected to #47, thank you for that! I would also love to join this team, @MayankChaturvedi. Please let me know if there is space for collaboration here too!
Hi folks, thanks for your interest in the issue. We need a simple notebook. I will create a branch so that the three of us can collaborate on it. Meanwhile, I'll also come up with a distribution of tasks. Shall we collaborate in a Discord group? https://discord.gg/rhbqXsyX @ariG23498, does this setup sound good?
@MayankChaturvedi the collaboration sounds great!
Let me know if you folks need help -- the best way to reach me is through this issue. It will stay open for others to view and learn 🤗
Hi @MayankChaturvedi, I would love to collaborate on this issue. Let me know how I can contribute.
A notebook that demonstrates a multimodal RAG pipeline: it combines two types of inputs, such as text and images, to retrieve relevant information from a dataset, and then generates new outputs based on the retrieved data.
- Input: takes a text query along with an image (e.g., "Which fruit is this?").
- Retrieval: uses the image and the text to retrieve relevant documents or facts from a knowledge base or external dataset (e.g., Wikipedia articles on fruits).
- Generation: the system generates a coherent response based on the retrieved information (e.g., "This is a blueberry!").
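
For reference, here is a minimal sketch of that retrieve-then-generate flow. The CLIP checkpoint (`clip-ViT-B-32` via `sentence-transformers`), the image path, and the toy knowledge base are all illustrative assumptions for this thread, not decisions anyone has made:

```python
# Minimal sketch of the input -> retrieval -> generation flow described above.
# The model name, file path, and knowledge base are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util
from PIL import Image

# CLIP maps text and images into the same embedding space, so one model
# can serve as the retriever for both modalities.
retriever = SentenceTransformer("clip-ViT-B-32")

# Toy knowledge base standing in for an external dataset.
knowledge_base = [
    "Blueberries are small, round, dark-blue fruits that grow on shrubs.",
    "Strawberries are red fruits with seeds on their outer surface.",
    "Bananas are elongated yellow fruits rich in potassium.",
]
corpus_embeddings = retriever.encode(knowledge_base, convert_to_tensor=True)

# Retrieval: embed the query image and find the closest fact.
query_image = Image.open("fruit.jpg")  # hypothetical input image
image_embedding = retriever.encode(query_image, convert_to_tensor=True)
hits = util.semantic_search(image_embedding, corpus_embeddings, top_k=1)[0]
retrieved_fact = knowledge_base[hits[0]["corpus_id"]]

# Generation: assemble the augmented prompt; in the notebook this would be
# passed to a generator model (e.g., an instruction-tuned VLM).
prompt = (
    f"Context: {retrieved_fact}\n"
    "Question: Which fruit is this?\n"
    "Answer based on the context and the image."
)
print(prompt)
```

In the actual notebook, the last step would feed this augmented prompt (plus the image) to whichever generator model the team picks, which is exactly the part we can decide on together.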