run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.16k stars 5.16k forks

Multimodal Retriever is not retrieving the right documents with OpenCLIP embedding models. #10546

Closed NeethanShetty07 closed 5 months ago

NeethanShetty07 commented 8 months ago

Question Validation

Question

https://docs.llamaindex.ai/en/stable/examples/multi_modal/ChromaMultiModalDemo.html

I was using the code above to retrieve the right images from a directory of images, but every time I get the wrong images from the retriever.

My input images usually contain tables or paragraphs. I also tried the multimodal vector and vector retrievers, but the results are the same.

If anyone knows how to improve this, please let me know.

dosubot[bot] commented 8 months ago

🤖

Hey there @NeethanShetty07! Nice to see you back. Hope you're ready to tackle this new challenge together!

Based on your description, it seems like you're having trouble retrieving the correct images using the Multimodal Retriever with OpenCLIP embedding models in LlamaIndex. Here are a few suggestions that might help improve the accuracy:

  1. Change the model: The default model for OpenCLIP is "ViT-B/32". You might want to try a different model from the AVAILABLE_CLIP_MODELS list. Each model has different characteristics and might perform better on your specific dataset. Here are the available models:

    • "RN50"
    • "RN101"
    • "RN50x4"
    • "RN50x16"
    • "RN50x64"
    • "ViT-B/32"
    • "ViT-B/16"
    • "ViT-L/14"
    • "ViT-L/14@336px"

    You can change the model like this (the import path may differ across llama_index versions):

    from llama_index.embeddings.clip import ClipEmbedding

    clip_embedding = ClipEmbedding(embed_batch_size=20, model_name="ViT-B/16")
  2. Increase the batch size: The default batch size for embedding generation is DEFAULT_EMBED_BATCH_SIZE. A larger batch size speeds up embedding generation at the cost of higher memory usage; note that it affects throughput rather than the embeddings themselves, so on its own it will not change which images are retrieved.

  3. Preprocessing: Ensure that the images are preprocessed in the same way as the training data of the model. This includes resizing, normalization, etc.

  4. Fine-tuning: If you have a labeled dataset, you can fine-tune the model on your specific task. This usually improves the performance significantly.
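Under the hood, CLIP-style retrieval embeds the text query and every image into a shared vector space and ranks the images by cosine similarity to the query; a poorly matched model or preprocessing shows up as a bad ranking at exactly this step. A minimal sketch of the ranking in plain Python, using made-up 3-dimensional vectors in place of real CLIP embeddings:

```python
import math

def cosine_similarity(a, b):
    # cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_images(query_embedding, image_embeddings):
    # Sort image ids by descending similarity to the query embedding.
    scored = [
        (image_id, cosine_similarity(query_embedding, emb))
        for image_id, emb in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings; in practice these come from the CLIP text and image encoders.
query = [0.9, 0.1, 0.0]
images = {
    "table_page_1.png": [0.8, 0.2, 0.1],
    "paragraph_2.png": [0.1, 0.9, 0.3],
}
ranking = rank_images(query, images)
print(ranking[0][0])  # → table_page_1.png (closest to the query vector)
```

The retriever can only be as good as these similarity scores, which is why model choice, preprocessing, and fine-tuning (points 1, 3, and 4 above) are the main levers.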

Please note that these are general suggestions and their effectiveness may vary depending on your specific use case. You can find more details in the ClipEmbedding section of the LlamaIndex code.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.


#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
NeethanShetty07 commented 8 months ago

Hi, can you show me an example of how to use this multimodal embedding, with code?
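For reference, the ChromaMultiModalDemo linked at the top of this issue follows roughly the pattern below. This is a hedged sketch, not a definitive answer: it assumes a llama_index release where ClipEmbedding, MultiModalVectorStoreIndex, and ChromaVectorStore are importable as shown (import paths have moved between versions), and "./table_images" is a placeholder directory:

```python
# Sketch only; verify imports against your installed llama_index version.
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.embeddings.clip import ClipEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Separate Chroma collections for text and image embeddings.
client = chromadb.EphemeralClient()
text_store = ChromaVectorStore(chroma_collection=client.create_collection("text"))
image_store = ChromaVectorStore(chroma_collection=client.create_collection("images"))
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# Load image files and build the multimodal index with a larger CLIP model.
documents = SimpleDirectoryReader("./table_images").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    image_embed_model=ClipEmbedding(model_name="ViT-L/14"),
)

# Retrieve the images most similar to a text query.
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=3)
for result in retriever.retrieve("total revenue table"):
    print(result.node.metadata.get("file_path"), result.score)
```

The demo notebook linked in the original question is the authoritative version of this flow; the sketch mainly illustrates where the embedding model and the two vector stores plug in.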

hatianzhang commented 8 months ago

@NeethanShetty07 do you have enough images for retrieval? If the image set is small, the results can be irrelevant.

Could you share your example notebook with me through Colab so I can debug it?

NeethanShetty07 commented 8 months ago

Hi @hatianzhang, here is the Colab notebook I was using: https://colab.research.google.com/drive/1NZJXjg7q7Q4fn-kdP2VRmI2L0j8UNbcB?usp=sharing

Let me know if I'm doing anything wrong here.

In query_to_test, I'm passing the text I'm searching for inside the images.

I have also attached the image zip file; please find it at table_images.zip

dosubot[bot] commented 5 months ago

Hi, @NeethanShetty07,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, you were experiencing inaccurate document retrieval using the Multimodal Retriever with OpenCLIP embedding models. Dosubot provided suggestions to improve retrieval accuracy, and you shared a Colab notebook and an image zip file for review. It seems the issue has been resolved.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!