run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Approaches to searching documents requiring different kinds of embeddings (multimodal too) #13594


coffepowered commented 1 month ago


Question

Hello, I am reflecting on how to tackle the problem of searching over a knowledge base that could require different embedding modalities. To make this more concrete, my users typically upload:

I recall llama_index has some approaches to images and tables, but is there a generally preferred pattern for this? Broadly, I see a few options:

  1. using a single, very strong multimodal embedding model (able to interpret images, SQL, multilingual text, and so on). I am not aware of such an embedding model.
  2. maintaining separate indexes for each modality, then routing queries depending on the type of question
  3. a mix of 1 and 2.

Do you see more patterns to apply? Which would be the "llama-index" way of approaching the problem, if any?

dosubot[bot] commented 1 month ago

To search effectively over a knowledge base whose documents need different kinds of embeddings, including multimodal ones, LlamaIndex offers machinery that maps well onto the options you listed. Here's a concise overview:

  1. Multimodal Embeddings: LlamaIndex handles different content types through its MultiModalEmbedding abstraction. For images it supports CLIP embeddings, and for documents containing code or SQL it can use specialized embeddings that capture the structure and semantics of the content, improving retrieval accuracy.

  2. Separate Indexes for Each Modality: The MultiModalVectorStoreIndex class maintains a separate vector store per modality (for example, one for text and one for images), so each type of content is embedded, stored, and queried with the machinery best suited to it (see the sketch after this list).

  3. Combination of Approaches: Combining multimodal embeddings with separate per-modality indexes gives a hybrid setup in which text, images, code, and SQL are each processed by the most appropriate model while remaining searchable from one entry point.
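
As a minimal sketch of the separate-stores pattern (point 2), following the image-to-image retrieval example in the docs: the Qdrant backend, the `./data` folder, and the collection names are assumptions for illustration, and the snippet presumes `llama-index-vector-stores-qdrant` and a CLIP embedding package are installed.

```python
# Minimal sketch: one MultiModalVectorStoreIndex backed by two vector
# stores, so text and images each get their own embedding space.
# Paths and collection names are illustrative placeholders.
import qdrant_client
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(path="qdrant_mm_db")  # local, on-disk
text_store = QdrantVectorStore(client=client, collection_name="text_collection")
image_store = QdrantVectorStore(client=client, collection_name="image_collection")
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# A folder mixing text files and images; by default images are embedded
# with CLIP and text with the configured text embedding model.
documents = SimpleDirectoryReader("./data").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Retrieval queries both stores and returns both text and image nodes.
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=3)
nodes = retriever.retrieve("what does the architecture diagram show?")
```

The key design point is that the two stores never share an embedding space; the index object only coordinates them.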

For implementation, this means picking a suitable embedding model per modality, representing each corpus in its own vector space, and letting a language model handle query understanding and routing. Customizable ingestion pipelines and efficient indexing are equally important for large, diverse datasets; a minimal pipeline sketch follows.
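
As one illustration of that last point, a per-modality ingestion pipeline might look like the sketch below; the chunk sizes and the choice of `OpenAIEmbedding` are assumptions, to be swapped for whatever model suits each modality.

```python
# Sketch: chunk and embed text documents with an IngestionPipeline,
# then build an index from the pre-embedded nodes. Chunk sizes and
# the embedding model are illustrative choices, not recommendations.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()  # swap per modality (code, tables, ...)
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=64),
        embed_model,  # embedding models act as transformations here
    ]
)
nodes = pipeline.run(documents=[Document(text="...your text content...")])

# Build the index directly from the pre-embedded nodes.
index = VectorStoreIndex(nodes=nodes, embed_model=embed_model)
query_engine = index.as_query_engine()
```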

For more detailed guidance and examples, you might want to explore the source code and documentation provided by LlamaIndex, which offers insights into the implementation of these strategies.


logan-markewich commented 1 month ago

Basically it's option 2: create indexes for each type. The multimodal index is one specific case of this, covering images and text, but the general approach holds for any modality, since each modality requires its own embedding model.
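
To make that concrete, here is a hedged sketch of option 2 with an LLM-based router in front: `text_docs`, `table_docs`, the two HuggingFace model names, and the tool descriptions are all illustrative placeholders, and `LLMSingleSelector` is only one of the available selectors.

```python
# Sketch: one VectorStoreIndex per modality, each with its own embedding
# model, fronted by a RouterQueryEngine that picks an index per question.
# The documents, model names, and descriptions are placeholders.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

text_docs = [Document(text="...narrative report text...")]
table_docs = [Document(text="...a table serialized as text...")]

text_index = VectorStoreIndex.from_documents(
    text_docs,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
)
table_index = VectorStoreIndex.from_documents(
    table_docs,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-m3"),
)

tools = [
    QueryEngineTool.from_defaults(
        query_engine=text_index.as_query_engine(),
        description="Answers questions about narrative text in the documents.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=table_index.as_query_engine(),
        description="Answers questions about tables and structured data.",
    ),
]

# An LLM reads the tool descriptions and routes each query to one index.
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=tools,
)
response = router.query("Which quarter had the highest revenue?")
```

In practice the routing quality depends heavily on how sharply the tool descriptions separate the modalities.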

coffepowered commented 1 month ago

The response from dosubot is useful, but nothing beats a human. Thank you @logan-markewich!