run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Can the FAISS index speed up retrieval for a specified retriever (like BM25)? Why is as_query_engine so much faster than a specified retriever? #13054

Closed: TianyuFan0504 closed this issue 5 months ago

TianyuFan0504 commented 5 months ago

Question

I have two indices: one (index1) is backed by a FAISS index, and the other (index2) is a plain in-memory VectorStoreIndex:

import faiss
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.faiss import FaissVectorStore

documents = SimpleDirectoryReader("../datasets").load_data()

# The FAISS index dimension must match the embedding model
# (e.g. 1536 for OpenAI's text-embedding-ada-002)
faiss_index = faiss.IndexFlatL2(1536)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index1 = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

nodes = Settings.node_parser.get_nodes_from_documents(documents, show_progress=True)
index2 = VectorStoreIndex(nodes, include_embeddings=True)

However, when I run a BM25 retriever over each of them, the time consumed is comparable:

from llama_index.retrievers.bm25 import BM25Retriever

retriever = BM25Retriever.from_defaults(
    index=index1,
    similarity_top_k=10,
    verbose=True,
)
query = "What is the name of the archetype of the god worshipped in the city of Amathus on Cyprus?"
nodes_retrieved1 = retriever.retrieve(query)  # takes 10m27s

retriever = BM25Retriever.from_defaults(
    index=index2,
    similarity_top_k=10,
    verbose=True,
)
nodes_retrieved2 = retriever.retrieve(query)  # takes 10m20s

But if I use index.as_query_engine(), it is much faster:

query_engine = index1.as_query_engine()
response1 = query_engine.query(query)  # takes 3.7s

In my understanding, as_query_engine is essentially a wrapper around the default retriever. Why is there such a large speed difference between the two?

Can I pair a specified retriever with the FAISS index to achieve the same speed as as_query_engine?

dosubot[bot] commented 5 months ago

The observed speed difference between using index.as_query_engine() and a specified retriever like BM25, even on a FAISS index, stems from the inherent operational and optimization differences between these methods.

  1. Direct Access vs. Algorithmic Retrieval: index.as_query_engine() likely leverages FAISS's built-in capabilities for efficient vector similarity search more directly. FAISS is optimized for fast nearest neighbor searches in large datasets, benefiting from optimized algorithms and data structures tailored for vector search operations.

  2. Optimization and Overhead: FAISS indexes are specifically optimized for vector space operations. The direct interaction through index.as_query_engine() can fully utilize FAISS's optimizations, reducing overhead. On the other hand, BM25, being a text retrieval function, involves additional computational steps and potentially less direct integration with FAISS, leading to slower retrieval times.

  3. Hardware Acceleration: If index.as_query_engine() is configured to use FAISS's GPU acceleration capabilities, while the BM25 retriever setup does not leverage the same level of hardware acceleration, this could significantly impact speed.

Approaching the speed of index.as_query_engine() with a specified retriever like BM25 combined with a FAISS index would require integrating the retriever closely with FAISS's optimizations, minimizing additional computational overhead, and fully leveraging hardware acceleration. However, because the two approaches are fundamentally different (BM25 is term-frequency-based, while FAISS focuses on vector similarity), matching speeds is unlikely without significant customization and optimization effort.

For a more sophisticated retrieval system that combines the strengths of both text and vector-based retrieval, you might consider a hybrid approach as demonstrated in the LlamaIndex documentation. This involves using both BM25 for text-based retrieval and vector search for similarity-based retrieval, potentially offering a balanced solution that leverages the advantages of both methods. More details on this approach can be found in the FaissIndexDemo.ipynb and bm25_retriever.ipynb notebooks.
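As a rough sketch of that hybrid pattern, assuming the QueryFusionRetriever from llama_index.core plus the index1, nodes, and query objects built above (the parameter values here are illustrative, not prescriptive):

from llama_index.core.retrievers import QueryFusionRetriever

# Dense retriever backed by the FAISS index, plus a sparse BM25
# retriever built straight from the parsed nodes
vector_retriever = index1.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)

# Merge the two ranked lists with reciprocal-rank fusion;
# num_queries=1 disables LLM-based query rewriting, so the whole
# pipeline stays a pure retrieval step
hybrid_retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=1,
    mode="reciprocal_rerank",
)
nodes_retrieved = hybrid_retriever.retrieve(query)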

#### Details

**Sources**

- [docs/docs/examples/vector_stores/FaissIndexDemo.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/FaissIndexDemo.ipynb)
- [docs/docs/examples/retrievers/bm25_retriever.ipynb](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/retrievers/bm25_retriever.ipynb)
- [llama-index-integrations/retrievers/llama-index-retrievers-bm25/llama_index/retrievers/bm25/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/retrievers/llama-index-retrievers-bm25/llama_index/retrievers/bm25/base.py)
- [llama-index-legacy/llama_index/legacy/llms/nvidia_tensorrt.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/llms/nvidia_tensorrt.py)
- [llama-index-legacy/llama_index/legacy/retrievers/bm25_retriever.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/retrievers/bm25_retriever.py)
- [llama-index-packs/llama-index-packs-fusion-retriever/llama_index/packs/fusion_retriever/hybrid_fusion/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-fusion-retriever/llama_index/packs/fusion_retriever/hybrid_fusion/base.py)

logan-markewich commented 5 months ago

🤷🏻 These aren't exactly comparable

BM25 is not using your vector store. It's just loading all of your nodes into memory, calculating the sparse embeddings with the BM25 algorithm, and then running a pairwise comparison of the query against every node (this uses the rank_bm25 library under the hood).
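Roughly what that amounts to, as a sketch that calls rank_bm25 directly (whitespace tokenization here is a simplification, purely for illustration):

from rank_bm25 import BM25Okapi

# Build term statistics over every node's text, entirely in memory
tokenized_corpus = [node.get_content().split() for node in nodes]
bm25 = BM25Okapi(tokenized_corpus)

# Score the query pairwise against all nodes; no vector store is touched
scores = bm25.get_scores(query.split())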

Whereas as_query_engine() does both a retrieval and at least one LLM call (if you just want the retrieval time, use index.as_retriever().retrieve("..")). FAISS is highly optimized for retrieval and will easily be faster than BM25 for large datasets.
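For reference, a minimal way to time the retrieval step alone (assuming the index1 and query defined earlier in this thread):

import time

retriever = index1.as_retriever(similarity_top_k=10)  # FAISS-backed dense retrieval
start = time.perf_counter()
nodes_retrieved = retriever.retrieve(query)
print(f"retrieval only: {time.perf_counter() - start:.2f}s")  # no LLM synthesis step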

TianyuFan0504 commented 5 months ago

> 🤷🏻 These aren't exactly comparable
>
> BM25 is not using your vector store. It's just loading all of your nodes into memory, calculating the sparse embeddings with the BM25 algorithm, and then running a pairwise comparison of the query against every node (this uses the rank_bm25 library under the hood).
>
> Whereas as_query_engine() does both a retrieval and at least one LLM call (if you just want the retrieval time, use index.as_retriever().retrieve("..")). FAISS is highly optimized for retrieval and will easily be faster than BM25 for large datasets.

Thanks @logan-markewich. I'm new to llamaindex and IR, so huge thanks for your answer : )