Closed chengyjonathan closed 2 months ago
Hey @chengyjonathan! Great to see you venturing into new territories again 🚀
Based on the information provided in the LlamaIndex codebase, it appears that the framework already supports a hybrid search mechanism involving lexical, semantic, and knowledge graph retrieval. This is primarily facilitated through the KGTableRetriever class, where the retriever_mode parameter can be set to KGRetrieverMode.HYBRID, indicating a hybrid approach combining both keywords and embeddings to find relevant triplets in a knowledge graph.
The hybrid search functionality is implemented in the _retrieve method of the KGTableRetriever class. This method uses both lexical search (via keywords) and semantic search (via embeddings) to retrieve relevant nodes from the knowledge graph. If the retriever_mode is set to HYBRID, the method combines results from both lexical and semantic searches to enhance the retrieval process.
Here's a brief overview of how the hybrid search works:
Keyword Extraction: The method starts by extracting keywords from the query string using the _get_keywords method, which leverages a language model for keyword extraction.
Lexical Search: For each extracted keyword, it searches the knowledge graph for relevant nodes using the keyword search functionality (self._index_struct.search_node_by_keyword). This represents the lexical aspect of the search.
Semantic Search: If embeddings are used (self._retriever_mode != KGRetrieverMode.KEYWORD), the method computes the query embedding using the provided embedding model (self._embed_model.get_text_embedding). It then retrieves relevant text snippets based on their semantic similarity to the query embedding, using the embeddings stored in self._index_struct.embedding_dict.
Combining Results: If the retriever_mode is HYBRID, the method combines nodes found through both lexical and semantic searches. It ensures uniqueness of the results and prioritizes them based on relevance.
Final Node Retrieval: The method retrieves the final set of nodes based on the combined results from both searches, along with their associated text snippets if include_text is set to True.
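To make the steps above concrete, here is a toy sketch in plain Python. It is not the KGTableRetriever code: the triplets, the two-dimensional embeddings, and every function name (extract_keywords, keyword_search, embedding_search, hybrid_retrieve) are made up for illustration, and keyword extraction is a simple token lookup standing in for the LLM call.

```python
import math

# Toy knowledge graph: node keyword -> (subject, relation, object) triplets.
TRIPLETS = {
    "paris": [("paris", "capital_of", "france")],
    "france": [("france", "member_of", "eu")],
    "berlin": [("berlin", "capital_of", "germany")],
}

# Hypothetical precomputed embeddings, keyed by the triplet's text form.
EMBEDDINGS = {
    "paris capital_of france": [1.0, 0.0],
    "france member_of eu": [0.6, 0.8],
    "berlin capital_of germany": [0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_keywords(query):
    # Stand-in for LLM keyword extraction: keep tokens that name a graph node.
    return [tok for tok in query.lower().split() if tok in TRIPLETS]


def keyword_search(query):
    """Lexical step: collect the triplets attached to each extracted keyword."""
    hits = []
    for keyword in extract_keywords(query):
        hits.extend(" ".join(triplet) for triplet in TRIPLETS[keyword])
    return hits


def embedding_search(query_embedding, top_k=1):
    """Semantic step: rank triplet texts by similarity to the query embedding."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda text: cosine(query_embedding, EMBEDDINGS[text]),
        reverse=True,
    )
    return ranked[:top_k]


def hybrid_retrieve(query, query_embedding, mode="hybrid", top_k=1):
    """Run the lexical and/or semantic step, then merge without duplicates."""
    results = []
    if mode in ("keyword", "hybrid"):
        results.extend(keyword_search(query))
    if mode in ("embedding", "hybrid"):
        results.extend(embedding_search(query_embedding, top_k))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(results))
```

Calling hybrid_retrieve("Paris facts", [0.0, 1.0]) returns one triplet found by keyword and one found by embedding similarity; switching mode to "keyword" or "embedding" keeps only one of the two paths.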
You can find this implementation in the llama-index-core/llama_index/core/indices/knowledge_graph/retrievers.py file.
I hope this helps! If you have any further questions or need more clarification, feel free to ask.
I think any custom retriever could do this. Or you could use the query fusion retriever and pass in all the retrievers you want to mix:
https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion
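For anyone wondering what score-based fusion amounts to, here is a rough sketch of the general idea in plain Python, judging from the name of the linked example. It is not the query fusion retriever's implementation: min_max_normalize and relative_score_fusion are hypothetical helpers, and the equal default weights are an assumption.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def relative_score_fusion(result_sets, weights=None, top_k=3):
    """Fuse several {doc_id: score} result sets into one ranked list.

    Each retriever's scores are normalized separately so that raw scales
    (e.g. BM25 scores vs. cosine similarities) become comparable, then
    combined as a weighted sum.
    """
    weights = weights or [1.0 / len(result_sets)] * len(result_sets)
    fused = {}
    for scores, weight in zip(result_sets, weights):
        for doc_id, score in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Made-up scores from three retrievers over the same corpus.
bm25_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.1}
kg_scores = {"b": 2.0, "a": 1.0}

fused = relative_score_fusion([bm25_scores, vector_scores, kg_scores])
```

Document "b" ranks first here because all three retrievers return it with a high relative score, even though its raw BM25 score is lower than that of "a".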
Forgive me if this is wrong, but would I need two separate indexes? One composed of kg triplets?
Or could I just mix bm25, embedding based, and a kg based retriever?
@chengyjonathan 3 retrievers, that then get composed into one. How you get those retrievers is up to you 👍🏻
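A minimal sketch of that "three retrievers composed into one" shape, in plain Python rather than against the LlamaIndex base classes. Everything here is hypothetical: the Retriever protocol, StaticRetriever (a stand-in for real BM25, vector, and KG retrievers), and ComposedRetriever. The merge rule is reciprocal rank fusion, chosen only because it needs ranks rather than comparable scores.

```python
from typing import Protocol


class Retriever(Protocol):
    """Anything with a retrieve(query) method that returns ranked doc ids."""

    def retrieve(self, query: str) -> list[str]: ...


class StaticRetriever:
    """Hypothetical stand-in that always returns a fixed ranking."""

    def __init__(self, ranking):
        self.ranking = ranking

    def retrieve(self, query):
        return list(self.ranking)


class ComposedRetriever:
    """Wraps several retrievers and merges them with reciprocal rank fusion."""

    def __init__(self, retrievers, k=60):
        self.retrievers = retrievers
        self.k = k  # smoothing constant; 60 is a commonly used default

    def retrieve(self, query):
        scores = {}
        for retriever in self.retrievers:
            for rank, doc_id in enumerate(retriever.retrieve(query), start=1):
                # Each retriever contributes 1 / (k + rank) for every hit.
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (self.k + rank)
        return sorted(scores, key=scores.get, reverse=True)


# Made-up rankings standing in for bm25, embedding, and kg retrievers.
bm25 = StaticRetriever(["a", "b", "c"])
vector = StaticRetriever(["b", "c", "d"])
kg = StaticRetriever(["b", "a"])

combined = ComposedRetriever([bm25, vector, kg])
ranking = combined.retrieve("example query")
```

The composed object exposes the same retrieve method as its parts, so it can be dropped in wherever a single retriever is expected. Whether the three retrievers share an index or each have their own does not matter to the wrapper.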
@logan-markewich
If I had a KnowledgeGraphRAGRetriever and a VectorIndexAutoRetriever and wanted to fuse them, would an approach like the SQLAutoVectorQueryEngine, but for knowledge graphs, be the right choice over this fusion retriever?
https://docs.llamaindex.ai/en/stable/examples/query_engine/SQLAutoVectorQueryEngine/
Question
Hi there!
I'm wondering where the best place to get started would be if I wanted to implement a hybrid of lexical, semantic, and graph search.
I see the examples for doing lexical + semantic retrieval.
And I was wondering if there are any guides for doing semantic + kg retrieval.