run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: What is the best way to retrieve knowledge graph embeddings #13129

Open navneteberi30 opened 2 months ago

navneteberi30 commented 2 months ago

Question

Hi, I have a general question: how can we retrieve the knowledge graph embeddings from the graph store instead of using the local storage context? I am unable to find any documentation on this. Is it something we can expect in the future? The current process looks something like this: once we create the knowledge graph, we load it through the local storage context. Could you please share how we can retrieve from the graph store that we built?

kg_index = load_index_from_storage(
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    verbose=True,
)

dosubot[bot] commented 2 months ago

To retrieve knowledge graph embeddings directly from a graph store, you can use a specialized approach involving the KGTableRetriever class within the LlamaIndex framework. This class is designed to interact with your graph store and supports the retrieval of embeddings by leveraging the graph store's capabilities. Here's a concise guide on setting this up:

  1. Implement the GraphStore Protocol: Define a class that implements the GraphStore protocol from LlamaIndex, ensuring it meets the requirements of your specific graph store.

  2. Initialize the KnowledgeGraphIndex: Create an instance of KnowledgeGraphIndex with a ServiceContext and a StorageContext that wraps your graph store instance.

  3. Set Up KGTableRetriever: Initialize KGTableRetriever with the knowledge graph index, specifying KGRetrieverMode.EMBEDDING to indicate you're retrieving embeddings. Configure additional parameters like similarity_top_k and graph_store_query_depth as needed.

  4. Retrieve Embeddings: Use the retrieve method of KGTableRetriever with a QueryBundle to fetch the embedding-backed nodes from your graph store.

Here's a simplified example to illustrate the setup:

from llama_index.core import StorageContext
from llama_index.core.indices.knowledge_graph.base import KnowledgeGraphIndex
from llama_index.core.indices.knowledge_graph.retrievers import KGTableRetriever, KGRetrieverMode
from llama_index.core.schema import QueryBundle
from llama_index.core.service_context import ServiceContext
from llama_index.core.graph_stores.types import GraphStore

class YourGraphStore(GraphStore):
    # Implement the methods required by the GraphStore protocol
    # (get, upsert_triplet, delete, query, ...)
    ...

your_graph_store = YourGraphStore()
service_context = ServiceContext.from_defaults()
storage_context = StorageContext.from_defaults(graph_store=your_graph_store)
knowledge_graph_index = KnowledgeGraphIndex(
    nodes=[],  # empty node list: reuse the existing graph, do not extract new triplets
    service_context=service_context,
    storage_context=storage_context,
)
kg_table_retriever = KGTableRetriever(
    index=knowledge_graph_index,
    retriever_mode=KGRetrieverMode.EMBEDDING,
)

query_bundle = QueryBundle(query_str="Your query here")
nodes_with_embeddings = kg_table_retriever.retrieve(query_bundle)

This setup lets you interact directly with your graph store to retrieve embeddings, bypassing the locally persisted storage context. Ensure your YourGraphStore class correctly implements the GraphStore protocol, and adjust the ServiceContext and QueryBundle to fit your application.

If you encounter any specific issues or need further clarification on implementing this solution, feel free to ask.


oykunehir commented 2 months ago

I have the same problem. As you mentioned, in your code KGTableRetriever takes knowledge_graph_index as a parameter, which means we need to re-build the KG index every time. Is there a way to build the KG index once, store it in a graph database, and then query it without building the index again?

navneteberi30 commented 2 months ago

@dosu In the code you provided, KGTableRetriever needs index=knowledge_graph_index, so every time we call kg_table_retriever we have to recreate the knowledge_graph_index, which is what we want to avoid.

knowledge_graph_index = KnowledgeGraphIndex(
    nodes=[],
    service_context=service_context,
    storage_context=storage_context,
)
kg_table_retriever = KGTableRetriever(
    index=knowledge_graph_index,
    retriever_mode=KGRetrieverMode.EMBEDDING,
)

dosubot[bot] commented 2 months ago

To address the concern about rebuilding the KnowledgeGraphIndex every time with KGTableRetriever, it's important to clarify that once the KnowledgeGraphIndex is initialized and passed to KGTableRetriever, it does not get recreated with each retrieval operation. The index is used as is for fetching data, which means your application can maintain the index state across different retrieval calls without needing to rebuild it.
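
If it helps, here is a minimal sketch of that reuse pattern, assuming knowledge_graph_index was built once at startup as in the example above:

from llama_index.core.indices.knowledge_graph.retrievers import KGTableRetriever, KGRetrieverMode
from llama_index.core.schema import QueryBundle

# Build the retriever once, against the already-initialized index.
kg_table_retriever = KGTableRetriever(
    index=knowledge_graph_index,
    retriever_mode=KGRetrieverMode.EMBEDDING,
)

# Reuse it across calls: each retrieve() only embeds the query and
# looks up the graph store; the index itself is never rebuilt.
for query in ["first question", "second question"]:
    nodes = kg_table_retriever.retrieve(QueryBundle(query_str=query))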

For scenarios where you're looking to persist the knowledge graph index and avoid rebuilding it on every application start, you can leverage the persistence capabilities of the FaissVectorStore within the LlamaIndex framework. This involves saving the index to a file after its initial creation and loading it from this file when needed. Here's how you can do it:

  1. Save the Index: After creating your index, use the persist method of the FaissVectorStore to save it.
    faiss_store.persist(persist_path="/path/to/save/index")
  2. Load the Index: When you need to use the index again, load it using the from_persist_path method.
    faiss_store = FaissVectorStore.from_persist_path(persist_path="/path/to/index/file")

This approach allows you to maintain and reuse the knowledge graph index without the overhead of rebuilding it, enhancing the efficiency of your application. Remember to replace "/path/to/index/file" and "/path/to/save/index" with the actual paths relevant to your environment.
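
For illustration, here is a minimal sketch of the full round trip. It assumes the llama-index-vector-stores-faiss integration, a 1536-dimensional embedding model (e.g. OpenAI's text-embedding-ada-002), and a documents list from your own loader:

import faiss
from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.vector_stores.faiss import FaissVectorStore

# First run: build the index over a faiss-backed vector store and persist it.
faiss_store = FaissVectorStore(faiss_index=faiss.IndexFlatL2(1536))
storage_context = StorageContext.from_defaults(vector_store=faiss_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
index.storage_context.persist(persist_dir="./storage")

# Later runs: reload the persisted store instead of re-embedding everything.
faiss_store = FaissVectorStore.from_persist_dir("./storage")
storage_context = StorageContext.from_defaults(
    vector_store=faiss_store, persist_dir="./storage"
)
index = load_index_from_storage(storage_context=storage_context)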

This method ensures that your application can efficiently query the knowledge graph by leveraging the stored index, significantly reducing the initialization time and computational overhead associated with rebuilding the index from scratch.


gich2009 commented 2 months ago

Hi @navneteberi30 @oykunehir, have you guys tried using the KnowledgeGraphRAGRetriever()? You can build your index with the KnowledgeGraphIndex() and then pass in the graph_store you used for this process to the KnowledgeGraphRAGRetriever(). Let me know if this works for you.

navneteberi30 commented 2 months ago

@gich2009 that doesn't work, because once I build the index with KnowledgeGraphIndex.from_documents() I am unable to get any response:

kg_index = KnowledgeGraphIndex.from_documents(
    documents=github_document_loader(),
    llm=OpenAI(model="gpt-4", temperature=0.1),
    graph_store=graph_store,
    max_triplets_per_chunk=10,
    space_name=SPACE_NAME,
    edge_types=EDGE_TYPES,
    rel_prop_names=REL_PROP_NAMES,
    tags=TAGS,
    include_embeddings=True,
)

storage_context = StorageContext.from_defaults(graph_store=graph_store)
graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
)

gich2009 commented 2 months ago

@navneteberi30, let me give it a try on my side then I'll get back to you on it.

logan-markewich commented 1 month ago

I think you can just do KnowledgeGraphIndex(nodes=[], storage_context=storage_context), or just skip the index and throw the graph store into the KnowledgeGraphRAGRetriever.
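
Something like this sketch, assuming graph_store already points at the populated store (names are illustrative):

from llama_index.core import StorageContext
from llama_index.core.indices.knowledge_graph import KnowledgeGraphIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

# graph_store is assumed to be the already-populated store (Nebula, Neo4j, ...).
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# Option 1: an "empty" index over the existing store, so no triplets are re-extracted.
kg_index = KnowledgeGraphIndex(nodes=[], storage_context=storage_context)

# Option 2: skip the index and retrieve straight from the graph store.
graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
)
query_engine = RetrieverQueryEngine.from_args(graph_rag_retriever)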

hungbui-io commented 1 week ago

@logan-markewich Hi, is there any way to get the metadata, text, and embedding of the original documents that are retrieved when querying with KnowledgeGraphRAGRetriever? The original documents are the ones that were used to generate the KG with kg_index = KnowledgeGraphIndex.from_documents(documents).