[Question]: Getting corresponding retrieved document information (metadata, text...) from a KnowledgeGraphRAGRetriever Query

Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

Hi, I am wondering whether there is a way to get the corresponding retrieved documents (with metadata and text) when querying with KnowledgeGraphRAGRetriever. When I check the query response, it only contains a list of relevant triplets.

To obtain the full document details, including metadata and text, along with the relevant triplets when making a query using the KnowledgeGraphRAGRetriever, you need to ensure that the include_text parameter is set to True. This will include the document text source from each relevant triplet during queries.

Here is a snippet showing how to construct the KnowledgeGraphRAGRetriever with include_text enabled:

from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
    include_text=True,  # Ensure this is set to True
)

Additionally, you can use the RetrieverQueryEngine to perform the query and obtain the full document details:

from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

response = query_engine.query(
    "Tell me about Peter Quill?",
)

By setting include_text to True, the retriever will include the full document details, including metadata and text, along with the relevant triplets in the response ^[1]^[2]^[3].
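One way to check what actually came back is to look at the response's source nodes rather than the rendered answer. The sketch below assumes the response exposes a source_nodes list whose items have .node (with get_content() and metadata) and .score, as LlamaIndex's NodeWithScore does. The helper name summarize_source_nodes and the stand-in objects are hypothetical, used only so the snippet runs without a graph store.

```python
from types import SimpleNamespace


def summarize_source_nodes(response, preview_chars=200):
    """Return one dict per source node: score, metadata, and a text preview."""
    rows = []
    for item in getattr(response, "source_nodes", None) or []:
        node = item.node
        rows.append(
            {
                "score": item.score,
                "metadata": dict(node.metadata),
                "text": node.get_content()[:preview_chars],
            }
        )
    return rows


# Hypothetical stand-in for a real query response, for illustration only.
fake_node = SimpleNamespace(
    metadata={"file_name": "notes.txt"},
    get_content=lambda: "Peter Quill is a character ...",
)
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=fake_node, score=1.0)]
)

for row in summarize_source_nodes(fake_response):
    print(row)
```

With a real engine, pass the object returned by query_engine.query(...) instead of fake_response. If the only row printed is the triplet text, the retriever did not attach any source documents.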

To continue talking to Dosu, mention @dosu.

@dosu I already set include_text=True in the first place, but the retriever did not include the full document details. Please explain in more detail how to get them from the retriever.

To get the full document details, including metadata and text, from the KnowledgeGraphRAGRetriever, you need to ensure that the retriever is correctly configured and that the retrieval process is properly handled. Here are the detailed steps:

Ensure include_text=True: This parameter must be set to True to include the document text in the retrieval process.
Use the _retrieve Method: The _retrieve method in the KnowledgeGraphRAGRetriever class is responsible for building the retrieved nodes. It is invoked through the public retrieve() method, so call retrieve() rather than calling _retrieve directly.
Check the Test Cases: Refer to the test cases that demonstrate the use of include_text=True to ensure your setup is correct.

Here is a detailed example:

Step-by-Step Example

Initialize the Retriever:

from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
from llama_index.core.storage import StorageContext

# Initialize storage context with a graph store
# (your_graph_store is a placeholder for an existing graph store instance)
storage_context = StorageContext.from_defaults(graph_store=your_graph_store)

# Initialize the retriever with include_text=True
graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    include_text=True,
    verbose=True,
)

Use the Retriever with a Query Engine:

from llama_index.core.query_engine import RetrieverQueryEngine

# Initialize the query engine with the retriever
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

# Perform a query
response = query_engine.query(
    "Tell me about Peter Quill?",
)

# Display the response
from IPython.display import display, Markdown
display(Markdown(f"<b>{response}</b>"))

Verify the Retrieval in Test Cases: Check the test cases to ensure your setup aligns with the expected behavior. For example, the test_as_retriever and test_retrievers test cases demonstrate the retrieval of full document details with include_text=True:

@patch.object(
    KnowledgeGraphIndex, "_extract_triplets", side_effect=mock_extract_triplets
)
def test_as_retriever(
    _patch_extract_triplets: Any,
    documents: List[Document],
    mock_service_context: ServiceContext,
) -> None:
    graph_store = SimpleGraphStore()
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    index = KnowledgeGraphIndex.from_documents(
        documents, service_context=mock_service_context, storage_context=storage_context
    )
    retriever = index.as_retriever()
    nodes = retriever.retrieve(QueryBundle("foo"))
    assert len(nodes) == 2
    assert nodes[1].node.get_content() == "Expected content with metadata and text"

By following these steps and ensuring your setup matches the examples, you should be able to retrieve full document details, including metadata and text, using the KnowledgeGraphRAGRetriever ^[1]^[2].

@dosu The nodes list from nodes = retriever.retrieve(QueryBundle("foo")) has only one item, and the output of nodes[0].node.get_content() is as follows:

The following are knowledge sequence in max depth 2 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...` extracted based on key entities as subject:
Login{name: Login} <-[relationship:{relationship: Requires}]- Application{name: Application} -[relationship:{relationship: Needs increase in}]-> Font size{name: Font size}
Login{name: Login} <-[relationship:{relationship: Requires}]- Application{name: Application} -[relationship:{relationship: Displays}]

As you can see, the retrieved output contains only triplets; there is no document metadata or text in it. Is there something wrong here?
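Since the only node returned here is the knowledge sequence itself, one possible workaround is to pull the entity names out of that string and use them to look up the source documents by other means. The sketch below is a minimal parser for the line format shown above. The function name and the regular expressions are assumptions based on this output only; they are not part of the LlamaIndex API.

```python
import re

# Patterns inferred from the sample output; other graph stores may format differently.
ENTITY_RE = re.compile(r"\{name: ([^}]*)\}")
RELATION_RE = re.compile(r"\[relationship:\{relationship: ([^}]*)\}\]")


def parse_knowledge_sequence(text):
    """Return (entities, relations) found in the text, de-duplicated in order."""
    entities, relations = [], []
    for line in text.splitlines():
        for name in ENTITY_RE.findall(line):
            if name not in entities:
                entities.append(name)
        for rel in RELATION_RE.findall(line):
            if rel not in relations:
                relations.append(rel)
    return entities, relations


sample = (
    "Login{name: Login} <-[relationship:{relationship: Requires}]- "
    "Application{name: Application} "
    "-[relationship:{relationship: Needs increase in}]-> "
    "Font size{name: Font size}\n"
    "Login{name: Login} <-[relationship:{relationship: Requires}]- "
    "Application{name: Application} -[relationship:{relationship: Displays}]"
)

entities, relations = parse_knowledge_sequence(sample)
print(entities)
print(relations)
```

The extracted entity names could then be used as keywords against a separate index that does store the document text and metadata.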

To ensure that the full document details, including metadata and text, are included in the retrieval, you can customize the retrieved information using the retrieval_query parameter of the vector store (Neo4jVectorStore in the example below). The retrieval query must return the following four columns:

text: The text of the returned document
score: Similarity score
id: Node id
metadata: Dictionary with additional metadata (must contain _node_type and _node_content keys)

Here is an example of how to set up the retrieval_query:

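from IPython.display import Markdown, display
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

# username, password, url, and embed_dim are placeholders for your own Neo4j settings
retrieval_query = (
    "RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
    "{author: 'Tomaz', _node_type:node._node_type, _node_content:node._node_content} AS metadata"
)
neo4j_vector_retrieval = Neo4jVectorStore(
    username, password, url, embed_dim, retrieval_query=retrieval_query
)

# Build a query engine on top of the existing vector store
query_engine = VectorStoreIndex.from_vector_store(
    neo4j_vector_retrieval
).as_query_engine()
response = query_engine.query("What happened at interleaf?")
display(Markdown(f"<b>{response}</b>"))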
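The four-column contract listed above can be checked before wiring a query into the store. The sketch below validates a single result row represented as a plain dict. The function name validate_retrieval_row and the sample rows are hypothetical; the required keys are taken from the list above.

```python
REQUIRED_COLUMNS = ("text", "score", "id", "metadata")
REQUIRED_METADATA_KEYS = ("_node_type", "_node_content")


def validate_retrieval_row(row):
    """Return a list of problems; an empty list means the row looks valid."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in row]
    metadata = row.get("metadata")
    if isinstance(metadata, dict):
        problems += [
            f"missing metadata key: {k}"
            for k in REQUIRED_METADATA_KEYS
            if k not in metadata
        ]
    elif "metadata" in row:
        problems.append("metadata must be a dictionary")
    return problems


# Hypothetical rows, for illustration only.
good_row = {
    "text": "Interleaf hired Tomaz",
    "score": 0.9,
    "id": "node-1",
    "metadata": {"author": "Tomaz", "_node_type": "TextNode", "_node_content": "{}"},
}
bad_row = {"text": "x", "score": 0.9, "metadata": {"author": "Tomaz"}}

print(validate_retrieval_row(good_row))
print(validate_retrieval_row(bad_row))
```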

Additionally, when initializing the KnowledgeGraphRAGRetriever, ensure that the include_text parameter is set to True. This parameter ensures that the document text source from each relevant triplet is included during queries.

Here is an example of how to initialize the KnowledgeGraphRAGRetriever:

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=my_storage_context,
    llm=my_llm,
    entity_extract_fn=my_entity_extract_fn,
    entity_extract_template=my_entity_extract_template,
    entity_extract_policy="union",
    synonym_expand_fn=my_synonym_expand_fn,
    synonym_expand_template=my_synonym_expand_template,
    synonym_expand_policy="union",
    max_entities=5,
    max_synonyms=5,
    retriever_mode="keyword",  # or "embedding" or "hybrid"
    with_nl2graphquery=False,
    graph_traversal_depth=2,
    max_knowledge_sequence=30,
    verbose=True,
    include_text=True
)

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

This setup ensures that the full document details, including metadata and text, are included in the retrieval ^[1]^[2]^[3]^[4].

@dosu I do not load the index from Neo4jVectorStore; I use vector_index = VectorStoreIndex.from_documents(document) instead. Also, I use NebulaGraph instead of Neo4j. Please give a suitable example.

To include the full document details, including metadata and text, when querying with KnowledgeGraphRAGRetriever using NebulaGraph, you need to ensure that your setup is correctly configured. Here is a suitable example for your use case:

Initialize the NebulaGraphStore:

from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core import StorageContext

# Define your storage context with NebulaGraphStore
graph_store = NebulaGraphStore(
    space_name="llamaindex",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

Initialize the KnowledgeGraphRAGRetriever:

from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    include_text=True,  # Ensure text is included in the response
    verbose=True,
)

Use the Retriever with a Query Engine:

from llama_index.core.query_engine import RetrieverQueryEngine
from IPython.display import display, Markdown

# Create the query engine
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

# Perform the query
response = query_engine.query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"<b>{response}</b>"))

In this setup, include_text=True is the only setting on the KnowledgeGraphRAGRetriever that asks for the document text to be included in the response. No retrieval_query is involved here, since that parameter belongs to the Neo4jVectorStore example given earlier ^[1]^[2]^[3]^[4].

Additionally, you can customize the NebulaGraphStore with various parameters to influence the retrieval of full document details, such as edge_types, rel_prop_names, tags, tag_prop_names, include_vid, session_pool_kwargs, sanitize_query_output, and enhanced_schema ^[5]^[6]^[7].
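Because the question mentions a separate vector_index built with VectorStoreIndex.from_documents, one possible approach is to retrieve from both retrievers and merge the results, so that the triplets come from the graph retriever and the document text and metadata come from the vector retriever. The sketch below only shows the merge step on duck-typed results. It assumes each item has .node.node_id and .score, as retrieved LlamaIndex nodes do; the names merge_retrieved and make_item and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def merge_retrieved(kg_nodes, vector_nodes):
    """Concatenate two result lists, dropping duplicates by node_id."""
    seen, merged = set(), []
    for item in list(kg_nodes) + list(vector_nodes):
        node_id = item.node.node_id
        if node_id in seen:
            continue
        seen.add(node_id)
        merged.append(item)
    return merged


def make_item(node_id, text, metadata, score):
    # Hypothetical stand-in for a retrieved node with a score.
    node = SimpleNamespace(
        node_id=node_id, metadata=metadata, get_content=lambda: text
    )
    return SimpleNamespace(node=node, score=score)


kg_results = [make_item("kg-1", "Login <-[Requires]- Application", {}, 1.0)]
vector_results = [
    make_item("doc-1", "The application requires login.", {"file_name": "spec.txt"}, 0.8),
    make_item("doc-1", "The application requires login.", {"file_name": "spec.txt"}, 0.8),
]

for item in merge_retrieved(kg_results, vector_results):
    print(item.node.node_id, item.score, item.node.metadata)
```

With real objects, the two inputs would be the lists returned by graph_rag_retriever.retrieve(query) and vector_index.as_retriever().retrieve(query); whether this fits depends on how the two indexes were built from the same documents.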

To ensure that the specific metadata and text content of the retrieved documents are included in the query response when using KnowledgeGraphRAGRetriever, you need to set the include_text parameter to True during the initialization of the retriever. Additionally, you can customize the retrieved information using the retrieval_query parameter. Here is a detailed example:

Initialize the KnowledgeGraphIndex with Documents:

from llama_index.core.indices.knowledge_graph import KnowledgeGraphIndex
from llama_index.core import StorageContext

# Initialize storage context with your graph store
storage_context = StorageContext.from_defaults(graph_store=your_graph_store)

# Initialize the KnowledgeGraphIndex with documents
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2,
    space_name="your_space_name",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)

Customize the Retrieval Query:

retrieval_query = (
    "RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
    "{author: 'Tomaz', _node_type:node._node_type, _node_content:node._node_content} AS metadata"
)

Initialize the KnowledgeGraphRAGRetriever:

from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    include_text=True,  # Ensure text content is included
    verbose=True,
)

Use the Retriever with a Query Engine:

from llama_index.core.query_engine import RetrieverQueryEngine
from IPython.display import display, Markdown

# Create the query engine
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

# Perform the query
response = query_engine.query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"<b>{response}</b>"))

By setting include_text to True and defining a retrieval_query that includes the necessary columns (text, score, id, and metadata), the retriever will include the text content and specific metadata of the relevant documents in the query response ^[1]^[2]^[3]^[4].
