run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Getting corresponding retrieved document information (metadata, text...) from a KnowledgeGraphRAGRetriever Query #14399

Closed hungbui-io closed 6 hours ago

hungbui-io commented 3 months ago

Question

Hi, I am wondering whether there is a way to get the corresponding retrieved documents (with metadata and text) when querying with KnowledgeGraphRAGRetriever. When I check the query response, it only contains a list of relevant triplets.

dosubot[bot] commented 3 months ago

To obtain the full document details, including metadata and text, along with the relevant triplets when making a query using the KnowledgeGraphRAGRetriever, you need to ensure that the include_text parameter is set to True. This will include the document text source from each relevant triplet during queries.

Here is the relevant code snippet from the KnowledgeGraphRAGRetriever class that shows how to include the document text:

from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
    include_text=True,  # Ensure this is set to True
)

Additionally, you can use the RetrieverQueryEngine to perform the query and obtain the full document details:

from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

response = query_engine.query(
    "Tell me about Peter Quill?",
)

By setting include_text to True, the retriever will include the full document details, including metadata and text, along with the relevant triplets in the response [1][2][3].
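
One way to check what actually came back is to look at the nodes attached to the response rather than at the answer text. The following is a minimal sketch, assuming `response` is the object returned by `query_engine.query(...)` above and that it exposes `source_nodes` whose items carry `score` and `node` attributes. `describe_source_nodes` is a hypothetical helper name, not part of llama_index.

```python
def describe_source_nodes(response):
    """Collect text, metadata and score for each node behind a response.

    Assumes `response.source_nodes` is a list of objects that expose
    `.score` and `.node`, where the node has `.metadata` and `.get_content()`.
    """
    rows = []
    # Fall back to an empty list if the response carries no source nodes.
    for item in getattr(response, "source_nodes", []):
        rows.append(
            {
                "score": item.score,
                "metadata": dict(item.node.metadata),
                "text": item.node.get_content(),
            }
        )
    return rows
```

Calling `describe_source_nodes(response)` and printing each row shows whether any node carries document text and metadata, or whether only triplets were retrieved.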

To continue talking to Dosu, mention @dosu.

hungbui-io commented 3 months ago

@dosu I already set include_text=True in the first place, but the retriever did not include the full document details. Please explain in more detail how to get them from the retriever.

dosubot[bot] commented 3 months ago

To get the full document details, including metadata and text, from the KnowledgeGraphRAGRetriever, you need to ensure that the retriever is correctly configured and that the retrieval process is properly handled. Here are the detailed steps:

  1. Ensure include_text=True: This parameter must be set to True to include the document text in the retrieval process.

  2. Use the _retrieve Method: The _retrieve method in the KnowledgeGraphRAGRetriever class is responsible for retrieving nodes with their details. Ensure this method is correctly implemented and called.

  3. Check the Test Cases: Refer to the test cases that demonstrate the use of include_text=True to ensure your setup is correct.

Here is a detailed example:

Step-by-Step Example

  1. Initialize the Retriever:

    from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
    from llama_index.core.storage import StorageContext
    
    # Initialize storage context with a graph store
    storage_context = StorageContext.from_defaults(graph_store=your_graph_store)
    
    # Initialize the retriever with include_text=True
    graph_rag_retriever = KnowledgeGraphRAGRetriever(
        storage_context=storage_context,
        include_text=True,
        verbose=True,
    )
  2. Use the Retriever with a Query Engine:

    from llama_index.core.query_engine import RetrieverQueryEngine
    
    # Initialize the query engine with the retriever
    query_engine = RetrieverQueryEngine.from_args(
        graph_rag_retriever,
    )
    
    # Perform a query
    response = query_engine.query(
        "Tell me about Peter Quill?",
    )
    
    # Display the response
    from IPython.display import display, Markdown
    display(Markdown(f"<b>{response}</b>"))
  3. Verify the Retrieval in Test Cases: Check the test cases to ensure your setup aligns with the expected behavior. For example, the test_as_retriever and test_retrievers test cases demonstrate the retrieval of full document details with include_text=True:

    @patch.object(
        KnowledgeGraphIndex, "_extract_triplets", side_effect=mock_extract_triplets
    )
    def test_as_retriever(
        _patch_extract_triplets: Any,
        documents: List[Document],
        mock_service_context: ServiceContext,
    ) -> None:
        graph_store = SimpleGraphStore()
        storage_context = StorageContext.from_defaults(graph_store=graph_store)
        index = KnowledgeGraphIndex.from_documents(
            documents, service_context=mock_service_context, storage_context=storage_context
        )
        retriever = index.as_retriever()
        nodes = retriever.retrieve(QueryBundle("foo"))
        assert len(nodes) == 2
        assert nodes[1].node.get_content() == "Expected content with metadata and text"

By following these steps and ensuring your setup matches the examples, you should be able to retrieve full document details, including metadata and text, using the KnowledgeGraphRAGRetriever [1][2].
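
Note that the test above calls `index.as_retriever()` on a KnowledgeGraphIndex rather than constructing a KnowledgeGraphRAGRetriever. The following is a hedged sketch of that path, assuming `kg_index` is an already-built KnowledgeGraphIndex and that `as_retriever` forwards `include_text` to the index's own retriever. `retrieve_with_text` is a hypothetical helper, and whether this returns source text for a particular graph store is not confirmed in this thread.

```python
def retrieve_with_text(kg_index, query):
    """Retrieve through the index's own retriever and flatten the results.

    Assumes `kg_index.as_retriever(...)` accepts `include_text` and that
    each retrieved item exposes `.node` and `.score`.
    """
    retriever = kg_index.as_retriever(include_text=True)
    return [
        {
            "text": item.node.get_content(),
            "metadata": dict(item.node.metadata),
            "score": item.score,
        }
        for item in retriever.retrieve(query)
    ]
```

Comparing the output of `retrieve_with_text(kg_index, "foo")` with the output of the KnowledgeGraphRAGRetriever for the same query is a quick way to see which retriever returns document text.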


hungbui-io commented 3 months ago

@dosu The result of nodes = retriever.retrieve(QueryBundle("foo")) has only 1 item, and the output of nodes[0].node.get_content() is as follows:

The following are knowledge sequence in max depth 2 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...` extracted based on key entities as subject:
Login{name: Login} <-[relationship:{relationship: Requires}]- Application{name: Application} -[relationship:{relationship: Needs increase in}]-> Font size{name: Font size}
Login{name: Login} <-[relationship:{relationship: Requires}]- Application{name: Application} -[relationship:{relationship: Displays}]

As you can see, the retrieved output contains only triplets; there is no document metadata or text in it. Is there something wrong here?
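
For reference, the knowledge sequence shown above can be broken into entity names and relationship labels, which makes it easier to match the triplets against documents by hand. This is a minimal sketch that assumes the exact `{name: ...}` and `{relationship: ...}` formatting in the output above. `parse_knowledge_sequence` is a hypothetical helper, not a llama_index function.

```python
import re

# Patterns follow the formatting seen in the retrieved output above.
ENTITY_RE = re.compile(r"\{name: ([^}]*)\}")
RELATION_RE = re.compile(r"\{relationship: ([^}]*)\}")


def parse_knowledge_sequence(line):
    """Return (entity names, relationship labels) found in one sequence line."""
    entities = ENTITY_RE.findall(line)
    relations = RELATION_RE.findall(line)
    return entities, relations


line = (
    "Login{name: Login} <-[relationship:{relationship: Requires}]- "
    "Application{name: Application} "
    "-[relationship:{relationship: Needs increase in}]-> "
    "Font size{name: Font size}"
)
entities, relations = parse_knowledge_sequence(line)
print(entities)   # ['Login', 'Application', 'Font size']
print(relations)  # ['Requires', 'Needs increase in']
```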

dosubot[bot] commented 3 months ago

To ensure that the full document details, including metadata and text, are included in the retrieval when using the KnowledgeGraphRAGRetriever, you need to customize the retrieved information using the retrieval_query parameter. The retrieval query must return the following four columns: text, score, id, and metadata (which includes the _node_type and _node_content keys).

Here is an example of how to set up the retrieval_query:

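from IPython.display import Markdown, display
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

# username, password, url and embed_dim are placeholders for your own Neo4j settings
retrieval_query = (
    "RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
    "{author: 'Tomaz', _node_type:node._node_type, _node_content:node._node_content} AS metadata"
)
neo4j_vector_retrieval = Neo4jVectorStore(
    username, password, url, embed_dim, retrieval_query=retrieval_query
)

# Build a query engine on top of the existing vector store
loaded_index = VectorStoreIndex.from_vector_store(
    neo4j_vector_retrieval
).as_query_engine()
response = loaded_index.query("What happened at interleaf?")
display(Markdown(f"<b>{response}</b>"))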
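
The retrieval query above returns four columns (text, score, id, metadata), with `_node_type` and `_node_content` inside the metadata. The following is a small sketch for checking a returned row against that shape, assuming each row is available as a plain dict. `find_missing_fields` is a hypothetical helper and not part of llama_index or the Neo4j driver.

```python
REQUIRED_COLUMNS = ("text", "score", "id", "metadata")
REQUIRED_METADATA_KEYS = ("_node_type", "_node_content")


def find_missing_fields(row):
    """List the columns and metadata keys that a result row is missing."""
    missing = [column for column in REQUIRED_COLUMNS if column not in row]
    metadata = row.get("metadata") or {}
    missing += [
        f"metadata.{key}" for key in REQUIRED_METADATA_KEYS if key not in metadata
    ]
    return missing


row = {
    "text": "Interleaf hired Tomaz",
    "score": 0.9,
    "id": "1",
    "metadata": {"author": "Tomaz", "_node_type": "TextNode", "_node_content": "{}"},
}
print(find_missing_fields(row))  # []
```

An empty list means the row has every expected field; anything else names what the query still needs to return.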

Additionally, when initializing the KnowledgeGraphRAGRetriever, ensure that the include_text parameter is set to True. This parameter ensures that the document text source from each relevant triplet is included during queries.

Here is an example of how to initialize the KnowledgeGraphRAGRetriever:

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=my_storage_context,
    llm=my_llm,
    entity_extract_fn=my_entity_extract_fn,
    entity_extract_template=my_entity_extract_template,
    entity_extract_policy="union",
    synonym_expand_fn=my_synonym_expand_fn,
    synonym_expand_template=my_synonym_expand_template,
    synonym_expand_policy="union",
    max_entities=5,
    max_synonyms=5,
    retriever_mode="keyword",  # or "embedding" or "hybrid"
    with_nl2graphquery=False,
    graph_traversal_depth=2,
    max_knowledge_sequence=30,
    verbose=True,
    include_text=True
)

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

This setup ensures that the full document details, including metadata and text, are included in the retrieval [1][2][3][4].


hungbui-io commented 3 months ago

@dosu I do not load the index from Neo4jVectorStore. I use vector_index = VectorStoreIndex.from_documents(document) instead. Also, I use NebulaGraph instead of Neo4j. Please give a suitable example.

dosubot[bot] commented 3 months ago

To include the full document details, including metadata and text, when querying with KnowledgeGraphRAGRetriever using NebulaGraph, you need to ensure that your setup is correctly configured. Here is a suitable example for your use case:

  1. Initialize the NebulaGraphStore:

    from llama_index.graph_stores.nebula import NebulaGraphStore
    from llama_index.core import StorageContext
    
    # Define your storage context with NebulaGraphStore
    graph_store = NebulaGraphStore(
        space_name="llamaindex",
        edge_types=["relationship"],
        rel_prop_names=["relationship"],
        tags=["entity"],
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
  2. Initialize the KnowledgeGraphRAGRetriever:

    from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
    
    graph_rag_retriever = KnowledgeGraphRAGRetriever(
        storage_context=storage_context,
        include_text=True,  # Ensure text is included in the response
        verbose=True,
    )
  3. Use the Retriever with a Query Engine:

    from llama_index.core.query_engine import RetrieverQueryEngine
    from IPython.display import display, Markdown
    
    # Create the query engine
    query_engine = RetrieverQueryEngine.from_args(
        graph_rag_retriever,
    )
    
    # Perform the query
    response = query_engine.query(
        "Tell me about Peter Quill?",
    )
    display(Markdown(f"<b>{response}</b>"))

This setup ensures that the retrieval query returns the text, score, node id, and metadata, which includes _node_type and _node_content keys. The include_text=True parameter in the KnowledgeGraphRAGRetriever ensures that the document text is included in the response [1][2][3][4].

Additionally, you can customize the NebulaGraphStore with various parameters to influence the retrieval of full document details, such as edge_types, rel_prop_names, tags, tag_prop_names, include_vid, session_pool_kwargs, sanitize_query_output, and enhanced_schema [5][6][7].
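
Before constructing the NebulaGraphStore, it can help to confirm that the connection settings are present. This sketch assumes the store reads its connection details from the `NEBULA_USER`, `NEBULA_PASSWORD` and `NEBULA_ADDRESS` environment variables, which is not stated in this thread. `missing_nebula_env` is a hypothetical helper.

```python
import os

# Assumed variable names; adjust if your setup uses different ones.
REQUIRED_NEBULA_ENV = ("NEBULA_USER", "NEBULA_PASSWORD", "NEBULA_ADDRESS")


def missing_nebula_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_NEBULA_ENV if not env.get(name)]


missing = missing_nebula_env()
if missing:
    print("Set these variables before creating the graph store:", missing)
```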


dosubot[bot] commented 3 months ago

To ensure that the specific metadata and text content of the retrieved documents are included in the query response when using KnowledgeGraphRAGRetriever, you need to set the include_text parameter to True during the initialization of the retriever. Additionally, you can customize the retrieved information using the retrieval_query parameter. Here is a detailed example:

  1. Initialize the KnowledgeGraphIndex with Documents:

    from llama_index.core.indices.knowledge_graph import KnowledgeGraphIndex
    from llama_index.core import StorageContext
    
    # Initialize storage context with your graph store
    storage_context = StorageContext.from_defaults(graph_store=your_graph_store)
    
    # Initialize the KnowledgeGraphIndex with documents
    kg_index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,
        max_triplets_per_chunk=2,
        space_name="your_space_name",
        edge_types=["relationship"],
        rel_prop_names=["relationship"],
        tags=["entity"],
    )
  2. Customize the Retrieval Query:

    retrieval_query = (
        "RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
        "{author: 'Tomaz', _node_type:node._node_type, _node_content:node._node_content} AS metadata"
    )
  3. Initialize the KnowledgeGraphRAGRetriever:

    from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
    
    graph_rag_retriever = KnowledgeGraphRAGRetriever(
        storage_context=storage_context,
        include_text=True,  # Ensure text content is included
        verbose=True,
    )
  4. Use the Retriever with a Query Engine:

    from llama_index.core.query_engine import RetrieverQueryEngine
    from IPython.display import display, Markdown
    
    # Create the query engine
    query_engine = RetrieverQueryEngine.from_args(
        graph_rag_retriever,
    )
    
    # Perform the query
    response = query_engine.query(
        "Tell me about Peter Quill?",
    )
    display(Markdown(f"<b>{response}</b>"))

By setting include_text to True and defining a retrieval_query that includes the necessary columns (text, score, id, and metadata), the retriever will include the text content and specific metadata of the relevant documents in the query response [1][2][3][4].
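
To verify whether a setup like this actually returns document text, the retrieved node contents can be split into triplet-only knowledge sequences and everything else. This is a minimal sketch that assumes knowledge-sequence nodes start with the header text seen earlier in this thread ("The following are knowledge sequence ..."). `split_by_kind` is a hypothetical helper.

```python
KNOWLEDGE_SEQUENCE_PREFIX = "The following are knowledge sequence"


def split_by_kind(contents):
    """Separate knowledge-sequence strings from other retrieved content."""
    triplet_only, with_text = [], []
    for content in contents:
        if content.lstrip().startswith(KNOWLEDGE_SEQUENCE_PREFIX):
            triplet_only.append(content)
        else:
            with_text.append(content)
    return triplet_only, with_text
```

Passing `[item.node.get_content() for item in response.source_nodes]` to `split_by_kind` and finding the second list empty would indicate that only triplets were retrieved.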
