run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: GraphRAG implementation Cookbook V2 not working #16120

Open Qutubullah1 opened 1 day ago

Qutubullah1 commented 1 day ago

Bug Description

I am trying to run the Cookbook V2 for the GraphRAG implementation, but during retrieval I am not getting a response. When I checked GraphRAGQueryEngine, the retrieved chunks are not matching the regex pattern, even though I am using the notebook code unchanged.

nodes_retrieved = self.index.as_retriever(
    similarity_top_k=similarity_top_k
).retrieve(query_str)

pattern = r"(\w+(?:\s+\w+)*)\s*\({[^}]*}\)\s*->\s*([^(]+?)\s*\({[^}]*}\)\s*->\s*(\w+(?:\s+\w+)*)"
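
For context, this pattern only matches chunk text laid out as entity ({description}) -> relation ({description}) -> entity. A minimal, self-contained check (the chunk string below is hypothetical, only illustrating the expected shape):

import re

pattern = r"(\w+(?:\s+\w+)*)\s*\({[^}]*}\)\s*->\s*([^(]+?)\s*\({[^}]*}\)\s*->\s*(\w+(?:\s+\w+)*)"

# Hypothetical chunk text in the layout the regex expects.
chunk = "Alice Smith ({a researcher}) -> works at ({employment relation}) -> Acme Corp"

print(re.findall(pattern, chunk))
# [('Alice Smith', 'works at', 'Acme Corp')]

If the chunks returned by the retriever do not follow this exact layout, findall returns an empty list, which matches the symptom above.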

Version

0.11.10

Steps to Reproduce

Just add the OpenAI API key and run all the cells of the notebook.
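
For example, the key can be set as an environment variable before running the cells (placeholder value, not a real key):

import os

# llama-index's OpenAI integration reads the key from this variable.
os.environ["OPENAI_API_KEY"] = "sk-..."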

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 day ago

Hey @Qutubullah1! 👋 I'm here to help you with any bugs, questions, or contributions. Let's squash those bugs together!

When the Cookbook V2 GraphRAG implementation returns no response and the retrieved chunks fail to match the pattern in GraphRAGQueryEngine, the cause is usually one of the following:

  1. Extraction Process in GraphRAGExtractor:
    • The GraphRAGExtractor class extracts triples (subject-relation-object) from text and enriches them with descriptions. If the extraction step fails to parse the LLM's output, or the extraction prompt is poorly suited to the text, the extracted triples and descriptions may be inaccurate or missing entirely.
  2. Community Summarization in GraphRAGStore:
    • The GraphRAGStore class builds communities and generates a summary for each one. If community detection or summarization fails to capture the relationships and entities accurately, the summaries will not represent the actual content.
  3. Query Processing in GraphRAGQueryEngine:
    • The GraphRAGQueryEngine class answers queries using the community summaries. If the summaries are inaccurate, or the LLM that generates answers from them misinterprets them, the final response will not match the expected pattern. A quick sanity check for each of these stages is sketched below.
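
To locate which of these stages is failing, the following sketch inspects each stage's output in turn. It assumes the objects defined in the cookbook notebook (extractor, nodes, graph_store, index, and the notebook's get_community_summaries method); KG_NODES_KEY and KG_RELATIONS_KEY are the metadata keys llama-index's property-graph extractors write to:

from llama_index.core.graph_stores.types import KG_NODES_KEY, KG_RELATIONS_KEY

# 1. Extraction: run the extractor on a single chunk and confirm it actually
#    produced entities and relations (empty lists here would explain the
#    empty retrieval later on).
sample = extractor([nodes[0]])
print("entities:", sample[0].metadata.get(KG_NODES_KEY, []))
print("relations:", sample[0].metadata.get(KG_RELATIONS_KEY, []))

# 2. Community summarization: confirm the store holds non-empty summaries.
summaries = graph_store.get_community_summaries()
print("communities:", len(summaries))

# 3. Query processing: print the raw chunks the retriever returns, before
#    any regex matching is applied to them.
for n in index.as_retriever(similarity_top_k=5).retrieve("Your query here"):
    print(n.node.get_content()[:200])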

Once the failing stage is identified, here is a potential code snippet to address these issues (it assumes the llm, nodes, parse_fn, and the GraphRAG classes defined in the cookbook notebook):

from llama_index.core import PropertyGraphIndex

# Ensure the extraction prompt is well-defined. The parse_fn passed below
# must match whatever output format this prompt asks the LLM for.
extract_prompt = """
Extract triples (subject-relation-object) from the following text and provide descriptions for each entity and relationship:
Text: {text}
"""

# Initialize GraphRAGExtractor with the custom prompt.
extractor = GraphRAGExtractor(
    llm=llm,
    extract_prompt=extract_prompt,
    parse_fn=parse_fn,  # the notebook's parse function, matched to the prompt
    max_paths_per_chunk=10,
    num_workers=4,
)

# In Cookbook V2 the store is backed by Neo4j, so it needs connection details.
graph_store = GraphRAGStore(
    username="neo4j", password="password", url="bolt://localhost:7687"
)

# Populate the graph: PropertyGraphIndex runs the extractor over the nodes
# and writes the resulting entities and relations into the store. Calling
# build_communities() on an empty store would yield no summaries.
index = PropertyGraphIndex(
    nodes=nodes,
    property_graph_store=graph_store,
    kg_extractors=[extractor],
    show_progress=True,
)

# Build communities and generate summaries.
index.property_graph_store.build_communities()

# Query via GraphRAGQueryEngine; the cookbook version also takes the index
# so it can retrieve source chunks with index.as_retriever().
query_engine = GraphRAGQueryEngine(
    graph_store=index.property_graph_store,
    llm=llm,
    index=index,
    similarity_top_k=10,
)
response = query_engine.custom_query("Your query here")

print(response)
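
If the extractor produces entities and relations but the regex in GraphRAGQueryEngine still matches nothing, print one of the retrieved chunks and compare it with the pattern directly: even a small drift in the LLM's output format (for example, missing braces around the descriptions) makes re.findall return an empty list, which is exactly the symptom reported above.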

By ensuring that each component (extraction, community summarization, and query processing) is correctly implemented and that the prompts used are well-defined, you can improve the accuracy and relevance of the retrieved responses and ensure that the retrieved chunks match the expected pattern.

For more detailed guidance on constructing the GraphRAG pipeline using the LlamaIndex PropertyGraph abstractions, you can refer to the GraphRAG Implementation with LlamaIndex notebook [1][2].

To continue talking to Dosu, mention @dosu.