microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Bug]: EMPTY community report in local search leads to no-use of community information #1391

Open LevickCG opened 1 week ago

LevickCG commented 1 week ago

Do you need to file an issue?

Describe the bug

I’ve been exploring the codebase for GraphRAG and recently noticed that the community reports used for query augmentation in local search appear empty.

Below is intermediate debug output from pdb showing that the selected communities are empty (from graphrag/query/structured_search/local_search/mixed_context.py:249):

Image

As a consequence, community_context_data is empty as well.

Image

As a result, the final response contains no report information.

Image

This leads to missing community structure information in local search, which seems to degrade GraphRAG's performance and creates a discrepancy between the code implementation and the paper.

Steps to reproduce

  1. Initialize the environment and create an index by following the official getting started guide:

https://microsoft.github.io/graphrag/get_started/

You can use any raw text of your own as input.

  2. Create a debug Python file under your_path_to_graphrag/graphrag/graphrag/graphrag/cli/

cd ./graphrag/cli
touch debug_query.py

Add the code below to debug_query.py. It launches a local search for your query.

from pathlib import Path

from query import run_local_search

# Launch a local search against the index built in step 1.
run_local_search(
    config_filepath=None,
    data_dir=Path("your_path_to_graphrag/graphrag/graphrag/ragtest/output"),  # change to your path
    root_dir=Path("your_path_to_graphrag/graphrag/graphrag/ragtest"),  # change to your path
    community_level=2,
    response_type="text",
    streaming=False,
    query="Any query here you want to ask",  # replace with your own query
)
  3. Add import pdb; pdb.set_trace() to graphrag/query/structured_search/local_search/mixed_context.py:254

  4. Run the code and print the debug info

python3 -m pdb debug_query.py

Execution stops at mixed_context.py:254. Print the community information and you'll see that it is empty:

p selected_communities

Expected Behavior

1. selected_communities should not be empty.

2. Accordingly, the community context should not be empty.

3. The local search response should cite entities, relationships, and reports as data sources (reports are currently missing).

GraphRAG Config Used

# Default config from the getting started guide
llm = "gpt-4o-mini"
embedding_model = "text-embedding-3-small"

Logs and screenshots

See images provided.

Additional Information

LevickCG commented 1 week ago

After further investigation, I identified the root cause: a mismatch between UUIDs and human-readable IDs in the search process.

Image
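To illustrate the kind of mismatch described above, here is a minimal, self-contained sketch. It is not the GraphRAG code: the Report and Entity classes, their field names, and select_reports are hypothetical stand-ins I made up for illustration. It only shows how a lookup can silently return nothing when reports are keyed by UUID but entities reference communities by a short human-readable id.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Report:
    # Hypothetical fields, chosen only for this illustration.
    id: str            # UUID-style identifier
    community_id: str  # short human-readable community id
    title: str


@dataclass
class Entity:
    name: str
    community_ids: list[str] = field(default_factory=list)


def select_reports(entities, reports_by_key):
    """Count communities referenced by the entities, then look up their reports."""
    matches = {}
    for entity in entities:
        for community_id in entity.community_ids:
            matches[community_id] = matches.get(community_id, 0) + 1
    # Ids missing from the lookup table are skipped without any error,
    # so a key mismatch produces an empty list rather than an exception.
    return [reports_by_key[cid] for cid in matches if cid in reports_by_key]


reports = [
    Report(id=str(uuid.uuid4()), community_id="3", title="Community 3"),
    Report(id=str(uuid.uuid4()), community_id="7", title="Community 7"),
]
entities = [Entity("A", ["3"]), Entity("B", ["3", "7"])]

# Keyed by UUID: the entities' human-readable ids never match a key.
by_uuid = {r.id: r for r in reports}
selected_mismatched = select_reports(entities, by_uuid)

# Keyed by the human-readable community id: the lookup succeeds.
by_community_id = {r.community_id: r for r in reports}
selected_matched = select_reports(entities, by_community_id)

print(len(selected_mismatched), [r.title for r in selected_matched])
```

In this sketch the first lookup returns an empty list and the second returns both reports. Because the membership check skips unknown ids quietly, nothing fails loudly, which would be consistent with the empty selected_communities seen in the debugger.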

I plan to work on a fix and submit a pull request. I’d appreciate any feedback or guidance from the maintainers to ensure my approach aligns with the project’s design principles.