microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

Community report grouping. #841

Closed. FatemaD1577 closed this issue 3 months ago.

FatemaD1577 commented 3 months ago

Describe the issue

I am trying to understand the implementation of GraphRAG. I have gone through the available documentation and the Git repo to understand the global query part. My understanding of the implementation is as follows:

  1. We first build a graph from the extracted entities and relationships.
  2. Communities are then created by grouping closely related entities.
  3. Next, community reports are created for these communities.
  4. When a user query comes in, the community reports are shuffled randomly and grouped.
  5. An intermediate response is generated for each group, and each response is assigned a score based on its relevance to the user query.
  6. Responses with a score of 0 are filtered out, and the remaining ones are passed to the LLM to generate the final response.
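To make steps 4 to 6 concrete, here is a minimal Python sketch of that query-time flow as the list above describes it. It is not GraphRAG's code. Assumptions: reports are plain strings, tokens are approximated by whitespace-separated words, and `batch_reports`, `global_query`, `map_fn`, `reduce_fn` and `max_tokens` are hypothetical names standing in for the real LLM calls and settings.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MapResult:
    answer: str
    score: int  # relevance score attached to one group's intermediate answer


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: one token per word.
    return len(text.split())


def batch_reports(reports: list[str], max_tokens: int, seed: Optional[int] = None) -> list[list[str]]:
    """Step 4: shuffle the reports, then pack them greedily into groups under a token budget.

    A single report larger than max_tokens still gets a group of its own.
    """
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for report in shuffled:
        size = count_tokens(report)
        if current and used + size > max_tokens:
            batches.append(current)  # adding this report would overflow: close the group
            current, used = [], 0
        current.append(report)
        used += size
    if current:
        batches.append(current)
    return batches


def global_query(
    query: str,
    reports: list[str],
    map_fn: Callable[[str, list[str]], MapResult],
    reduce_fn: Callable[[str, list[MapResult]], str],
    max_tokens: int,
    seed: Optional[int] = None,
) -> str:
    # Step 5: one scored intermediate answer per group.
    results = [map_fn(query, batch) for batch in batch_reports(reports, max_tokens, seed)]
    # Step 6: drop zero-score answers and hand the rest to the final call.
    kept = [r for r in results if r.score > 0]
    return reduce_fn(query, kept)


if __name__ == "__main__":
    # Toy stubs in place of LLM calls: score = number of reports mentioning the query.
    def fake_map(query: str, batch: list[str]) -> MapResult:
        hits = [r for r in batch if query in r]
        return MapResult(answer=" | ".join(hits), score=len(hits))

    def fake_reduce(query: str, kept: list[MapResult]) -> str:
        return " || ".join(r.answer for r in kept)

    reports = ["solar power report", "wind power report", "tax policy report", "solar subsidy report"]
    print(global_query("solar", reports, fake_map, fake_reduce, max_tokens=6, seed=0))
```

The token budget is what bounds how many reports land in one group in this sketch; whether GraphRAG exposes an equivalent setting is exactly what question 2 below asks.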

Below are some questions that are still unclear to me:

  1. Are community reports created at different levels to incrementally cover a larger amount of information?
  2. On what basis are the community reports grouped? Is there a parameter that controls the number of community reports included in one group?
  3. Are only responses with a score of 0 filtered out, or is there a threshold below which responses are not considered for final response generation?
  4. On what basis is the intermediate response scored for relevance to the user query? Is it similarity search or some other method?
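For questions 3 and 4, here is a hedged sketch of one way the final step could apply a score threshold and a context budget. It assumes, without confirming it against the GraphRAG source, that the map-step LLM itself returns the score inside JSON shaped like `{"points": [{"description": ..., "score": ...}]}` rather than the score coming from a similarity search. `parse_map_output`, `build_reduce_context`, `min_score` and `max_words` are hypothetical names; `min_score=1` reproduces the "drop only zero scores" behaviour from step 6.

```python
import json
from dataclasses import dataclass


@dataclass
class KeyPoint:
    description: str
    score: int  # assumed helpfulness score emitted by the map-step LLM


def parse_map_output(raw: str) -> list[KeyPoint]:
    """Parse an assumed reply of the form {"points": [{"description": ..., "score": ...}]}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable LLM output contributes nothing
    return [KeyPoint(str(p["description"]), int(p["score"])) for p in data.get("points", [])]


def build_reduce_context(points: list[KeyPoint], min_score: int = 1, max_words: int = 500) -> list[KeyPoint]:
    """Keep points with score >= min_score, highest first, until the word budget is spent.

    min_score=1 drops only zero-score points; a higher value acts as a stricter threshold.
    """
    ranked = sorted((p for p in points if p.score >= min_score), key=lambda p: p.score, reverse=True)
    selected: list[KeyPoint] = []
    used = 0
    for point in ranked:
        size = len(point.description.split())
        if used + size > max_words:
            break  # budget is full; lower-scored points are left out
        selected.append(point)
        used += size
    return selected


raw = (
    '{"points": [{"description": "alpha beta", "score": 80},'
    ' {"description": "gamma", "score": 0},'
    ' {"description": "delta epsilon zeta", "score": 40}]}'
)
print([p.description for p in build_reduce_context(parse_map_output(raw))])
# → ['alpha beta', 'delta epsilon zeta']
```

Sorting before applying the budget means that, when space runs out, the lowest-scored answers are the ones dropped; that ordering is a design choice of this sketch.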

Answers to these questions would give me more clarity on the global query process.

Thank you in advance.

Xiyuche commented 3 months ago

For the intermediate response score, this excerpt from the paper might be helpful:

[Image: excerpt from the paper]

natoverse commented 3 months ago

Moving to Discussions: https://github.com/microsoft/graphrag/discussions/849