infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Feature Request]: Explore GraphInsight similarly to GraphRAG #2455

Open jjohare opened 2 months ago

jjohare commented 2 months ago

Is there an existing issue for the same feature request?

Is your feature request related to a problem?

GraphRAG is great, but a recent paper may offer further optimisation, specifically around how well LLMs understand graph structure, with potential implications for multi-step reasoning.

"GraphInsight is grounded in two key strategies: 1) placing critical graphical information in positions where LLMs exhibit stronger memory performance, and 2) investigating a lightweight external knowledge base for regions with weaker memory performance, inspired by retrieval-augmented generation (RAG)."

Describe the feature you'd like

The recent paper on GraphInsight introduces several techniques that could enhance RAGFlow's GraphRAG, in particular its ability to handle complex data across large corpora.

From their paper:

Next, we introduce the construction of our framework’s RAG knowledge base, called the “GraphRAG base”, and the corresponding RAG process, termed the “GraphRAG process”. Note that existing RAG techniques Ghosh et al. (2024); Rorseth et al. (2024); Sojitra et al. (2024); Das et al. (2024) and optimizations are orthogonal to our framework and can further enhance RAG quality, but here we focus only on the most basic RAG methods.

GraphRAG Base. Conventional RAG base typically require substantial storage and rely on extensive structured or unstructured data (e.g., documents, knowledge graphs). In contrast, our GraphRAG base, denoted as K, demands minimal storage overhead. Specifically, for the graph description sequence T̂ generated by the importance-based description reorganization, the nodes and edges of the subgraph structures corresponding to the weak memory regions within T̂ will be stored as the GraphRAG base. The proportion of stored subgraph structures, denoted as γ, is adjustable. Since the subgraph structures corresponding to the weak memory regions have already been ranked by importance, we can conveniently select and store only the top γ% of these structures.

---- other text ---

Finally, the two parts of the augmented information are organized into a prompt, which is then input into LLMs to assist in the reasoning process for the task.

Our GraphInsight framework incorporates two key techniques that can be seamlessly integrated into the agent-based processes of LLMs to enhance the performance of such multi-step reasoning tasks:

    • Initially, during the LLMs agent process’s inception phase, our framework’s importance-based description reorganization method can be applied to the sequence of graph descriptions input into the LLMs. This enhances the LLMs’ overall comprehension of the graph structure.
    • Subsequently, in the multi-step reasoning phase of the LLMs agent process, our framework’s GraphRAG method can provide LLMs with enriched information relevant to each step of the reasoning process, thereby improving the quality of the reasoning.
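To make the "top γ%" selection in the quoted passage concrete, here is a minimal sketch of how the GraphRAG base could be built. It is not the paper's code: the function name `build_graphrag_base`, the tuple representation of a subgraph structure, and the choice to round up are all assumptions made for illustration. It only relies on what the quote states, namely that the weak-memory-region structures are already ranked by importance.

```python
import math

def build_graphrag_base(weak_region_structures, gamma):
    """Return the top gamma% of weak-region subgraph structures.

    Assumes the input list is already sorted by importance,
    most important first (as the quoted passage describes).
    Rounding up is an illustrative choice, not taken from the paper.
    """
    if not 0 <= gamma <= 100:
        raise ValueError("gamma must be a percentage between 0 and 100")
    keep = math.ceil(len(weak_region_structures) * gamma / 100)
    return weak_region_structures[:keep]

# Hypothetical weak-region structures (here: single edges), ranked by importance.
weak = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5")]
base = build_graphrag_base(weak, gamma=50)
```

With `gamma=50`, only the two highest-ranked structures are kept, which is where the small storage footprint claimed in the quote would come from.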

  1. Adaptive Graph Summation:

    • Integrate GraphInsight's adaptive graph summation technique, which balances information density and LLM token limits. This feature could help in managing large graphs more effectively within RAGFlow.
  2. Multi-step Reasoning: Incorporate GraphInsight's approach to multi-step reasoning.

  3. Importance-based Description Reorganization: Add support for reorganizing graph descriptions based on importance, as described in GraphInsight.
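As a rough illustration of item 3, the sketch below reorders a graph's edge descriptions so that the most important ones sit at the start and end of the sequence. Two things here are assumptions rather than anything taken from the paper or from RAGFlow: that the positions with stronger LLM memory are the head and tail of the prompt, and that the sum of endpoint degrees is a reasonable stand-in for edge importance. The function name `reorganize_edges` is hypothetical.

```python
from collections import Counter

def reorganize_edges(edges):
    """Order edges so the most important ones sit at the head and tail.

    Importance is approximated by the summed degree of both endpoints
    (an assumption for this sketch). The least important edges end up
    in the middle of the sequence.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Rank edges from most to least important; ties keep input order.
    ranked = sorted(edges, key=lambda e: degree[e[0]] + degree[e[1]], reverse=True)

    # Deal ranked edges alternately to the head and the tail.
    head, tail = [], []
    for i, edge in enumerate(ranked):
        (head if i % 2 == 0 else tail).append(edge)
    return head + tail[::-1]

edges = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "f")]
ordered = reorganize_edges(edges)
description = "\n".join(f"Edge: {u} - {v}" for u, v in ordered)
```

In this example the low-degree edge `("e", "f")` lands in the middle of the sequence, which is the region the paper proposes backing with the lightweight GraphRAG base.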

Potential benefits:

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response