Is there an existing issue for the same feature request?
[X] I have checked the existing issues.
Is your feature request related to a problem?
GraphRAG is great, but a recent paper on GraphInsight may offer additional optimisation, specifically around LLM understanding of graph structure, with potential implications for multi-step reasoning.
"GraphInsight is grounded in two key strategies: 1) placing critical graphical information in positions where LLMs exhibit stronger memory performance, and 2) investigating a lightweight external knowledge base for regions with weaker memory performance, inspired by retrieval-augmented generation (RAG)."
Describe the feature you'd like
The recent paper on GraphInsight introduces several novel techniques that could enhance RAGFlow's GraphRAG implementation and its ability to handle complex data across large corpora.
From their paper:
Next, we introduce the construction of our framework's RAG knowledge base, called the "GraphRAG base", and the corresponding RAG process, termed the "GraphRAG process". Note that existing RAG techniques (Ghosh et al., 2024; Rorseth et al., 2024; Sojitra et al., 2024; Das et al., 2024) and optimizations are orthogonal to our framework and can further enhance RAG quality, but here we focus only on the most basic RAG methods.
GraphRAG Base. Conventional RAG bases typically require substantial storage and rely on extensive structured or unstructured data (e.g., documents, knowledge graphs). In contrast, our GraphRAG base, denoted as K, demands minimal storage overhead. Specifically, for the graph description sequence T̂ generated by the importance-based description reorganization, the nodes and edges of the subgraph structures corresponding to the weak memory regions within T̂ will be stored as the GraphRAG base. The proportion of stored subgraph structures, denoted as γ, is adjustable. Since the subgraph structures corresponding to the weak memory regions have already been ranked by importance, we can conveniently select and store only the top γ% of these structures.
[... other text omitted ...]
Finally, the two parts of the augmented information are organized into a prompt, which is then input into LLMs to assist in the reasoning process for the task.
Our GraphInsight framework incorporates two key techniques that can be seamlessly integrated into the agent-based processes of LLMs to enhance the performance of such multi-step reasoning tasks:
• Initially, during the LLMs agent process's inception phase, our framework's importance-based description reorganization method can be applied to the sequence of graph descriptions input into the LLMs. This enhances the LLMs' overall comprehension of the graph structure.
• Subsequently, in the multi-step reasoning phase of the LLMs agent process, our framework's GraphRAG method can provide LLMs with enriched information relevant to each step of the reasoning process, thereby improving the quality of the reasoning.
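To make the request concrete, here is a rough Python sketch of how the GraphRAG base described above could look if integrated into RAGFlow. Everything here (WeakRegionSubgraph, GraphRAGBase, gamma_percent, the node-overlap scoring) is a hypothetical placeholder rather than an actual GraphInsight or RAGFlow API; it only illustrates storing the top γ% of importance-ranked weak-memory subgraphs and retrieving them on demand.

```python
from dataclasses import dataclass, field

@dataclass
class WeakRegionSubgraph:
    """Subgraph from a weak-memory region of the description sequence
    (hypothetical container, names are placeholders)."""
    region_id: str
    importance: float                      # score from the importance ranking
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

class GraphRAGBase:
    """Minimal sketch of the lightweight GraphRAG base K: keep only the
    top gamma% of importance-ranked weak-memory subgraphs."""

    def __init__(self, subgraphs, gamma_percent):
        ranked = sorted(subgraphs, key=lambda s: s.importance, reverse=True)
        keep = max(1, int(len(ranked) * gamma_percent / 100))
        self.store = ranked[:keep]         # small storage footprint by construction

    def retrieve(self, query_entities, top_k=3):
        """Return stored subgraphs most relevant to the entities mentioned in
        the current reasoning step (simple node-overlap scoring as a stand-in)."""
        def overlap(sg):
            return len(set(query_entities) & set(sg.nodes))
        return sorted(self.store, key=overlap, reverse=True)[:top_k]

# Usage: build the base once per graph, then query it at each reasoning step.
base = GraphRAGBase(
    subgraphs=[
        WeakRegionSubgraph("r1", 0.9, nodes=["A", "B"], edges=[("A", "B")]),
        WeakRegionSubgraph("r2", 0.4, nodes=["C", "D"], edges=[("C", "D")]),
    ],
    gamma_percent=50,
)
hits = base.retrieve({"A", "C"})
```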
Adaptive Graph Summation: Integrate GraphInsight's adaptive graph summation technique, which balances information density against LLM token limits. This feature could help RAGFlow manage large graphs more effectively.
Multi-step Reasoning: Incorporate GraphInsight's approach to multi-step reasoning, where the GraphRAG base supplies enriched subgraph information at each step of the agent's reasoning process (see the integration sketch after the benefits list below).
Importance-based Description Reorganization: Add support for reorganizing graph descriptions based on importance, as described in GraphInsight (a rough sketch follows directly below).
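The following is a minimal sketch of what importance-based description reorganization could look like. The assumption (mine, not quoted from the paper) is that higher-importance descriptions are placed at the head and tail of the sequence, where LLM recall tends to be stronger, while lower-importance descriptions fall into the middle and become candidates for the GraphRAG base; reorganize_descriptions is a hypothetical helper, not an existing RAGFlow function.

```python
def reorganize_descriptions(descriptions, importance):
    """Place the most important graph descriptions at the head and tail of the
    sequence and the least important ones in the middle, where LLM recall is
    typically weakest. Hypothetical helper, not an existing RAGFlow API."""
    ranked = [d for _, d in sorted(zip(importance, descriptions),
                                   key=lambda p: p[0], reverse=True)]
    head, tail, middle = [], [], []
    # Alternate the top half between head and tail; the rest (the weak-memory
    # region) stays in the middle and is what the GraphRAG base would store.
    for i, desc in enumerate(ranked):
        if i < len(ranked) // 2:
            (head if i % 2 == 0 else tail).append(desc)
        else:
            middle.append(desc)
    return head + middle + list(reversed(tail))

# Example: the highest-importance description ends up at the head of the prompt.
ordered = reorganize_descriptions(
    ["A connects to B", "B connects to C", "C connects to D"],
    [0.9, 0.2, 0.7],
)
# ordered == ["A connects to B", "C connects to D", "B connects to C"]
```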
Potential benefits:
Improved handling of large and complex graph structures
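And here is a hedged sketch of how the two pieces above could slot into a multi-step reasoning loop. None of this is actual RAGFlow code; retrieve_subgraphs and llm_call are injected callables (e.g. the GraphRAGBase.retrieve method from the first sketch and whatever chat-completion wrapper RAGFlow already uses), and the stopping condition is just a placeholder.

```python
def multi_step_answer(question, reorganized_context, retrieve_subgraphs,
                      llm_call, max_steps=4):
    """Illustrative agent loop (hypothetical, not RAGFlow code): the reorganized
    description sequence is supplied once up front, and at every reasoning step
    the relevant weak-memory subgraphs are retrieved and appended to the prompt
    before the LLM is called."""
    scratchpad = ""
    for _ in range(max_steps):
        # retrieve_subgraphs: injected callable, e.g. GraphRAGBase.retrieve above
        retrieved = retrieve_subgraphs(question, scratchpad)
        prompt = (
            f"{reorganized_context}\n\n"
            f"Relevant subgraphs for this step:\n{retrieved}\n\n"
            f"Reasoning so far:\n{scratchpad}\n"
            f"Question: {question}\n"
            "Give the next reasoning step, or 'FINAL ANSWER: ...' when done."
        )
        step = llm_call(prompt)            # llm_call: injected chat-completion function
        scratchpad += step + "\n"
        if "FINAL ANSWER" in step:
            break
    return scratchpad
```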
Describe implementation you've considered
No response
Documentation, adoption, use case
No response
Additional information
No response