microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

Multi-category indexing and graph generation for diverse data types #883

Open aimanyounises1 opened 3 months ago

aimanyounises1 commented 3 months ago

Is your feature request related to a problem? Please describe.

We are trying to use GraphRAG to index and generate knowledge graphs for a diverse set of data types. Specifically, we need to handle business explanations, UI code, and backend code as separate but interconnected categories. Currently, it's not clear how to structure and index these distinct data types efficiently while maintaining the relationships between them.

For example, we could decide from the query which graph is relevant and retrieve context only from that graph. Routing each query to the relevant output directory should improve accuracy. The categories would be defined in a YAML file like the one under Additional context.
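
To make the routing idea concrete, here is a minimal sketch in Python. It assumes each category was indexed into its own output directory and picks a directory by simple keyword overlap. The directory paths, the keyword sets, and the `route_query` helper are all hypothetical and not part of GraphRAG; a real router would more likely use a classifier or an LLM.

```python
import re
from pathlib import Path

# Hypothetical mapping from category name to the output directory of its index.
CATEGORY_OUTPUT_DIRS = {
    "business_explanations": Path("output/business_explanations"),
    "ui_code": Path("output/ui_code"),
    "be_code": Path("output/be_code"),
}

# Hypothetical keyword hints per category (an assumption for illustration only).
CATEGORY_KEYWORDS = {
    "business_explanations": {"business", "policy", "process", "requirement"},
    "ui_code": {"ui", "component", "button", "screen", "page"},
    "be_code": {"backend", "api", "endpoint", "service", "database"},
}


def route_query(query: str, default: str = "business_explanations") -> Path:
    """Return the output directory whose keywords overlap most with the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {name: len(words & hints) for name, hints in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default category when no keyword matches at all.
    if scores[best] == 0:
        best = default
    return CATEGORY_OUTPUT_DIRS[best]
```

The returned path would then be handed to whatever query step reads that category's index.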

Describe the solution you'd like

1.  Accept multiple data source directories, each corresponding to a different category (e.g., business_explanations, ui_code, be_code).
2.  Allow custom configuration for each category, specifying different parsing and indexing strategies based on the data type.
3.  Generate separate but interconnected knowledge graphs for each category.
4.  Provide a unified querying mechanism that can search across all generated graphs while maintaining context awareness of the different categories.
5.  Enable the definition of relationships between entities across different categories (e.g., linking a UI component to a business concept it implements).
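
Points 4 and 5 could be sketched roughly as follows. This is a toy in-memory model, not GraphRAG's data model: the `Entity` and `CategoryGraphs` names are hypothetical, and search is a plain substring match standing in for real retrieval. It only illustrates keeping the category on every entity while allowing links that cross categories.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    category: str  # e.g. "ui_components" or "business_concepts"
    name: str


@dataclass
class CategoryGraphs:
    """Entities from all categories plus the links between them."""
    entities: set = field(default_factory=set)
    links: list = field(default_factory=list)  # (source, relation, target) triples

    def add_link(self, source: Entity, relation: str, target: Entity) -> None:
        self.entities.update((source, target))
        self.links.append((source, relation, target))

    def search(self, term: str) -> list:
        """Search every category at once; each hit still carries its category."""
        term = term.lower()
        hits = [e for e in self.entities if term in e.name.lower()]
        return sorted(hits, key=lambda e: (e.category, e.name))

    def related(self, entity: Entity) -> list:
        """Return (relation, other_entity) pairs, including cross-category links."""
        result = []
        for source, relation, target in self.links:
            if source == entity:
                result.append((relation, target))
            elif target == entity:
                result.append((relation, source))
        return result
```

For example, linking a hypothetical `CheckoutForm` UI component to a `Checkout` business concept with the `implements` relation lets a search for "checkout" return both entities, each tagged with its own category.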

Additional context

```yaml
data_sources:
  - name: business_explanations
    type: text
    path: input/business_explanations
    file_types: [.txt, .md]
  - name: ui_code
    type: code
    path: input/ui_code
    file_types: [.js, .jsx, .ts, .tsx]
  - name: be_code
    type: code
    path: input/be_code
    file_types: [.java, .py, .cs]

graph_structure:
  - name: business_concepts
    source: business_explanations
    node_types:
      - name: Concept
        properties: [name, description]
  - name: ui_components
    source: ui_code
    node_types:
      - name: Component
        properties: [name, file_path]
  - name: backend_services
    source: be_code
    node_types:
      - name: Service
        properties: [name, file_path]

relationships:
  - name: implements
    source: ui_components
    target: business_concepts
  - name: serves
    source: backend_services
    target: business_concepts
```
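
A configuration like this has cross-references that are easy to get wrong: each graph names a data source, and each relationship names two graphs. The sketch below checks those references in Python. It assumes the proposed schema above, which is not GraphRAG's actual settings format, and it takes the configuration as an already-parsed dictionary (for instance from a YAML parser such as PyYAML's `yaml.safe_load`). The `validate_config` helper and the trimmed `CONFIG` sample are hypothetical.

```python
def validate_config(config: dict) -> list:
    """Return a list of human-readable errors; an empty list means the config is consistent."""
    errors = []
    source_names = {s["name"] for s in config.get("data_sources", [])}
    graph_names = {g["name"] for g in config.get("graph_structure", [])}

    # Every graph must be built from a declared data source.
    for graph in config.get("graph_structure", []):
        if graph["source"] not in source_names:
            errors.append(
                f"graph '{graph['name']}' references unknown data source '{graph['source']}'"
            )

    # Both ends of every relationship must be declared graphs.
    for rel in config.get("relationships", []):
        for end in ("source", "target"):
            if rel[end] not in graph_names:
                errors.append(
                    f"relationship '{rel['name']}' has unknown {end} graph '{rel[end]}'"
                )
    return errors


# A trimmed version of the proposed configuration, written as a Python dict.
CONFIG = {
    "data_sources": [
        {"name": "business_explanations", "type": "text"},
        {"name": "ui_code", "type": "code"},
    ],
    "graph_structure": [
        {"name": "business_concepts", "source": "business_explanations"},
        {"name": "ui_components", "source": "ui_code"},
    ],
    "relationships": [
        {"name": "implements", "source": "ui_components", "target": "business_concepts"},
    ],
}
```

Running `validate_config(CONFIG)` on the sample returns an empty list. Misspelling a graph name in a relationship would produce one error naming the relationship and the unknown graph.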

lawyinking commented 1 month ago

hi, did you find a solution?