microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
19.47k stars 1.92k forks

How can I visualize the graph and relationship? #418

Closed misi0202 closed 4 months ago

misi0202 commented 4 months ago

Thank you for your great work! I have run the whole process, but it is hard to see the graph clearly. The graph data is stored in .parquet files. Is there a way to visualize the data I have extracted, for example with Neo4j or another graph database?

eyast commented 4 months ago

Amongst the artefacts generated in the output folder, you will find parquet files that include GraphML content. In other cases, you might also find files with the .graphml extension. GraphRAG saves the output of each step of the workflow to these files. You can explore GraphML files with a tool such as Gephi, which is quite popular: https://gephi.org/

Ashraful512 commented 4 months ago

Is there any way to visualize or store the graph in Neo4j?

flikeok commented 4 months ago

Yes, I also want to store the data generated by GraphRAG in a dedicated graph database (like Neo4j). Is there a suitable solution?

misi0202 commented 4 months ago

Thanks for your reply!

hemangjoshi37a commented 4 months ago

Gephi is good, but I would prefer to visualize the graph in Python. Is there a similar Python library available?

noworneverev commented 4 months ago

You can import the generated GraphML file into Neo4j.

gs7vik commented 4 months ago

Are you sure that files with the .graphml extension exist? When I ran the code, I didn't get any .graphml files. I only got the files below:

artifacts

danielcmm commented 4 months ago

Hope this helps. I used this inside a Jupyter notebook.

pip install pandas networkx pyvis

import pandas as pd
import networkx as nx
from pyvis.network import Network

# The parquet file stores the graph as a GraphML string in the 'clustered_graph' column
df = pd.read_parquet('output/xyz/artifacts/create_base_entity_graph.parquet')
graphml_data = df.iloc[0]['clustered_graph']
graph = nx.parse_graphml(graphml_data.encode('utf-8'))

# Render the graph as an interactive HTML page
net = Network(notebook=True)
net.from_nx(graph)
net.show('graph.html')

AlonsoGuevara commented 4 months ago

Hi!

We visualize these graphml files in Gephi. Once you have loaded any of these (I suggest the summarized_graph.graphml file) into Gephi you can play with the different community and clustering algorithms to obtain your desired visualization. In our case, we run Leiden and Average Degree. Then, for graph layout we do OpenOrd and ForceAtlas2.

I'm planning on adding a detailed guide for this to the docs so people can recreate the graphs.
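
Before opening a file in Gephi, it can help to check that the GraphML contains the graph you expect. The following is a minimal sketch, not part of GraphRAG, that counts nodes and edges with the Python standard library and computes the average degree as 2 * edges / nodes, which assumes an undirected graph. The sample graph is made up for illustration.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def average_degree(graphml_text: str) -> float:
    """Return 2 * edges / nodes for a GraphML document (undirected assumption)."""
    root = ET.fromstring(graphml_text.encode("utf-8"))
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    if not nodes:
        return 0.0
    return 2 * len(edges) / len(nodes)


# Tiny hand-written example graph: a path A - B - C
sample = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="A"/><node id="B"/><node id="C"/>
    <edge source="A" target="B"/>
    <edge source="B" target="C"/>
  </graph>
</graphml>"""

print(average_degree(sample))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to `average_degree`.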

xxWeiDG commented 4 months ago

Hi! It only works on create_base_entity_graph.parquet. How can I visualize the other .parquet files?
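
One way to approach this is to check which columns of a given parquet file actually hold GraphML text, since only those can be passed to nx.parse_graphml. The sketch below is an illustration under that assumption and uses only the standard library. The helper name `graphml_columns` and the sample row are hypothetical; the row is shaped like `df.iloc[0].to_dict()`.

```python
import xml.etree.ElementTree as ET


def graphml_columns(row: dict) -> list:
    """Return the keys of `row` whose values parse as a GraphML document."""
    found = []
    for name, value in row.items():
        # Skip anything that is not text or clearly not GraphML
        if not isinstance(value, str) or "<graphml" not in value:
            continue
        try:
            root = ET.fromstring(value.encode("utf-8"))
        except ET.ParseError:
            continue
        # The root tag looks like '{namespace}graphml' when a namespace is declared
        if root.tag.endswith("graphml"):
            found.append(name)
    return found


# Hypothetical row, shaped like df.iloc[0].to_dict() for one of the parquet files
row = {
    "level": 0,
    "clustered_graph": '<graphml xmlns="http://graphml.graphdrawing.org/xmlns"><graph/></graphml>',
    "note": "plain text, not a graph",
}
print(graphml_columns(row))
```

With pandas, this could be called as `graphml_columns(pd.read_parquet(path).iloc[0].to_dict())`. Files that return an empty list are tabular and need a different approach, such as building the graph from entity and relationship rows.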

fskpf commented 4 months ago

Another tool to create an interactive visualization in Jupyter Notebooks is yfiles-jupyter-graphs. Just pass a list of nodes and relationships and, optionally, map some data to visualization properties (color, size, etc).

For example, in the graphrag/examples_notebooks, you can visualize the parquet files like so:

"""
Uses yfiles-jupyter-graphs to visualize the dataframes.

The dataframes are converted into supported nodes and relationships lists and then passed to yfiles-jupyter-graphs.
Additionally, some values are mapped to visualization properties.
"""
def show_graph(entity_df, relationship_df):
    from yfiles_jupyter_graphs import GraphWidget

    # converts the entities dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_entities_to_dicts(df):
        nodes_dict = {}
        for index, row in df.iterrows():
            # Create a dictionary for each row and collect unique nodes
            node_id = row["title"]
            if node_id not in nodes_dict:
                nodes_dict[node_id] = {"id": node_id, "properties": {col: value for col, value in zip(row.index, row.values)}}
        return list(nodes_dict.values())

    # converts the relationships dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_relationships_to_dicts(df):
        relationships = []
        for index, row in df.iterrows():
            # Create a dictionary for each row
            relationships.append({"start": row['source'], "end": row['target'], "properties": {col: value for col, value in zip(row.index, row.values)}})
        return relationships

    w = GraphWidget()
    # use the converted data to visualize the graph
    w.nodes = convert_entities_to_dicts(entity_df)
    w.edges = convert_relationships_to_dicts(relationship_df)
    w.directed = True
    # show title on the node
    w.node_label_mapping = "title"
    # map community to a color
    def community_to_color(community):
        colors = ["crimson", "darkorange", "indigo", "cornflowerblue", "cyan", "teal", "green"]
        return colors[int(community) % len(colors)] if community is not None else "lightgray"
    def edge_to_source_community(edge):
        # look up the source node by title; fall back to None if it is missing
        source_node = next((entry for entry in w.nodes if entry["properties"]["title"] == edge["start"]), None)
        return source_node["properties"]["community"] if source_node is not None else None
    w.node_color_mapping = lambda node : community_to_color(node["properties"]["community"])
    w.edge_color_mapping = lambda edge : community_to_color(edge_to_source_community(edge))
    # map size data to a reasonable factor
    w.node_scale_factor_mapping = lambda node : 0.5 + node["properties"]["size"] * 1.5 / 20
    # use weight for edge thickness
    w.edge_thickness_factor_mapping = "weight"
    # Use the circular layout for this visualization. For larger graphs, the default organic layout is often preferable.
    w.circular_layout()
    display(w)

# pass the dataframes from the parquet files
show_graph(entity_df, relationship_df)

Which results in this interactive visualization. image

ZeyuTeng96 commented 4 months ago

Hi,

Could you provide the code for reading the parquet files into dataframes, and say which files we have to use? Thanks.

fskpf commented 4 months ago

Sure. In this case, I've used the parquets of the local_search.ipynb:

relationship_df = pd.read_parquet(f"{INPUT_DIR}/{RELATIONSHIP_TABLE}.parquet")
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
# [....]
show_graph(entity_df, relationship_df)

Here is the modified local_search.ipynb sample notebook, with the code above that generates the output: https://colab.research.google.com/drive/1ojNWk6cIqEQxV34XdUjmIKb-2s7RVwRm?usp=sharing. Note that the notebook requires the local parquet files from microsoft/graphrag/examples_notebooks/inputs/operation dulce.

However, yfiles-jupyter-graphs works with any structured node and relationship dicts.
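
To illustrate what such dicts can look like without any dataframe, here is a small sketch that builds them from plain (source, target, weight) triples. The dict shapes mirror the ones used in show_graph earlier in this thread ("id" and "properties" for nodes; "start", "end" and "properties" for relationships). The helper name `to_graph_dicts` and the sample data are made up.

```python
def to_graph_dicts(triples):
    """Turn (source, target, weight) triples into node and relationship dicts."""
    nodes = {}
    relationships = []
    for source, target, weight in triples:
        for name in (source, target):
            # setdefault keeps the first dict created for each node id
            nodes.setdefault(name, {"id": name, "properties": {"title": name}})
        relationships.append(
            {"start": source, "end": target, "properties": {"weight": weight}}
        )
    return list(nodes.values()), relationships


# Made-up example data
nodes, edges = to_graph_dicts([("ALICE", "BOB", 2.0), ("BOB", "CAROL", 1.0)])
print([node["id"] for node in nodes])
print(edges[0])
```

The two lists could then be assigned to `w.nodes` and `w.edges` on a GraphWidget, as in the show_graph code.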

tawsifkamal commented 4 months ago

What about neo4j????

fskpf commented 4 months ago

Although a bit off-topic, yfiles-jupyter-graphs can also visualize Neo4j databases (see this sample notebook). That said, there is also yfiles-jupyter-graphs-for-neo4j, an open-source wrapper for the extension that provides a Neo4j-tailored API for the data mappings.

Hanzhang-lang commented 4 months ago

You can output GraphML files by setting the snapshots.graphml option to true in the config file.

jexp commented 4 months ago

Here is a quick & dirty variant. Have fun.

  1. Python notebook with pd.read_parquet and the neo4j driver
  2. Cypher script using apoc.load.parquet (needs the hadoop dependencies)

https://gist.github.com/jexp/74bd5a43305550236321eab8f0c723c0

Screenshot 2024-07-12 at 03:15:09

We'll provide a proper means to import the GraphRAG Parquet files into Neo4j by the end of next week.

/cc @tomasonjo

Tomaz wrote "Implementing GraphRAG with Neo4j, GDS + LangChain" this week: https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/
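
For readers who want a feel for the notebook variant before opening the gist, the sketch below shows one common pattern: sending rows to Neo4j in batches with UNWIND and MERGE. It is not the code from the gist. The `Entity` label, the `name` and `description` properties, and the helper names are hypothetical, and the database call is passed in as a function so the sketch runs without a server.

```python
def in_batches(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical label and property names; adjust to your own data model.
IMPORT_ENTITIES = """
UNWIND $rows AS row
MERGE (e:Entity {name: row.name})
SET e.description = row.description
"""


def import_entities(run_query, rows, batch_size=1000):
    """Send `rows` (a list of dicts) in batches and return the number of batches.

    `run_query(cypher, rows)` is supplied by the caller, so this sketch
    does not depend on a particular driver version.
    """
    count = 0
    for batch in in_batches(rows, batch_size):
        run_query(IMPORT_ENTITIES, batch)
        count += 1
    return count


# Dry run with a stand-in for the database call
sent = []
rows = [{"name": f"ENTITY_{i}", "description": "example"} for i in range(5)]
batches = import_entities(lambda cypher, batch: sent.append(len(batch)), rows, batch_size=2)
print(batches, sent)
```

With the official neo4j Python driver, `run_query` could wrap something like `session.run(cypher, rows=batch)`, and the rows could come from `df.to_dict("records")` on a dataframe loaded with pandas.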

jexp commented 4 months ago

@tomasonjo did an improved version adding claims/covariates and findings. https://github.com/tomasonjo/blogs/blob/master/llm/ms_graphrag_import.ipynb

We still need to enable embedding generation in the index workflow for text units, community summaries, and entities, so that they can also be added to the Neo4j vector index.

stevetru1 commented 4 months ago

@jexp @tomasonjo when you are done with the notebook, the right place to PR it would be https://github.com/microsoft/graphrag/tree/main/examples_notebooks in a new "community" folder. You could optionally create a 'neo4j' directory under there as well if you had additional notebooks in mind related to neo4j specific connections.

jexp commented 4 months ago

@stevetru1 done: https://github.com/microsoft/graphrag/pull/544

Preview:

https://github.com/microsoft/graphrag/blob/1f1d7b2c764db883a031b19923cb00a9db4844fe/examples_notebooks/neo4j/graphrag_import_neo4j_cypher.ipynb

beginor commented 4 months ago

I think it's possible to render the graph with 3d-force-graph
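
As a rough sketch of the data preparation this would involve: 3d-force-graph consumes a JSON object with a "nodes" list (each with an "id") and a "links" list (each with "source" and "target"). The helper below builds that structure from an edge list using only the standard library. The function name and the sample edges are made up for illustration.

```python
import json


def to_force_graph_json(edges):
    """Build a {"nodes": [...], "links": [...]} JSON string from (source, target) pairs."""
    node_ids = []
    for source, target in edges:
        for node_id in (source, target):
            # keep each node once, in first-seen order
            if node_id not in node_ids:
                node_ids.append(node_id)
    return json.dumps(
        {
            "nodes": [{"id": node_id} for node_id in node_ids],
            "links": [{"source": s, "target": t} for s, t in edges],
        },
        indent=2,
    )


# Made-up edge list
data = to_force_graph_json([("ALICE", "BOB"), ("BOB", "CAROL")])
print(data)
```

The resulting string could be written to a .json file and loaded on the JavaScript side; the rendering page itself still has to be written separately.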

jexp commented 4 months ago

Yep. I had even done a small integration with neo4j https://github.com/jexp/neo4j-3d-force-graph

and neo4j.com/labs/neodash also supports 3d force graph.

yGuy commented 4 months ago

@beginor Of course, you can use any technology to draw graphs. There are tons of tools and libraries out there, and you can even create your own (if you enjoy programming and don't care about reinventing the wheel). The authors of this repo have already used one solution. The question is not whether it is possible, but what is a good, easy, convenient, capable existing solution, or what is missing from the existing ones. For 3d-force-graph some answers are clear: it's a JS library, meaning that a lot of work needs to be done first. The graphml/Gephi approach is viable for a simple visualization like the one in your bitmap and requires less work.

I would love to learn more about what people are really looking for:

Thanks!

RipperTs commented 4 months ago

@gs7vik Just find snapshots in the yml configuration and turn it on.
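
After turning snapshots on and re-running the indexer, it may be useful to confirm where the .graphml files ended up. The following sketch searches a folder recursively with pathlib. The demo directory layout is made up; only the file name summarized_graph.graphml comes from this thread.

```python
import tempfile
from pathlib import Path


def find_graphml_files(output_dir):
    """Return all .graphml files below `output_dir`, sorted by path."""
    return sorted(Path(output_dir).rglob("*.graphml"))


# Demo on a throwaway directory; the folder layout here is made up
with tempfile.TemporaryDirectory() as tmp:
    artifacts = Path(tmp) / "run" / "artifacts"
    artifacts.mkdir(parents=True)
    (artifacts / "summarized_graph.graphml").write_text("<graphml/>")
    (artifacts / "example.parquet").write_bytes(b"")
    found_names = [path.name for path in find_graphml_files(tmp)]

print(found_names)
```

On a real run, `find_graphml_files("output")` would be the equivalent call, assuming the default output folder name.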

fskpf commented 4 months ago

@misi0202 regarding your initial question about how to visualize the nodes and relationships: See graph-visualization.ipynb from the merged PR #569, it embeds an interactive graph visualization of the parquet files and graphrag's result context.

Bai1026 commented 4 months ago

@xxWeiDG, I think you just need to change the file name and the column name:

df = pd.read_parquet('./create_base_extracted_entities.parquet')
graphml_data = df.iloc[0]['entity_graph']

xianminx commented 4 months ago

@gs7vik You can edit your settings.yaml and set graphml to true:

snapshots:
  graphml: true
  raw_entities: false
  top_level_nodes: false

timothymeyers commented 4 months ago

#340, related

noworneverev commented 3 months ago

Check out GraphRAG-Visualizer!

Simply upload your parquet files to visualize your graph data. No need for Neo4j or Jupyter Notebook.

image