Closed misi0202 closed 4 months ago
Amongst the artefacts generated in the output folder, you will find parquet files that include GraphML content. In other cases, you might also find files with the .graphml
extension. GraphRAG saves the output of each step of the workflow to these files. You can explore the GraphML
files using a tool such as Gephi, which is quite popular: https://gephi.org/
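Before opening a GraphML artefact in a dedicated tool, it can help to check what it contains. The following is a minimal sketch using only the Python standard library. The sample GraphML string and the helper name `summarize_graphml` are made up for illustration; with real data you would pass the text of a .graphml file or the GraphML string stored in a parquet column.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, needed to find <node> and <edge> elements.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(graphml_text):
    """Return (node ids, (source, target) pairs) found in a GraphML string."""
    root = ET.fromstring(graphml_text)
    nodes = [node.get("id") for node in root.iterfind(".//g:node", GRAPHML_NS)]
    edges = [
        (edge.get("source"), edge.get("target"))
        for edge in root.iterfind(".//g:edge", GRAPHML_NS)
    ]
    return nodes, edges


# Hypothetical sample, standing in for real GraphRAG output.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="ALICE"/>
    <node id="BOB"/>
    <edge source="ALICE" target="BOB"/>
  </graph>
</graphml>"""

nodes, edges = summarize_graphml(SAMPLE)
print(len(nodes), "nodes,", len(edges), "edges")
```

This only lists ids and endpoints; for attributes, layouts or clustering, a graph library or Gephi is the better fit.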
is there any way to visualize or store the graph in neo4j?
Yes, I also want to store the data generated by GraphRAG in a dedicated graph database (like Neo4j). Is there a suitable solution?
Thanks for your reply!
Gephi is good, but I would prefer to visualize the graph in Python. Is there a similar Python library available?
You can import the generated GraphML file into Neo4j.
Hey, are you sure that files with a '.graphml' extension exist? When I ran the code I didn't get any files with that extension. I only got the files below:
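Hope this helps. Used inside a Jupyter notebook.
pip install pandas networkx pyvis
import pandas as pd
import networkx as nx
from pyvis.network import Network
# the parquet file stores the graph as a GraphML string in the 'clustered_graph' column
df = pd.read_parquet('output/xyz/artifacts/create_base_entity_graph.parquet')
graphml_data = df.iloc[0]['clustered_graph']
# parse the GraphML string into a networkx graph
graph = nx.parse_graphml(graphml_data.encode('utf-8'))
# render the graph as an interactive HTML page inside the notebook
net = Network(notebook=True)
net.from_nx(graph)
net.show('graph.html')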
Hi!
We visualize these graphml files in Gephi. Once you have loaded any of these (I suggest the summarized_graph.graphml file) into Gephi you can play with the different community and clustering algorithms to obtain your desired visualization. In our case, we run Leiden and Average Degree. Then, for graph layout we do OpenOrd and ForceAtlas2.
I'm planning on adding a detailed guide for this to the docs so people can recreate the graphs.
Hi! It only works on create_base_entity_graph.parquet. How can I visualize the other .parquet files?
Another tool to create an interactive visualization in Jupyter Notebooks is yfiles-jupyter-graphs. Just pass a list of nodes and relationships and, optionally, map some data to visualization properties (color, size, etc).
For example, in the graphrag/examples_notebooks, you can visualize the parquet files like so:
"""
Uses yfiles-jupyter-graphs to visualize the dataframes.
The dataframes are converted into supported nodes and relationships lists and then passed to yfiles-jupyter-graphs.
Additionally, some values are mapped to visualization properties.
"""
def show_graph(entity_df, relationship_df):
    from yfiles_jupyter_graphs import GraphWidget

    # converts the entities dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_entities_to_dicts(df):
        nodes_dict = {}
        for index, row in df.iterrows():
            # Create a dictionary for each row and collect unique nodes
            node_id = row["title"]
            if node_id not in nodes_dict:
                nodes_dict[node_id] = {"id": node_id, "properties": {col: value for col, value in zip(row.index, row.values)}}
        return list(nodes_dict.values())

    # converts the relationships dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_relationships_to_dicts(df):
        relationships = []
        for index, row in df.iterrows():
            # Create a dictionary for each row
            relationships.append({"start": row['source'], "end": row['target'], "properties": {col: value for col, value in zip(row.index, row.values)}})
        return relationships

    w = GraphWidget()
    # use the converted data to visualize the graph
    w.nodes = convert_entities_to_dicts(entity_df)
    w.edges = convert_relationships_to_dicts(relationship_df)
    w.directed = True
    # show title on the node
    w.node_label_mapping = "title"

    # map community to a color
    def community_to_color(community):
        colors = ["crimson", "darkorange", "indigo", "cornflowerblue", "cyan", "teal", "green"]
        return colors[int(community) % len(colors)] if community is not None else "lightgray"

    def edge_to_source_community(edge):
        source_node = next((entry for entry in w.nodes if entry["properties"]["title"] == edge["start"]), None)
        # return None (rendered light gray) if the source node is not in the entity list
        return source_node["properties"]["community"] if source_node is not None else None

    w.node_color_mapping = lambda node : community_to_color(node["properties"]["community"])
    w.edge_color_mapping = lambda edge : community_to_color(edge_to_source_community(edge))
    # map size data to a reasonable factor
    w.node_scale_factor_mapping = lambda node : 0.5 + node["properties"]["size"] * 1.5 / 20
    # use weight for edge thickness
    w.edge_thickness_factor_mapping = "weight"
    # Use the circular layout for this visualization. For larger graphs, the default organic layout is often preferable.
    w.circular_layout()
    display(w)

# pass the dataframes from the parquet files
show_graph(entity_df, relationship_df)
Which results in this interactive visualization.
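One caveat with the color mapping above: when an entity has no community, the dataframe may hold NaN rather than None, and `int()` raises a ValueError on NaN. Below is a hedged variant of `community_to_color` that falls back to light gray for any value that cannot be read as a finite number. The fallback rule is an assumption about what you want to see, not part of the original notebook.

```python
import math

COLORS = ["crimson", "darkorange", "indigo", "cornflowerblue", "cyan", "teal", "green"]


def community_to_color(community):
    """Map a community id to a color; anything unusable becomes light gray."""
    try:
        number = float(community)
    except (TypeError, ValueError):
        # None, pandas NA values, or non-numeric strings end up here.
        return "lightgray"
    if not math.isfinite(number):
        # NaN and infinities cannot be converted to an int index.
        return "lightgray"
    return COLORS[int(number) % len(COLORS)]


print(community_to_color(3), community_to_color(float("nan")))
```

It can be dropped into `show_graph` in place of the inner function of the same name.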
Hi,
is it possible to provide the code for reading the dataframes from the parquet files, and which files do we have to use? Thanks
Sure. In this case, I've used the parquets of the local_search.ipynb:
relationship_df = pd.read_parquet(f"{INPUT_DIR}/{RELATIONSHIP_TABLE}.parquet")
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
# [....]
show_graph(entity_df, relationship_df)
Here is the modified local_search.ipynb sample notebook: https://colab.research.google.com/drive/1ojNWk6cIqEQxV34XdUjmIKb-2s7RVwRm?usp=sharing with the above-mentioned code that generates the output. Note that the notebook requires the local parquet files from microsoft/graphrag/examples_notebooks/inputs/operation dulce.
However, yfiles-jupyter-graphs runs on any structured node and relationship dicts.
What about neo4j????
Although a bit off-topic, yfiles-jupyter-graphs can also visualize data from Neo4j databases (see this sample notebook). That said, there is also yfiles-jupyter-graphs-for-neo4j, which is an open-source wrapper for the extension that provides a Neo4j-tailored API for the data mappings.
You can output GraphML files by setting the snapshots.graphml option to true in the config file.
Here is a quick & dirty variant. Have fun.
https://gist.github.com/jexp/74bd5a43305550236321eab8f0c723c0
We'll provide a proper means to import the GraphRAG Parquet files into Neo4j end of next week.
/cc @tomasonjo
Tomaz wrote Implementing GraphRAG with Neo4j, GDS + LangChain this week: https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/
@tomasonjo did an improved version adding claims/covariates and findings. https://github.com/tomasonjo/blogs/blob/master/llm/ms_graphrag_import.ipynb
We need to enable embedding generation in the index workflow, for text-units and community summaries and entities to also add them to the neo4j vector index.
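For readers who want a feel for the general shape of such an import before opening the notebook, here is a rough sketch. It is not the code from the gist or the notebook. The `Entity` label, the `RELATED` relationship type and the helper names are assumptions made for illustration; only the `title`, `source`, `target` and `weight` fields come from the dataframes used earlier in this thread. The sketch prepares batched parameters for two Cypher statements and does not connect to a database.

```python
# Cypher statements that take a list of rows as the $rows parameter.
ENTITY_QUERY = """
UNWIND $rows AS row
MERGE (e:Entity {title: row.title})
SET e += row.properties
"""

RELATIONSHIP_QUERY = """
UNWIND $rows AS row
MATCH (s:Entity {title: row.source})
MATCH (t:Entity {title: row.target})
MERGE (s)-[r:RELATED]->(t)
SET r.weight = row.weight
"""


def batches(rows, size=1000):
    """Yield consecutive slices of rows with at most size items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def to_entity_rows(entities):
    """Split each entity dict into its title and the remaining properties."""
    return [
        {"title": e["title"], "properties": {k: v for k, v in e.items() if k != "title"}}
        for e in entities
    ]


# Hypothetical records, e.g. what entity_df.to_dict("records") might return.
entities = [{"title": "ALICE", "type": "PERSON"}, {"title": "BOB", "type": "PERSON"}]
relationships = [{"source": "ALICE", "target": "BOB", "weight": 2.0}]

entity_batches = list(batches(to_entity_rows(entities), size=1))
relationship_batches = list(batches(relationships, size=1))
# With the official neo4j Python driver, each batch could then be sent with
# something like: session.run(ENTITY_QUERY, rows=batch)
print(len(entity_batches), "entity batches,", len(relationship_batches), "relationship batches")
```

Batching with UNWIND keeps the number of round trips low; the linked notebook is the better reference for the full data model, including claims and findings.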
@jexp @tomasonjo when you are done with the notebook, the right place to PR it would be https://github.com/microsoft/graphrag/tree/main/examples_notebooks in a new "community" folder. You could optionally create a 'neo4j' directory under there as well if you had additional notebooks in mind related to neo4j specific connections.
I think it's possible to render the graph with 3d-force-graph
Yep. I had even done a small integration with neo4j https://github.com/jexp/neo4j-3d-force-graph
and neo4j.com/labs/neodash also supports 3d force graph.
@beginor Of course, you can use any technology to draw graphs. There are tons of tools and libraries out there, and you can even create your own (if you enjoy programming and don't care about reinventing the wheel). The authors of this repo have already used a solution, too. The question is not whether it is possible, but what is a good, easy, convenient, capable etc. existing solution, or what is missing from the existing ones. For 3d-force-graph some answers are clear: it's a JS library, meaning that a lot of work needs to be done first. The graphml/Gephi approach is viable for a simple visualization like in your bitmap and requires less work.
I would love to learn more about what people are really looking for:
Thanks!
@gs7vik Just find snapshots in the yml configuration and turn it on.
@misi0202 regarding your initial question about how to visualize the nodes and relationships: See graph-visualization.ipynb from the merged PR #569, it embeds an interactive graph visualization of the parquet files and graphrag's result context.
@xxWeiDG, I think you just need to change these two lines:
df = pd.read_parquet('./create_base_extracted_entities.parquet')
graphml_data = df.iloc[0]['entity_graph']
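If you are not sure which column of a given parquet file holds GraphML (the two examples in this thread use `clustered_graph` and `entity_graph`), a small helper can look for it. This is a sketch under assumptions: the helper name `find_graphml_columns` is made up, and it simply checks whether a string value starts with something that looks like a GraphML document. It works on plain records so it runs without extra packages.

```python
def find_graphml_columns(records):
    """Return the names of columns whose string values look like GraphML."""
    found = []
    for row in records:
        for column, value in row.items():
            # GraphML is XML with a <graphml> root element near the start.
            if isinstance(value, str) and "<graphml" in value[:300]:
                if column not in found:
                    found.append(column)
    return found


# Hypothetical records; with pandas you could pass
# pd.read_parquet(path).to_dict("records") instead.
records = [
    {
        "level": 0,
        "clustered_graph": '<graphml xmlns="http://graphml.graphdrawing.org/xmlns"></graphml>',
    }
]
print(find_graphml_columns(records))
```

Parquet files for which this returns an empty list do not embed a graph as GraphML, so the pyvis snippet cannot be applied to them directly.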
@gs7vik You may edit your settings.yaml and set graphml to true:
snapshots:
  graphml: true
  raw_entities: false
  top_level_nodes: false
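After re-running the indexer with this setting, one quick way to confirm that .graphml files were written is to search the output folder recursively. A minimal sketch follows. The helper name `list_graphml_files` is made up, and the demo runs against a temporary folder with placeholder files so it works anywhere; with real data you would pass your own output folder instead.

```python
import tempfile
from pathlib import Path


def list_graphml_files(output_dir):
    """Return all .graphml files below output_dir, sorted by path."""
    return sorted(Path(output_dir).rglob("*.graphml"))


# Demo on a throwaway folder that mimics an output directory.
with tempfile.TemporaryDirectory() as tmp:
    artifacts = Path(tmp) / "run" / "artifacts"
    artifacts.mkdir(parents=True)
    (artifacts / "summarized_graph.graphml").write_text("<graphml/>")
    (artifacts / "create_base_entity_graph.parquet").write_bytes(b"")
    found_names = [path.name for path in list_graphml_files(tmp)]

print(found_names)
```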
Check out GraphRAG-Visualizer!
Simply upload your parquet files to visualize your graph data. No need for Neo4j or Jupyter Notebook.
Thank you for your great work! I have run the whole process, but it is hard to see the graph clearly. The graph data is stored in the .parquet files, so is there some method to visualize the data I have extracted, such as Neo4j or another database?