Open nikhitaKanoj opened 4 weeks ago
Same unresolved issue as https://github.com/run-llama/llama_index/issues/13806?
To visualize your entire document as nodes and edges (relations between nodes) in the form of a knowledge graph using KnowledgeGraphIndex from llama_index, pyvis, and networkx, you can follow these steps:

1. Extract the knowledge graph using KnowledgeGraphIndex.
2. Convert the knowledge graph to a networkx graph.
3. Visualize the networkx graph using pyvis.

Here is a complete example:
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
from pyvis.network import Network
import networkx as nx
# Load your documents
documents = SimpleDirectoryReader("path_to_your_documents").load_data()
# Create and populate your KnowledgeGraphIndex instance
index = KnowledgeGraphIndex.from_documents(documents, max_triplets_per_chunk=2)
# Get the networkx graph representation of the knowledge graph
nx_graph = index.get_networkx_graph()
# Create a Pyvis network
pyvis_net = Network(notebook=True, cdn_resources="in_line", directed=True)
# Convert the networkx graph to a pyvis network
pyvis_net.from_nx(nx_graph)
# Visualize the graph
pyvis_net.show("knowledge_graph.html")
1. Extract the knowledge graph using KnowledgeGraphIndex: the KnowledgeGraphIndex class has a method get_networkx_graph that converts the internal knowledge graph to a networkx graph.
2. Convert the knowledge graph to a networkx graph: the get_networkx_graph method returns a networkx graph object.
3. Visualize the networkx graph using pyvis: the pyvis library can convert a networkx graph to an interactive HTML visualization. The from_nx method of pyvis's Network class converts the networkx graph to a pyvis network, and the show method generates and displays the HTML visualization.

This code will generate an interactive HTML file named knowledge_graph.html that visualizes your knowledge graph. The KnowledgeGraphIndex class in llama_index is designed to build a knowledge graph by extracting triplets and leveraging the graph during query-time, making it suitable for this task.
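To make the node/edge conversion concrete, here is a minimal, self-contained sketch (plain Python, no llama_index or networkx required) of how a list of (subject, relation, object) triplets becomes the node and edge structure that get_networkx_graph returns and from_nx consumes. The triplets here are made up for illustration:

```python
# Hypothetical triplets, in the shape KnowledgeGraphIndex extracts them
triplets = [
    ("alice", "works at", "coffee shop"),
    ("bob", "knows", "alice"),
]

# Every subject and object becomes a node; every relation becomes
# a labelled, directed edge from subject to object.
nodes = set()
edges = []
for subj, rel, obj in triplets:
    nodes.update([subj, obj])
    edges.append((subj, obj, {"title": rel}))

print(sorted(nodes))  # node labels the graph will contain
print(edges)          # labelled edges between them
```

This is exactly the structure pyvis renders: each node becomes a circle and each edge a labelled arrow in the generated HTML.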
To ensure you have all the necessary dependencies, you can install them using the following commands:
!pip install llama_index transformers wikipedia html2text pyvis
If you want to filter your extracted relations using data from Wikipedia, you can use the following approach:
import wikipedia


class WikiFilter:
    def __init__(self):
        self.cache = {}

    def filter(self, candidate_entity):
        if candidate_entity in self.cache:
            return self.cache[candidate_entity]["title"]

        try:
            page = wikipedia.page(candidate_entity, auto_suggest=False)
            entity_data = {
                "title": page.title,
                "url": page.url,
                "summary": page.summary,
            }
            self.cache[candidate_entity] = entity_data
            self.cache[page.title] = entity_data
            return entity_data["title"]
        except Exception:
            return None


wiki_filter = WikiFilter()


def extract_triplets_wiki(text):
    relations = extract_triplets(text)
    filtered_relations = []
    for relation in relations:
        (subj, rel, obj) = relation
        filtered_subj = wiki_filter.filter(subj)
        filtered_obj = wiki_filter.filter(obj)
        if filtered_subj is None and filtered_obj is None:
            continue
        filtered_relations.append(
            (
                filtered_subj or subj,
                rel,
                filtered_obj or obj,
            )
        )
    return filtered_relations
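Note that extract_triplets is assumed to be defined elsewhere (in LlamaIndex's Wikipedia filtering demo it is backed by a REBEL triplet-extraction model). As a placeholder so the snippet above has something to call, here is a deliberately naive, purely illustrative version that only recognizes three-word "subject verb object" sentences:

```python
def extract_triplets(text):
    """Toy stand-in for a real triplet extractor (e.g. a REBEL model).

    Treats every exactly-three-word sentence as (subject, relation, object);
    anything else is ignored. For illustration only.
    """
    triplets = []
    for sentence in text.split("."):
        words = sentence.strip().split()
        if len(words) == 3:
            triplets.append((words[0], words[1], words[2]))
    return triplets

print(extract_triplets("Alice knows Bob. Bob owns a coffee shop."))
```

In practice you would replace this with a model-based extractor; the point is only that extract_triplets_wiki expects a callable returning (subject, relation, object) tuples.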
You can then use this function to create a filtered knowledge graph:
index1 = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=3,
kg_triplet_extract_fn=extract_triplets_wiki,
storage_context=storage_context,
include_embeddings=True,
)
g = index1.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.save_graph("wiki_filtered_graph.html")
from IPython.display import HTML
HTML(filename="wiki_filtered_graph.html")
This will generate an interactive HTML file named wiki_filtered_graph.html
that visualizes your filtered knowledge graph [1][2][3][4].
To continue talking to Dosu, mention @dosu.
@csaiedu It works for OpenAI LLMs and HuggingFace API-hosted LLMs, but when I use a local LLM like Zephyr I am not getting it, so I am wondering if get_networkx_graph is not compatible with local LLMs? Not sure.
@dosu, you gave the same solution as I have already tried. Kindly help me resolve the issue with the get_networkx_graph() method. It does work for me with OpenAI LLMs and HuggingFace API-hosted LLMs, but I am using Zephyr locally in a Jupyter Notebook and I am not getting results. The entire code is the same, just the LLMs are different; not sure what the exact issue is.
To resolve the issue with the get_networkx_graph() method when using the local LLM Zephyr, ensure that the networkx library is installed. The method requires networkx to visualize the graph structure. You can install it using the following command:
pip install networkx
If networkx is already installed and you are still facing issues, please check the following:

Ensure Proper Configuration of KnowledgeGraphIndex with Zephyr:
Make sure that the KnowledgeGraphIndex is correctly configured to use the local LLM Zephyr. Here is an example setup:
from llama_index.core import ServiceContext, StorageContext, KnowledgeGraphIndex
from llama_index.core.graph_stores import SimpleGraphStore

# Assuming `zephyr_llm` is your local LLM instance
service_context = ServiceContext.from_defaults(llm=zephyr_llm)
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=storage_context,
    max_triplets_per_chunk=2,
)
Check the limit parameter:
The get_networkx_graph method has a default limit of 100 nodes. If your document is large, you might need to adjust this limit:
g = index.get_networkx_graph(limit=1000) # Increase the limit as needed
Debugging and Logging: Enable logging to debug the triplet extraction process and ensure that the triplets are being extracted correctly:
import logging
logging.basicConfig(level=logging.DEBUG)
# Your existing code to create and populate the KnowledgeGraphIndex
Verify Triplet Extraction: Ensure that the triplets are being extracted correctly by manually inspecting the extracted triplets:
for document in documents:
triplets = index._extract_triplets(document.get_content())
print(f"Extracted triplets: {triplets}")
If you continue to face issues, please provide more details about the specific error or issue you are encountering. This will help in diagnosing the problem further [1][2].
Have you tried the new PropertyGraphIndex? It works with Ollama; I had to go down that route. Not sure KnowledgeGraphIndex is getting much traction from LlamaIndex?
@csaiedu, yeah, I tried using it but got import errors. Did you manage to get results using PropertyGraphIndex?
Yes, it works fine with Ollama on the tutorial examples; make sure to install the newest llama-index libraries.
It could be that Zephyr is not properly extracting any triplets here. This is the initial code the index runs to get the data needed to build the nx graph:
subjs = list(index.index_struct.table.keys())
# add edges
rel_map = index._graph_store.get_rel_map(subjs=subjs, depth=1, limit=limit)
You could try running that and see what the resulting rel map looks like
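If Zephyr extracted nothing, both subjs and the rel map will come back empty. As a rough illustration (the exact shapes here are an assumption based on SimpleGraphStore, where get_rel_map returns a dict mapping each subject to flattened relation paths; the example data is made up), a quick emptiness check might look like:

```python
# Hypothetical rel_map, in the shape SimpleGraphStore.get_rel_map might return:
# each subject maps to a list of flattened (relation, object, ...) paths.
rel_map = {
    "alice": [["works at", "coffee shop"]],
    "bob": [["knows", "alice"]],
}

if not rel_map or all(len(paths) == 0 for paths in rel_map.values()):
    print("No triplets were extracted - the LLM likely returned no usable output.")
else:
    for subj, paths in rel_map.items():
        print(subj, "->", paths)
```

An empty rel_map would point at the triplet-extraction step (the LLM) rather than at get_networkx_graph itself.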
The knowledge graph index is basically on life support right now, since the property graph index was introduced (and the property graph index is a way better design, imo).
Question Validation
Question
I am trying to visualize the entire document in the form of a knowledge graph. For this I am using KnowledgeGraphIndex from llama_index: https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KnowledgeGraphDemo/
I am getting answers for the queries asked, but when I try to visualize it using:
from pyvis.network import Network
g = index.get_networkx_graph()
and then use g in networkx to draw the graph.
The issue is that get_networkx_graph() is not giving the relevant nodes. The output of g.nodes() is: NodeView(("subject','object','alice','bob','cofffee shop')). My document is large, so it should have a lot of nodes, but only these nodes are displayed.
This is how I am constructing the KnowledgeGraphIndex:
from llama_index.core import StorageContext
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)
How can I visualize my entire document as nodes and edges (relations between nodes) in the form of a knowledge graph?