Closed csaiedu closed 3 months ago
Hey there, @csaiedu! I'm here to help you with your bug report. I'm a bot that can assist with bug fixes, answer questions, and guide you on becoming a contributor. Let's take a look at the issue you've encountered.
The issue you're encountering with the KnowledgeGraphIndex not returning triplets in a networkx graph is likely due to the extract_triplets function not integrating properly with the KnowledgeGraphIndex class. The KnowledgeGraphIndex class expects triplets in a specific format and stores them in its own structure, but it does not inherently convert these triplets into a networkx graph.
To resolve this, you need to ensure that the extract_triplets function is correctly extracting triplets and that these triplets are being properly inserted into the KnowledgeGraphIndex. Additionally, you need to use the get_networkx_graph method provided by the KnowledgeGraphIndex class to generate the networkx graph.
Here is an example of how you can integrate the extract_triplets function and generate a networkx graph:
Define the extract_triplets function:
def extract_triplets(input_text):
    # `triplet_extractor` is assumed to be a Hugging Face pipeline defined elsewhere
    text = triplet_extractor.tokenizer.batch_decode(
        [
            triplet_extractor(
                input_text, return_tensors=True, return_text=False
            )[0]["generated_token_ids"]
        ]
    )[0]
    triplets = []
    subject, relation, object_ = '', '', ''
    text = text.strip()
    current = 'x'
    # Walk the decoded tokens; the special markers switch which field is being filled
    for token in text.replace("<s>", "").replace("<pad>", "").replace("</s>", "").split():
        if token == "<triplet>":
            current = 't'
            if relation != '':
                triplets.append((subject.strip(), relation.strip(), object_.strip()))
                relation = ''
            subject = ''
        elif token == "<subj>":
            current = 's'
            if relation != '':
                triplets.append((subject.strip(), relation.strip(), object_.strip()))
            object_ = ''
        elif token == "<obj>":
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    # Flush the last triplet, if one is complete
    if subject != '' and relation != '' and object_ != '':
        triplets.append((subject.strip(), relation.strip(), object_.strip()))
    return triplets
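The token-parsing half of this function can be checked on its own, without loading the model. The following is a minimal sketch, assuming the decoded output uses the <triplet>, <subj> and <obj> markers handled above; parse_decoded_triplets is a hypothetical helper name and the sample string is made up for illustration.

```python
def parse_decoded_triplets(text):
    """Parse a decoded string with <triplet>/<subj>/<obj> markers into tuples.

    Hypothetical helper: same token loop as extract_triplets, but it takes
    already-decoded text so it can be tested without the model.
    """
    triplets = []
    subject, relation, object_ = "", "", ""
    current = "x"
    cleaned = text.replace("<s>", "").replace("<pad>", "").replace("</s>", "")
    for token in cleaned.split():
        if token == "<triplet>":
            current = "t"
            if relation != "":
                triplets.append((subject.strip(), relation.strip(), object_.strip()))
                relation = ""
            subject = ""
        elif token == "<subj>":
            current = "s"
            if relation != "":
                triplets.append((subject.strip(), relation.strip(), object_.strip()))
            object_ = ""
        elif token == "<obj>":
            current = "o"
            relation = ""
        elif current == "t":
            subject += " " + token
        elif current == "s":
            object_ += " " + token
        elif current == "o":
            relation += " " + token
    # Flush the last triplet, if one is complete.
    if subject != "" and relation != "" and object_ != "":
        triplets.append((subject.strip(), relation.strip(), object_.strip()))
    return triplets


# Made-up sample in the marker format the loop expects.
sample = "<s><triplet> Punta Cana <subj> Dominican Republic <obj> country</s>"
print(parse_decoded_triplets(sample))
```

For this sample the result is a single tuple, ('Punta Cana', 'country', 'Dominican Republic'). Note that the text after <subj> ends up as the third element and the text after <obj> as the relation.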
Initialize the KnowledgeGraphIndex with the custom extract_triplets function:
from llama_index.core.indices.knowledge_graph.base import KnowledgeGraphIndex
# Initialize the KnowledgeGraphIndex with the custom triplet extraction function
kg_index = KnowledgeGraphIndex(
    kg_triplet_extract_fn=extract_triplets,
    # other parameters as needed
)
# Add documents to the index (assuming `documents` is a list of Document objects)
kg_index = KnowledgeGraphIndex.from_documents(documents)
# Ensure networkx is installed
try:
    import networkx as nx
except ImportError:
    raise ImportError("Please install networkx to visualize the graph: `pip install networkx`")
# Generate the networkx graph
graph = kg_index.get_networkx_graph(limit=100)
Visualize the networkx graph:
import matplotlib.pyplot as plt
nx.draw(graph, with_labels=True)
plt.show()
By following these steps, you should be able to extract triplets using your custom extract_triplets function, insert them into the KnowledgeGraphIndex, and generate a networkx graph for visualization [1][2].
To continue the conversation, mention @dosu.
This returns an error. It occurs in the initialisation of KnowledgeGraphIndex, which, according to the suggestion, is separate from the from_documents call: ValueError: One of nodes, objects, or index_struct must be provided.
To resolve the error "One of nodes, objects, or index_struct must be provided" and properly initialize the KnowledgeGraphIndex, you need to ensure that at least one of the parameters nodes, objects, or index_struct is provided when initializing the KnowledgeGraphIndex.
Here is an example of how you can initialize the KnowledgeGraphIndex with a list of nodes and then generate a networkx graph:
from llama_index.core.schema import BaseNode
from llama_index.core.service_context import ServiceContext
from llama_index.core.storage.storage_context import StorageContext
from llama_index.core.indices.knowledge_graph.base import KnowledgeGraphIndex
# Example nodes
nodes = [
    BaseNode(content="Node 1 content"),
    BaseNode(content="Node 2 content"),
    # Add more nodes as needed
]
# Initialize the KnowledgeGraphIndex
kg_index = KnowledgeGraphIndex(
    nodes=nodes,
    service_context=ServiceContext(),  # Provide a valid ServiceContext if needed
    storage_context=StorageContext()  # Provide a valid StorageContext if needed
)
# Generate the networkx graph
graph = kg_index.get_networkx_graph(limit=100)
# Now you can work with the generated graph
print(graph.nodes)
print(graph.edges)
In this example, the nodes parameter is provided to the KnowledgeGraphIndex constructor, which satisfies the requirement that one of nodes, objects, or index_struct must be provided. After initializing the index, the get_networkx_graph method is called to generate the networkx graph [1][2].
This is not helping
@csaiedu your extract_triplets function is incorrect
It should be returning a list of tuples (subject, relation, object), not a list of dicts
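If the extraction function in use returns dicts, a thin adapter can convert its output into the tuple form described here. This is a minimal sketch, assuming each dict carries head, type and tail keys (an assumption about the Hugging Face style function mentioned in the bug description, not something confirmed in this thread); as_tuple_extractor and fake_dict_extractor are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]


def as_tuple_extractor(
    dict_extractor: Callable[[str], List[Dict[str, str]]],
) -> Callable[[str], List[Triplet]]:
    """Wrap a dict-returning extractor so it returns (subject, relation, object) tuples.

    Assumes the keys 'head', 'type' and 'tail'; adjust if your extractor differs.
    """

    def extract(text: str) -> List[Triplet]:
        return [(d["head"], d["type"], d["tail"]) for d in dict_extractor(text)]

    return extract


# Stand-in for a dict-returning extractor, only for demonstration.
def fake_dict_extractor(text: str) -> List[Dict[str, str]]:
    return [{"head": "Punta Cana", "type": "country", "tail": "Dominican Republic"}]


extract_triplets_as_tuples = as_tuple_extractor(fake_dict_extractor)
print(extract_triplets_as_tuples("any text"))
```

The wrapped function could then be passed as kg_triplet_extract_fn, the parameter used in the example earlier in this thread.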
Bug Description
When running: https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/knowledge_graph2/
The networkx graph that is returned is displayed blank. No triplets seem to be passed to the knowledge graph. The extract_triplets function seems incorrect: replacing it with the extraction function recommended by Hugging Face works, with debug logging showing that triplets are extracted.
However, still nothing is returned when calling
g = index.get_networkx_graph()
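One way to narrow this down is to record exactly what the extraction function hands back on each call, since output can look fine in a log while having the wrong shape. The following is a minimal sketch using only the standard library; with_triplet_logging and is_valid_triplet are hypothetical helpers, and the idea that malformed triplets are what leaves the graph empty is an assumption, not something established in this report.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("triplet_debug")


def is_valid_triplet(item: Any) -> bool:
    """True if item is a tuple of exactly three non-empty strings."""
    return (
        isinstance(item, tuple)
        and len(item) == 3
        and all(isinstance(part, str) and part.strip() for part in item)
    )


def with_triplet_logging(
    extract_fn: Callable[[str], List[Any]],
) -> Callable[[str], List[Any]]:
    """Wrap extract_fn so each call logs how many results are well-formed tuples."""

    def wrapper(text: str) -> List[Any]:
        results = extract_fn(text)
        bad = [r for r in results if not is_valid_triplet(r)]
        logger.debug("extracted %d item(s), %d malformed", len(results), len(bad))
        for item in bad:
            logger.warning("not a (subject, relation, object) tuple: %r", item)
        # The return value is passed through unchanged.
        return results

    return wrapper
```

Wrapping the function before handing it to the index, for example with_triplet_logging(extract_triplets), leaves its return value unchanged and only adds log output.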
Version
0.10.40
Steps to Reproduce
Trying to run the referenced code from the manual
Relevant Logs/Tracebacks
No response