[Question]: If manually adding nodes and triplets, what is the role of the chunk of text encapsulated in the node?

EXJUSTICE commented 5 months ago

Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

In the tutorial on the construction and use of Knowledge Graphs in LlamaIndex, there is a section on building the graph and manually adding triplets. It was very useful, but a few questions arose from it.

If we use a NodeParser to generate nodes from a document and then add our own triplets that in no way represent the chunk (e.g. the chunk talks about programming, but we add the triplet ("people", "love", "ice-cream")), how would this be reflected in graph indexing and querying? Querying would be based only on the triplets, and the underlying text would be ignored, correct?
The networkx visualizations represent triplets, not nodes, correct? The author adds two separate parsed nodes (indexed as 0 and 1), and in the visualization these show up as two triplets. If one instead kept adding triplets to the same node (the docs state that if the node already exists, only the triplet will be added), we would still see more triplets, connected or not, but all of them would be associated with the same underlying TextNode?

dosubot[bot] commented 5 months ago

Hey there, @EXJUSTICE! I'm here to help you with any questions, bugs, or contributions you might have. Let's tackle this together!

Manually adding triplets that do not represent the chunk content can affect graph indexing and querying in LlamaIndex by introducing incorrect or irrelevant relationships between nodes. This can lead to inaccurate query results and a distorted representation of the data.

Key Points:

Graph Indexing and Querying:
- When you add triplets that do not represent the chunk content, the graph indexing will include these triplets, potentially leading to misleading query results. The queries will be based on the triplets, not the underlying text, so if the triplets are irrelevant, the query results will also be irrelevant ^[1].
Networkx Visualizations:
- The visualizations in networkx represent the triplets, not the nodes. If you add triplets to the same node, the visualization will show more triplets, which may or may not be connected, but all will be associated with the same underlying TextNode ^[1].

In summary, while the underlying text in the nodes is used to generate triplets, the queries and visualizations are based on the triplets themselves. Therefore, adding irrelevant triplets can distort the graph's representation and lead to inaccurate query results.
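The relationship between triplets and their source chunk can be sketched with a toy data structure. This is a minimal illustration in plain Python, not LlamaIndex code: the class `ToyKnowledgeGraph`, its `add_triplet` method, and the `node-0` ID are all hypothetical names chosen for the example. It assumes only what the thread describes, namely that several triplets can be attached to one node and that a visualization draws the triplets rather than the nodes.

```python
from collections import defaultdict


class ToyKnowledgeGraph:
    """Hypothetical stand-in for a knowledge-graph index (not LlamaIndex internals)."""

    def __init__(self):
        self.edges = []                               # what a graph plot would draw
        self.keyword_to_node_ids = defaultdict(set)   # keyword -> source node IDs
        self.node_text = {}                           # node ID -> chunk text

    def add_triplet(self, triplet, node_id, text):
        subj, rel, obj = triplet
        # The chunk is stored once; re-adding the same node only adds the triplet.
        self.node_text.setdefault(node_id, text)
        self.edges.append((subj, rel, obj))
        for keyword in (subj, obj):
            self.keyword_to_node_ids[keyword].add(node_id)


kg = ToyKnowledgeGraph()
chunk = "Python is a popular programming language."

# One triplet that matches the chunk, one that has nothing to do with it.
kg.add_triplet(("Python", "is", "programming language"), "node-0", chunk)
kg.add_triplet(("people", "love", "ice-cream"), "node-0", chunk)

print(kg.edges)                        # two triplets, no node IDs in sight
print(dict(kg.keyword_to_node_ids))    # every keyword points back to node-0
```

In this sketch the edge list grows with every triplet while `node_text` still holds a single chunk, which is the situation described in the second question.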


logan-markewich commented 5 months ago

There is a mode in the KnowledgeGraphIndex retriever (include_text=True) where the source text of the matching triplets is used during a query instead of the triplets themselves.

This is usually desirable, because triplets alone typically lack a lot of context.
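The effect of that flag on the ice-cream example from the question can be sketched as follows. This is a toy model in plain Python, not the retriever's actual implementation: only the flag name `include_text` comes from this thread, while `toy_retrieve`, `TRIPLETS`, and `SOURCE_TEXT` are hypothetical names invented for the illustration.

```python
# Hypothetical data: a triplet that does not describe its source chunk.
TRIPLETS = {"people": [("people", "love", "ice-cream")]}
SOURCE_TEXT = {"people": "Python is a popular programming language."}


def toy_retrieve(keyword, include_text=True):
    """Return the context that would be handed to the LLM for a keyword."""
    matches = TRIPLETS.get(keyword, [])
    if not matches:
        return []
    if include_text:
        # Triplets are used only for matching; the chunk text is returned.
        return [SOURCE_TEXT[keyword]]
    # Otherwise the triplets themselves become the context.
    return [" ".join(triplet) for triplet in matches]


print(toy_retrieve("people", include_text=False))
print(toy_retrieve("people", include_text=True))
```

Under these assumptions, a mismatched triplet matters in both modes: without the text, the answer rests on the unrelated triplet alone, and with the text, a query about ice-cream pulls in a chunk about programming.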
