run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.67k stars 5.25k forks

[Question]: If manually adding nodes and triplets, what is the role of the chunk of text encapsulated in node? #13909

Closed EXJUSTICE closed 2 months ago

EXJUSTICE commented 5 months ago

Question

The LlamaIndex tutorial on constructing and using Knowledge Graphs has a section on building the graph and manually adding triplets. It was very useful, but it raised a few questions.

dosubot[bot] commented 5 months ago

Hey there, @EXJUSTICE! I'm here to help you with any questions, bugs, or contributions you might have. Let's tackle this together!

Manually adding triplets that do not represent the chunk content can affect graph indexing and querying in LlamaIndex by introducing incorrect or irrelevant relationships between nodes. This can lead to inaccurate query results and a distorted representation of the data.

Key Points:

  1. Graph Indexing and Querying:

    • When you add triplets that do not represent the chunk content, the graph indexing will include these triplets, potentially leading to misleading query results. The queries will be based on the triplets, not the underlying text, so if the triplets are irrelevant, the query results will also be irrelevant [1].

  2. NetworkX Visualizations:

    • The visualizations in NetworkX represent the triplets, not the nodes. If you add triplets to the same node, the visualization will show more triplets, which may or may not be connected, but all will be associated with the same underlying TextNode [1].

In summary, while the underlying text in the nodes is used to generate triplets, the queries and visualizations are based on the triplets themselves. Therefore, adding irrelevant triplets can distort the graph's representation and lead to inaccurate query results.
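To make the visualization point concrete, here is a minimal pure-Python sketch. It is a toy model, not LlamaIndex or NetworkX code: the `nodes` and `triplets_by_node` dictionaries and the `graph_edges` helper are hypothetical names invented for illustration. It shows that a graph view built from triplets contains only entities and relations, so the text chunk never appears in it, and an unrelated triplet shows up as just another edge.

```python
# Toy stand-in for a text node: an id mapped to the chunk of text it wraps.
# These structures are illustrative assumptions, not LlamaIndex classes.
nodes = {"node-1": "Alice founded Acme in 2010 and later hired Bob as CTO."}

# Triplets attached manually to the same node, as (subject, relation, object).
triplets_by_node = {
    "node-1": [
        ("Alice", "founded", "Acme"),
        ("Bob", "is CTO of", "Acme"),
        ("Paris", "is capital of", "France"),  # does not describe the chunk
    ]
}


def graph_edges(triplets_by_node):
    """Flatten triplets into (subject, object, relation) edges for plotting."""
    return [
        (subj, obj, rel)
        for triplets in triplets_by_node.values()
        for (subj, rel, obj) in triplets
    ]


edges = graph_edges(triplets_by_node)

# Every vertex in the plotted graph is an entity taken from a triplet.
entities = {name for subj, obj, _ in edges for name in (subj, obj)}

for subj, obj, rel in edges:
    print(f"{subj} --[{rel}]--> {obj}")
```

In this toy, the Paris/France edge sits in the graph as a disconnected component even though it is stored against the same text node as the other two.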

logan-markewich commented 5 months ago

The KnowledgeGraphIndex retriever has a mode (include_text=True) in which the source text for matching triplets is used during a query instead of the triplets themselves.

This is usually desirable, because triplets alone typically lack a lot of context.
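A rough sketch of why the chunk matters in that mode, written as a toy model in plain Python. `ToyKGIndex` and its methods are hypothetical and only mimic the idea described above; the method name `upsert_triplet_and_node` is borrowed from the real index for readability, but the signature and behaviour here are simplified assumptions, not the library's implementation.

```python
class ToyKGIndex:
    """Toy index: each triplet remembers which text chunk it was attached to."""

    def __init__(self):
        self.triplets = {}  # subject -> list of (relation, object)
        self.sources = {}   # subject -> list of node ids the triplets came from
        self.text = {}      # node id -> chunk text

    def upsert_triplet_and_node(self, triplet, node_id, text):
        # Simplified assumption: the subject is the only lookup key.
        subj, rel, obj = triplet
        self.triplets.setdefault(subj, []).append((rel, obj))
        node_ids = self.sources.setdefault(subj, [])
        if node_id not in node_ids:
            node_ids.append(node_id)
        self.text[node_id] = text

    def retrieve(self, keyword, include_text=True):
        if include_text:
            # Return the chunks behind the matching triplets.
            return [self.text[n] for n in self.sources.get(keyword, [])]
        # Otherwise return only the bare triplets, rendered as strings.
        return [f"{keyword} {rel} {obj}" for rel, obj in self.triplets.get(keyword, [])]


chunk = "Marie Curie won the Nobel Prize in Physics in 1903, shared with Pierre Curie."
index = ToyKGIndex()
index.upsert_triplet_and_node(("Marie Curie", "won", "Nobel Prize"), "node-1", chunk)

triplets_only = index.retrieve("Marie Curie", include_text=False)
with_text = index.retrieve("Marie Curie", include_text=True)
print(triplets_only)
print(with_text)
```

With the text included, the answer can draw on details such as the year and the co-recipient that the triplet alone drops. The flip side is that a triplet attached to a chunk it does not describe would pull in unrelated text when it matches.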