run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.72k stars 5.26k forks

[Question]: How does KnowledgeGraphIndex retrieve nodes? Does it rely on LLM? #12065

Closed DataNoob0723 closed 8 months ago

DataNoob0723 commented 8 months ago

Question

How does KnowledgeGraphIndex retrieve nodes? Does it rely on LLM?

dosubot[bot] commented 8 months ago

Hey @DataNoob0723, good to see you back! Hope you're ready for another deep dive.

The KnowledgeGraphIndex retrieves nodes through a combination of text embeddings and graph-based lookups. The graph lookup itself does not depend on a Large Language Model (LLM): it uses embeddings and extracted subject-predicate-object triplets to find relevant nodes in the knowledge graph. (As the maintainer notes below, however, the LLM is used to extract keywords from the query in the keyword-based retrieval mode.)

Here's a brief overview of how it works:

  1. Embeddings: Embeddings are vectors representing the semantic content of a piece of text or a query. At query time, the query embedding is compared against stored embeddings to find semantically related nodes within the graph. (In the repository's tests, a MockEmbedding class stands in for a real embedding model.)

  2. Triplets Extraction: The _extract_triplets method (mocked in the tests) extracts subject-predicate-object triplets from text. These triplets represent the structured information within the knowledge graph and are used to retrieve relevant nodes based on the relationships between entities.

  3. Retrieval Mechanism: The KGTableRetriever class is used to perform the actual retrieval of nodes from the knowledge graph. It uses the query embeddings, optionally combined with custom embeddings provided in the QueryBundle, to search the graph for nodes that are semantically related to the query. The retrieval process can be customized to include raw text, use specific query keyword extraction templates, and limit the number of similar nodes returned based on top-k similarity.
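The keyword-table side of the steps above can be sketched in plain Python. This is a conceptual illustration only, not LlamaIndex's actual implementation: the triplets, the keyword table, and the token-matching extraction function are simplified stand-ins.

```python
# Conceptual sketch of keyword-table retrieval over a knowledge graph.
# Triplets extracted at index time: (subject, predicate, object).
triplets = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Bob", "knows", "Alice"),
]

# Keyword table: entity keyword -> triplets mentioning that entity.
keyword_table = {}
for subj, pred, obj in triplets:
    for entity in (subj, obj):
        keyword_table.setdefault(entity.lower(), []).append((subj, pred, obj))

def extract_query_keywords(query):
    # Stand-in for the keyword-extraction step: here we simply match
    # query tokens against known entities instead of calling a model.
    tokens = {tok.strip("?.,!").lower() for tok in query.split()}
    return tokens & set(keyword_table)

def retrieve(query):
    # Collect every triplet attached to any keyword found in the query.
    results = []
    for kw in extract_query_keywords(query):
        for t in keyword_table[kw]:
            if t not in results:
                results.append(t)
    return results

print(retrieve("Where does Alice work?"))
```

Both triplets mentioning "Alice" come back, while the Berlin triplet (which shares no entity with the query) does not.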

You can find more details in the test_retrievers.py file in the LlamaIndex repository.

In summary, the KnowledgeGraphIndex retrieves nodes by using embeddings to find semantically related content and by querying the graph based on extracted triplets. The retrieval itself relies on the embeddings and the structured information within the knowledge graph rather than on direct LLM calls.
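The embedding side of this retrieval can likewise be sketched as a cosine-similarity top-k search. The hard-coded vectors below are toy stand-ins; in practice they would come from the configured embedding model.

```python
import math

# Toy stand-in for stored triplet-text embeddings. In a real index,
# these vectors are produced by an embedding model at build time.
EMBEDDINGS = {
    "Alice works at Acme": [0.9, 0.1, 0.0],
    "Acme is located in Berlin": [0.1, 0.9, 0.0],
    "Bob knows Alice": [0.7, 0.2, 0.1],
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, k=2):
    # Rank stored texts by similarity to the query embedding,
    # returning the k most similar (top-k retrieval).
    scored = sorted(
        EMBEDDINGS.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]

print(top_k([1.0, 0.0, 0.0]))
```

No LLM is involved in this step: ranking is pure vector arithmetic against the query embedding.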

logan-markewich commented 8 months ago

Yes, it uses the LLM to extract keywords