
get_networkx_graph() from KnowledgeGraphIndex is not working #13949

Open nikhitaKanoj opened 4 weeks ago

nikhitaKanoj commented 4 weeks ago

Question

I am trying to visualize an entire document in the form of a knowledge graph. For this I am using KnowledgeGraphIndex from LlamaIndex: https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KnowledgeGraphDemo/

I am getting answers for the queries I ask. To visualize, I run:

from pyvis.network import Network
g = index.get_networkx_graph()

and then draw g with networkx.

The issue is that get_networkx_graph() is not returning the relevant nodes. The output of g.nodes() is NodeView(('subject', 'object', 'alice', 'bob', 'cofffee shop')). My document is large, so it should have many nodes, but only these are displayed.

This is how I am constructing the KnowledgeGraphIndex:

from llama_index.core import StorageContext
from llama_index.core.graph_stores import SimpleGraphStore

graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)

How can I visualize my entire document as nodes and edges (relations between nodes) in the form of a knowledge graph?

csaiedu commented 4 weeks ago

Same unresolved issue as https://github.com/run-llama/llama_index/issues/13806?

dosubot[bot] commented 4 weeks ago

To visualize your entire document as nodes and edges (relations between nodes) in the form of a knowledge graph using KnowledgeGraphIndex from llama_index, pyvis, and networkx, you can follow these steps:

  1. Extract the knowledge graph using KnowledgeGraphIndex.
  2. Convert the knowledge graph to a networkx graph.
  3. Visualize the networkx graph using pyvis.

Here is a complete example:

from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
from pyvis.network import Network
import networkx as nx

# Load your documents
documents = SimpleDirectoryReader("path_to_your_documents").load_data()

# Create and populate your KnowledgeGraphIndex instance
index = KnowledgeGraphIndex.from_documents(documents, max_triplets_per_chunk=2)

# Get the networkx graph representation of the knowledge graph
nx_graph = index.get_networkx_graph()

# Create a Pyvis network
pyvis_net = Network(notebook=True, cdn_resources="in_line", directed=True)

# Convert the networkx graph to a pyvis network
pyvis_net.from_nx(nx_graph)

# Visualize the graph
pyvis_net.show("knowledge_graph.html")

Explanation:

  1. Extract the knowledge graph using KnowledgeGraphIndex:

    • The KnowledgeGraphIndex class has a method get_networkx_graph that converts the internal knowledge graph to a networkx graph.
  2. Convert the knowledge graph to a networkx graph:

    • The get_networkx_graph method returns a networkx graph object.
  3. Visualize the networkx graph using pyvis:

    • The pyvis library can convert a networkx graph to an interactive HTML visualization.
    • The from_nx method of pyvis's Network class is used to convert the networkx graph to a pyvis network.
    • The show method of pyvis's Network class is used to generate and display the HTML visualization.

This code will generate an interactive HTML file named knowledge_graph.html that visualizes your knowledge graph. The KnowledgeGraphIndex class in llama_index is designed to build a knowledge graph by extracting triplets and leveraging the graph during query-time, making it suitable for this task.
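
As a quick sanity check before opening the HTML, you can confirm that triplets were actually extracted (a minimal sketch using the nx_graph object from above):

# If these counts are near zero, the problem is triplet extraction, not pyvis.
print(f"nodes: {nx_graph.number_of_nodes()}, edges: {nx_graph.number_of_edges()}")
print(list(nx_graph.nodes())[:10])  # peek at a few node labels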

Additional Setup:

To ensure you have all the necessary dependencies, you can install them using the following commands:

!pip install llama_index transformers wikipedia html2text pyvis

Example with WikiData Filtering:

If you want to filter your extracted relations using data from Wikipedia, you can use the following approach:

import wikipedia

class WikiFilter:
    """Map extracted entities to canonical Wikipedia page titles, with a cache."""

    def __init__(self):
        self.cache = {}

    def filter(self, candidate_entity):
        if candidate_entity in self.cache:
            return self.cache[candidate_entity]["title"]
        try:
            page = wikipedia.page(candidate_entity, auto_suggest=False)
            entity_data = {
                "title": page.title,
                "url": page.url,
                "summary": page.summary,
            }
            # Cache under both the raw candidate and the canonical title.
            self.cache[candidate_entity] = entity_data
            self.cache[page.title] = entity_data
            return entity_data["title"]
        except Exception:
            # No matching Wikipedia page (or a disambiguation error): drop the entity.
            return None

wiki_filter = WikiFilter()

def extract_triplets_wiki(text):
    # `extract_triplets` is assumed to be defined already (in the linked demo it
    # parses the output of a relation-extraction model); a toy stand-in is sketched below.
    relations = extract_triplets(text)
    filtered_relations = []
    for relation in relations:
        (subj, rel, obj) = relation
        filtered_subj = wiki_filter.filter(subj)
        filtered_obj = wiki_filter.filter(obj)
        if filtered_subj is None and filtered_obj is None:
            continue
        filtered_relations.append(
            (
                filtered_subj or subj,
                rel,
                filtered_obj or obj,
            )
        )
    return filtered_relations
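
Note that extract_triplets is not defined in the snippet above; in the linked demo it parses the output of a REBEL-style relation-extraction model loaded via transformers. As a purely illustrative stand-in (a toy pattern matcher, not a substitute for a real extractor), it might look like:

import re

def extract_triplets(text):
    # Toy extractor: pull (subject, "is", object) from "X is Y." sentences.
    # Replace with REBEL- or LLM-based extraction for real use.
    triplets = []
    for match in re.finditer(r"([A-Z][\w ]+?) is (?:a |an |the )?([\w ]+?)[.,]", text):
        triplets.append((match.group(1).strip(), "is", match.group(2).strip()))
    return triplets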

You can then use this function to create a filtered knowledge graph:

index1 = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=3,
    kg_triplet_extract_fn=extract_triplets_wiki,
    storage_context=storage_context,
    include_embeddings=True,
)

g = index1.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.save_graph("wiki_filtered_graph.html")

from IPython.display import HTML
HTML(filename="wiki_filtered_graph.html")

This will generate an interactive HTML file named wiki_filtered_graph.html that visualizes your filtered knowledge graph [1][2][3][4].

To continue talking to Dosu, mention @dosu.

nikhitaKanoj commented 4 weeks ago

@csaiedu It works with OpenAI LLMs and LLMs called via the HuggingFace API, but when I use a local LLM like zephyr I get nothing, so I am wondering whether get_networkx_graph is somehow not compatible with local LLMs? Not sure..

nikhitaKanoj commented 4 weeks ago

@dosu, you gave the same solution I have already tried. Kindly help me resolve the issue with the get_networkx_graph() method. It does work for me with OpenAI LLMs and LLMs called via the HuggingFace API, but with Zephyr running locally in a Jupyter Notebook I get nothing. The entire code is the same, only the LLMs are different, so I am not sure what the exact issue is.

dosubot[bot] commented 4 weeks ago

To resolve the issue with the get_networkx_graph() method when using the local LLM Zephyr, ensure that the networkx library is installed. The method requires networkx to visualize the graph structure. You can install it using the following command:

pip install networkx

If networkx is already installed and you are still facing issues, please check the following:

  1. Ensure Proper Configuration of KnowledgeGraphIndex with Zephyr: Make sure that the KnowledgeGraphIndex is correctly configured to use the local LLM Zephyr. Here is an example setup:

    from llama_index.core import ServiceContext, StorageContext, KnowledgeGraphIndex
    from llama_index.core.graph_stores import SimpleGraphStore

    # Assuming `zephyr_llm` is your local LLM instance
    service_context = ServiceContext.from_defaults(llm=zephyr_llm)
    graph_store = SimpleGraphStore()
    storage_context = StorageContext.from_defaults(graph_store=graph_store)

    index = KnowledgeGraphIndex.from_documents(
        documents,
        service_context=service_context,
        storage_context=storage_context,
        max_triplets_per_chunk=2,
    )
  2. Check the limit Parameter: The get_networkx_graph method has a default limit of 100 nodes. If your document is large, you might need to adjust this limit:

    g = index.get_networkx_graph(limit=1000)  # Increase the limit as needed
  3. Debugging and Logging: Enable logging to debug the triplet extraction process and ensure that the triplets are being extracted correctly:

    import logging
    logging.basicConfig(level=logging.DEBUG)
    
    # Your existing code to create and populate the KnowledgeGraphIndex
  4. Verify Triplet Extraction: Ensure that the triplets are being extracted correctly by manually inspecting the extracted triplets:

    for document in documents:
        triplets = index._extract_triplets(document.get_content())
        print(f"Extracted triplets: {triplets}")

If you continue to face issues, please provide more details about the specific error or issue you are encountering. This will help in diagnosing the problem further [1][2].

To continue talking to Dosu, mention @dosu.

csaiedu commented 4 weeks ago

Have you tried the new PropertyGraphIndex? It works with Ollama; I had to go down that route. Not sure KnowledgeGraphIndex is getting much traction from LlamaIndex anymore.

nikhitaKanoj commented 4 weeks ago

@csaiedu, yeah I tried using it but got import errors. Did you manage to get results using PropertyGraphIndex?

csaiedu commented 4 weeks ago

Yes, that works fine with Ollama on the tutorial examples; make sure to install the newest llama-index libraries.
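
For reference, a minimal sketch of that setup (package layout and model names here are assumptions; check the current PropertyGraphIndex docs):

# Assumes: pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

documents = SimpleDirectoryReader("path_to_your_documents").load_data()

index = PropertyGraphIndex.from_documents(
    documents,
    llm=Ollama(model="llama3", request_timeout=300.0),
    embed_model=OllamaEmbedding(model_name="llama3"),
    show_progress=True,
)

# The default property graph store can dump an HTML visualization directly.
index.property_graph_store.save_networkx_graph(name="kg.html")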

logan-markewich commented 4 weeks ago

It could be that zephyr is not properly extracting any triplets here. This is the initial code the index runs to get the data needed to build the nx graph:

subjs = list(index.index_struct.table.keys())

# add edges
rel_map = index._graph_store.get_rel_map(subjs=subjs, depth=1, limit=limit)

You could try running that and see what the resulting rel map looks like.
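
For example (a rough sketch of that check, assuming index is the KnowledgeGraphIndex built earlier):

# Inspect what was actually extracted before blaming the visualization.
subjs = list(index.index_struct.table.keys())
print(f"{len(subjs)} subjects in the index")

rel_map = index._graph_store.get_rel_map(subjs=subjs, depth=1, limit=1000)
for subj, rels in list(rel_map.items())[:10]:
    print(subj, "->", rels)

If the table or rel map is empty, the LLM never produced parseable triplets.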

The knowledge graph index is basically on life support right now, since the property graph index was introduced (and the property graph index is a way better design imo).