run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: How to add a new relationship to a persisted knowledge graph? #9336

Closed JinSeoung-Oh closed 6 months ago

JinSeoung-Oh commented 7 months ago


Question

Hi, this time I want to add a new relationship to a persisted knowledge graph. Resources on this are easy to find for vector DBs, but very hard to find for knowledge graphs.

I already know how to add a new relationship via the KnowledgeGraphIndex class, but I cannot find how to add a new relationship to a persisted KG.

To load the persisted KG, I used this code:

storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore.from_persist_dir(persist_dir=os.getcwd() + "/kg/graph/test"),
    vector_store=SimpleVectorStore.from_persist_dir(persist_dir=os.getcwd() + "/kg/graph/test"),
    index_store=SimpleIndexStore.from_persist_dir(persist_dir=os.getcwd() + "/kg/graph/test"),
)

from llama_index import load_index_from_storage, load_indices_from_storage, load_graph_from_storage

index = load_index_from_storage(storage_context)
indices = load_indices_from_storage(storage_context)
graph = load_graph_from_storage(storage_context)

Can I apply an upsert method to the index, or to the graph?

Thanks!

dosubot[bot] commented 7 months ago

🤖

Hello @JinSeoung-Oh! Good to see you again. I appreciate your continued interest in LlamaIndex.

Sure, you can add a new relationship to a persisted KnowledgeGraph using the upsert_triplet method. This method is part of the Neo4jGraphStore class in LlamaIndex. Here's an example of how you can use it:

from llama_index.graph_stores import Neo4jGraphStore

graph_store = Neo4jGraphStore(username="your_username", password="your_password", url="your_neo4j_url")
graph_store.upsert_triplet("subject_node_id", "relationship_type", "object_node_id")

In this example, replace "subject_node_id", "relationship_type", and "object_node_id" with the IDs of the subject and object nodes and the type of the relationship you want to add, respectively.

The upsert_triplet method constructs a Cypher query to add the new relationship and executes it in a Neo4j session. The MERGE keyword in the query ensures that the nodes and the relationship are created if they do not exist, or matched if they do exist.

Here's the code that defines the upsert_triplet method:

def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
    """Add triplet."""
    query = """
        MERGE (n1:`%s` {id:$subj})
        MERGE (n2:`%s` {id:$obj})
        MERGE (n1)-[:`%s`]->(n2)
    """

    prepared_statement = query % (
        self.node_label,
        self.node_label,
        rel.replace(" ", "_").upper(),
    )

    with self._driver.session(database=self._database) as session:
        session.run(prepared_statement, {"subj": subj, "obj": obj})

You can find this code in the llama_index/graph_stores/neo4j.py file.
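As a quick sanity check (a sketch on my part rather than something from the thread; the node IDs here are placeholders), you can read the edge back after upserting it with the store's get method, which returns the relationships stored for a subject:

# MERGE semantics make the upsert idempotent: running this twice still creates one edge.
graph_store.upsert_triplet("Alice", "KNOWS", "Bob")

# Fetch the relationships stored for the subject to verify the insert.
print(graph_store.get("Alice"))  # e.g. [["KNOWS", "Bob"]]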

I hope this helps! If you have any more questions, feel free to ask.

Best, Dosu


JinSeoung-Oh commented 7 months ago

@dosu-bot Ah, actually I used NebulaGraphStore. Anyway, I can use the upsert_triplet function in llama_index/graph_stores/nebulagraph.py.

I have another question. I am trying to build a KnowledgeGraphRAGRetriever using a persisted graph. My code is like this:

storage_context = StorageContext.from_defaults(
        docstore = SimpleDocumentStore.from_persist_dir(persist_dir=os.getcwd() + "/kg/graph/test"),
        vector_store = SimpleVectorStore.from_persist_dir(persist_dir=os.getcwd() + "/kg/graph/test"),
        index_store = SimpleIndexStore.from_persist_dir(persist_dir=os.getcwd() + "/kg/graph/test"))

index = load_index_from_storage(storage_context)
indices = load_indices_from_storage(storage_context)
graph = load_graph_from_storage(storage_context)

llm = OpenAI(temperature=0, model="gpt-3.5-turbo-0301")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size_limit=512)

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)
query_engine = RetrieverQueryEngine.from_args(graph_rag_retriever, service_context=service_context)

Is this code right? Thanks!

dosubot[bot] commented 7 months ago

🤖

Based on the provided context, your code for initializing a KnowledgeGraphRAGRetriever in the LlamaIndex repository appears to be correct. The KnowledgeGraphRAGRetriever class is defined with the necessary arguments and methods, and it inherits from the BaseRetriever class, which is a common practice in object-oriented programming to extend the functionality of a base class.

Here is the initialization code for the KnowledgeGraphRAGRetriever:

class KnowledgeGraphRAGRetriever(BaseRetriever):
    """
    Knowledge Graph RAG retriever.

    Retriever that performs SubGraph RAG over a knowledge graph.

    Args:
        service_context (Optional[ServiceContext]): A service context to use.
        storage_context (Optional[StorageContext]): A storage context to use.
        entity_extract_fn (Optional[Callable]): A function to extract entities.
        entity_extract_template (Optional[BasePromptTemplate]): A Query Key Entity
            Extraction Prompt (see :ref:`Prompt-Templates`).
        entity_extract_policy (Optional[str]): The entity extraction policy to use.
            default: "union"
            possible values: "union", "intersection"
        synonym_expand_fn (Optional[Callable]): A function to expand synonyms.
        synonym_expand_template (Optional[QueryKeywordExpandPrompt]): A Query Key Entity
            Expansion Prompt (see :ref:`Prompt-Templates`).
        synonym_expand_policy (Optional[str]): The synonym expansion policy to use.
            default: "union"
            possible values: "union", "intersection"
        max_entities (int): The maximum number of entities to extract.
            default: 5
        max_synonyms (int): The maximum number of synonyms to expand per entity.
            default: 5
        retriever_mode (Optional[str]): The retriever mode to use.
            default: "keyword"
            possible values: "keyword", "embedding", "keyword_embedding"
        with_nl2graphquery (bool): Whether to combine NL2GraphQuery in context.
            default: False
        graph_traversal_depth (int): The depth of graph traversal.
            default: 2
        max_knowledge_sequence (int): The maximum number of knowledge sequence to
            include in the response. By default, it's 30.
        verbose (bool): Whether to print out debug info.
    """

This class is designed to perform SubGraph RAG (Retrieval-Augmented Generation) towards a knowledge graph. It takes several parameters including service_context, storage_context, entity_extract_fn, entity_extract_template, entity_extract_policy, synonym_expand_fn, synonym_expand_template, synonym_expand_policy, max_entities, max_synonyms, retriever_mode, with_nl2graphquery, graph_traversal_depth, max_knowledge_sequence, and verbose.

As for the load_index_from_storage, load_indices_from_storage, and load_graph_from_storage functions, they are used to load a single index, multiple indices, and a composable graph from the storage context respectively. The storage context contains the docstore, index store, and vector store. These functions are crucial for loading the necessary data into your KnowledgeGraphRAGRetriever.

You can find more details about these functions in the llama_index/indices/loading.py file in the LlamaIndex repository.
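Incidentally, since your docstore, vector store, and index store all point at the same directory, the same storage context can usually be rebuilt with the persist_dir shortcut (this assumes that directory holds a complete persisted storage context):

storage_context = StorageContext.from_defaults(persist_dir=os.getcwd() + "/kg/graph/test")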

Please let me know if you have any other questions or need further clarification.


JinSeoung-Oh commented 7 months ago

@dosu-bot Actually, I want to load a persisted NebulaGraph, add a new relationship to the persisted NebulaGraph DB, and finally build a KnowledgeGraphRAGRetriever with this updated NebulaGraph DB.

For this use case, I think I may have to define the storage context twice. Is that right? Thanks!

dosubot[bot] commented 7 months ago

🤖

Based on the context provided, you can load a persisted Nebula Graph, add a new relationship to it, and then build a KnowledgeGraphRAGRetriever using the updated graph in the LlamaIndex data framework by following these steps:

  1. Define the LLM (Language Model) and ServiceContext. This is done once at the beginning of the process.
from llama_index import (
    KnowledgeGraphIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from llama_index.llms import OpenAI

# define LLM
llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size_limit=512)
  2. Prepare for NebulaGraph. This includes setting up the NebulaGraphStore and creating a StorageContext. This is also done once.
# Define NebulaGraphStore
graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)

# Create StorageContext
storage_context = StorageContext.from_defaults(graph_store=graph_store)
  3. Load the persisted NebulaGraph. This is not explicitly shown in the context, but in practice it amounts to pointing the NebulaGraphStore at the existing space (a sketch is given after the code below).

  4. Add a new relationship to the graph. This is also not explicitly shown in the context, but it would typically be done with the NebulaGraphStore's upsert_triplet method (see the same sketch below).

  5. Build a KnowledgeGraphRAGRetriever using the updated graph.

from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import KnowledgeGraphRAGRetriever

# Define KnowledgeGraphRAGRetriever
graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)

# Define RetrieverQueryEngine
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever, service_context=service_context
)

You do not need to define the storage context twice in this process. The storage context is defined once after setting up the NebulaGraphStore, and it is then passed to the KnowledgeGraphRAGRetriever.
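For steps 3 and 4 specifically, a minimal sketch could look like the following. The space name and triplet values are placeholders, and it assumes the NebulaGraph space was already populated by an earlier run:

# Pointing NebulaGraphStore at an existing space "loads" the persisted graph;
# no separate load call is needed, since the data lives in the Nebula server.
graph_store = NebulaGraphStore(
    space_name="llamaindex",          # placeholder: your existing space
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# Step 4: add the new relationship directly to the persisted graph.
graph_store.upsert_triplet("subject_name", "relationship", "object_name")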

For more details, you can refer to the knowledge_graph_rag_query_engine.ipynb example in the LlamaIndex repository.

Regarding the specific methods provided by the NebulaGraphStore class in the LlamaIndex framework to load a persisted graph, you can refer to the nebulagraph.py file in the LlamaIndex repository. The methods include init_session_pool, execute, from_dict, get, get_rel_map, upsert_triplet, refresh_schema, get_schema, and query. These methods are used to interact with the NebulaGraph store, execute queries, manage the session pool, and handle the schema of the graph.
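For instance, two of those methods in use (a sketch assuming graph_store is the NebulaGraphStore defined above; see nebulagraph.py for the exact signatures):

# Inspect the graph schema as LlamaIndex sees it.
print(graph_store.get_schema())

# Fetch the relationship map around given subjects up to a traversal depth.
rel_map = graph_store.get_rel_map(["subject_name"], depth=2)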


JinSeoung-Oh commented 7 months ago

@dosu-bot Thanks

dosubot[bot] commented 7 months ago

🤖

Based on the information provided, the current status of the LlamaIndex repository is active with the latest version being v0.9.12. The latest commit includes various bug fixes and improvements, such as fixing a bug with OpenAIAgent inserting errors into chat history, fixing various bugs with LiteLLM and the new OpenAI client, adding a context window attribute to perplexity llm, adding node_parser attribute back to service context, refactoring MM retriever classes, and fixing TextNode instantiation on SupabaseVectorIndexDemo.

Unfortunately, I couldn't find the information about the latest commit and the author of it within the provided context. If you need more specific details about the latest commit or the author, you might want to check the commit history directly on the LlamaIndex repository.

For more details about the changes made in the latest commit, you can refer to the CHANGELOG.md file in the repository.

If you have any more questions or need further clarification, feel free to ask.


JinSeoung-Oh commented 7 months ago

@dosu-bot By the way, why do we define storage_context after defining the NebulaGraphStore class? I mean, I want to load a persisted NebulaGraph, add a new triplet to it, and finally build a KnowledgeGraphRAGRetriever with the updated NebulaGraph.

For this use case, I think we have to define the storage context after adding the new triplet to the NebulaGraph. Is that wrong?

dosubot[bot] commented 7 months ago

🤖

Yes, your understanding is correct. In the LlamaIndex data framework, it is necessary to define the StorageContext after adding a new triplet to the NebulaGraph for the purpose of building an updated KnowledgeGraphRAGRetriever.

The StorageContext is a utility container for storing nodes, indices, and vectors. It contains a document store, index store, vector stores, and a graph store. When you add a new triplet to the NebulaGraph, you are essentially adding new data to the graph store. To ensure that this new data is included in the KnowledgeGraphRAGRetriever, you need to update the StorageContext.

The StorageContext has a method called persist which is used to persist the storage context. This method saves the current state of the document store, index store, graph store, and vector stores to a specified directory. By calling this method after adding a new triplet to the NebulaGraph, you ensure that the updated graph store is saved and will be included when building the KnowledgeGraphRAGRetriever.

Here is the code snippet for persisting the StorageContext:

storage_context.persist(persist_dir=<your_directory>)

Replace <your_directory> with the directory where you want to save the storage context. You can find more details about this in the storage_context.py file.

As for adding a new triplet to the NebulaGraph, there is a method with the signature upsert_triplet(self, subj: str, rel: str, obj: str) -> None that is responsible for adding a new triplet. The method takes three arguments: subj (subject), rel (relation), and obj (object). Unfortunately, the implementation of this method is not provided in the context, so the exact process of adding a new triplet cannot be determined here. You can find more details about this in the nebulagraph.py file.
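Putting the two steps together, a minimal sketch (the triplet values are placeholders) might be:

# Add the new triplet to the graph store first...
graph_store.upsert_triplet("subject", "relationship", "object")

# ...then persist the storage context. Note that with NebulaGraph the triplets
# themselves live in the Nebula server; persist saves the local docstore,
# index store, and vector stores.
storage_context.persist(persist_dir=os.getcwd() + "/kg/graph/test")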

I hope this helps! If you have any more questions, feel free to ask.


JinSeoung-Oh commented 7 months ago

@dosu-bot I thought that whenever I call NebulaGraphStore with the same 'NEBULA_USER', 'NEBULA_PASSWORD', and 'NEBULA_ADDRESS', the previous NebulaGraphStore is removed and a new one is created. Is that right?

JinSeoung-Oh commented 7 months ago

@dosu-bot Oh, that was my misunderstanding. The previous NebulaGraphStore was not removed; it only seemed that way due to several coincidences. I just checked and found out that this was a misunderstanding. Sorry!

dosubot[bot] commented 7 months ago

🤖

No problem at all! Misunderstandings happen, and I'm here to help. If you have any questions or need assistance with anything else, feel free to ask.



JinSeoung-Oh commented 6 months ago

@dosu-bot Hi, I have another question. When I use the upsert_triplet function in the NebulaGraphStore class, it seems like the new triplet is simply added. Is that right? In the codebase, it looks that way. I want to ensure that the newly added triplet is linked to the existing stored knowledge automatically. Is that possible? Also, what is the difference between the upsert_triplet function in the NebulaGraphStore class and in the KnowledgeGraphIndex class? They look the same.

dosubot[bot] commented 6 months ago

🤖

Based on the information from the LlamaIndex repository, the upsert_triplet function in the NebulaGraphStore class is designed to add a new triplet (a subject, a relationship, and an object) to the NebulaGraph. If the subject and object nodes already exist in the graph, the function will create a new relationship between them. If the nodes do not exist, the function will create them and then create the relationship. (In the Neo4j store this is done with the MERGE keyword in a Cypher query; the Nebula store issues INSERT statements in nGQL, as the code later in this thread shows, with equivalent upsert semantics.) Either way, the newly added triplet is automatically linked to any existing nodes in the graph that share the same IDs.

As for the upsert_triplet function in the KnowledgeGraphIndex class, it is used to add a new triplet to the knowledge graph index. The function takes a subject, a relationship, and an object as arguments, and adds them to the index. If the subject and object nodes already exist in the index, the function will create a new relationship between them. If the nodes do not exist, the function will create them and then create the relationship.

The main difference between the two functions is where they add the new triplet. The upsert_triplet function in the NebulaGraphStore class adds the triplet to the Nebula Graph, while the upsert_triplet function in the KnowledgeGraphIndex class adds the triplet to the knowledge graph index.
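Concretely, the two calls also differ in signature. A small illustration, assuming kg_index is an existing KnowledgeGraphIndex and graph_store an existing NebulaGraphStore (the triplet values are placeholders):

# Index-level upsert: takes a single (subj, rel, obj) tuple and forwards it
# to the underlying graph store.
kg_index.upsert_triplet(("Alice", "knows", "Bob"))

# Store-level upsert: takes three separate strings and writes directly to NebulaGraph.
graph_store.upsert_triplet("Alice", "knows", "Bob")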


I hope this clarifies your question. If you have any more questions, feel free to ask.


JinSeoung-Oh commented 6 months ago

@dosu-bot Hmm, okay, then I have to modify my code. Anyway, if I want to upsert a new triplet, I have to upsert it in both the NebulaGraphStore class and the KnowledgeGraphIndex class. Is that right?

dosubot[bot] commented 6 months ago

🤖

Yes, you are correct. To add a new triplet to both the NebulaGraph and the KnowledgeGraphIndex, you would need to use the upsert_triplet function in both the NebulaGraphStore class and the KnowledgeGraphIndex class.

The upsert_triplet function in the KnowledgeGraphIndex class is used to insert triplets into the graph store, which is part of the KnowledgeGraphIndex. Here is the relevant code from the KnowledgeGraphIndex class:

def upsert_triplet(self, triplet: Tuple[str, str, str]) -> None:
    """Insert triplets.

    Used for manual insertion of KG triplets (in the form
    of (subject, relationship, object)).

    Args:
        triplet (str): Knowledge triplet

    """
    self._graph_store.upsert_triplet(*triplet)

(Source: llama_index/indices/knowledge_graph/base.py)

As for the NebulaGraphStore class, while the context provided does not include any information about an upsert_triplet function within it, the class does exist in the LlamaIndex codebase. It's reasonable to assume that it would have a similar function for inserting triplets into the NebulaGraph. However, without more information or context about the NebulaGraphStore class, I can't confirm this.

If you have the NebulaGraphStore class in your codebase, I would recommend checking its documentation or source code to see if it has an upsert_triplet function or a similar method for inserting triplets.

I hope this helps! If you have any more questions, feel free to ask.


JinSeoung-Oh commented 6 months ago

@dosu-bot What I meant is: when adding a new triplet, do I have to add it to both the NebulaGraphStore and the KnowledgeGraphIndex?

dosubot[bot] commented 6 months ago

🤖

Yes, you are correct. To add a new triplet to both the Nebula Graph and the Knowledge Index, you need to use the upsert_triplet function in both the NebulaGraphStore and KnowledgeGraphIndex classes.

The upsert_triplet function in the KnowledgeGraphIndex class is used to insert triplets into the graph store. This is evident from the following code snippet:

def upsert_triplet(self, triplet: Tuple[str, str, str]) -> None:
    """Insert triplets.

    Used for manual insertion of KG triplets (in the form
    of (subject, relationship, object)).

    Args:
        triplet (str): Knowledge triplet

    """
    self._graph_store.upsert_triplet(*triplet)

Here, self._graph_store.upsert_triplet(*triplet) is called, which implies that the upsert_triplet function of the GraphStore class (which could be NebulaGraphStore or any other class that extends GraphStore) is used to insert the triplet into the graph store.

The upsert_triplet function in the NebulaGraphStore class is designed to insert or update a triplet in the Nebula Graph database. A triplet consists of a subject, a relation, and an object, which together represent a graph edge with a property. The function performs the following steps:

  1. It escapes the input strings for the subject (subj), relation (rel), and object (obj) to ensure they are safe to use in a query.
  2. It checks the vertex ID (_vid_type) type and formats the subject and object accordingly. If the _vid_type is "INT64", it expects the subject and object to be digit strings.
  3. It constructs a Data Manipulation Language (DML) query to insert vertices for the subject and object with their respective names and an edge between them with the relation as a property. The edge also includes a hash of the relation to ensure uniqueness.
  4. It executes the DML query using the execute method of the NebulaGraphStore class.
  5. It asserts that the result of the execution is successful.

Here is the refined function definition based on the context shared:

def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
    """Insert or update a triplet in the Nebula Graph database."""
    # Escape the input strings
    subj = escape_str(subj)
    rel = escape_str(rel)
    obj = escape_str(obj)

    # Format subject and object fields based on the vertex ID type
    if self._vid_type == "INT64":
        assert all([subj.isdigit(), obj.isdigit()]), "Subject and object should be digit strings in current graph store."
        subj_field = subj
        obj_field = obj
    else:
        subj_field = f"{QUOTE}{subj}{QUOTE}"
        obj_field = f"{QUOTE}{obj}{QUOTE}"

    # Construct the DML query
    edge_field = f"{subj_field}->{obj_field}"
    edge_type = self._edge_types[0]
    rel_prop_name = self._rel_prop_names[0]
    entity_type = self._tags[0]
    rel_hash = hash_string_to_rank(rel)
    dml_query = (
        f"INSERT VERTEX `{entity_type}`(name) VALUES {subj_field}:({QUOTE}{subj}{QUOTE});"
        f"INSERT VERTEX `{entity_type}`(name) VALUES {obj_field}:({QUOTE}{obj}{QUOTE});"
        f"INSERT EDGE `{edge_type}`(`{rel_prop_name}`) VALUES {edge_field}@{rel_hash}:({QUOTE}{rel}{QUOTE});"
    )

    # Execute the query and assert success
    logger.debug(f"upsert_triplet()\nDML query: {dml_query}")
    result = self.execute(dml_query)
    assert result and result.is_succeeded(), f"Failed to upsert triplet: {subj} {rel} {obj}, query: {dml_query}"

You can find more details in the llama_index/indices/knowledge_graph/base.py and llama_index/graph_stores/nebulagraph.py files.

I hope this clarifies your question. If you have any more questions, feel free to ask.


JinSeoung-Oh commented 6 months ago

@dosu-bot Hmm, okay. Based on your answer, I have to modify my code. Thanks!

dosubot[bot] commented 6 months ago

🤖

You're welcome! If you have any more questions or need further assistance while modifying your code, feel free to ask. Good luck with your modifications!

