run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Unable to Delete Nodes from Property Graph Index #16404

Open LikhithRishi opened 1 month ago

LikhithRishi commented 1 month ago

Question

We are working with the PropertyGraphIndex and trying to delete nodes based on their properties. Below is a summary of the approach we're following:

1. Creating the Property Graph Index:

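from llama_index.core import PropertyGraphIndex
from llama_index.core.graph_stores import SimplePropertyGraphStore

# all_nodes: the nodes built earlier in our pipeline (not shown here)
pg_index_chunk = PropertyGraphIndex(
    nodes=all_nodes,
    property_graph_store=SimplePropertyGraphStore(),
    show_progress=True,
)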

2. Persisting and Reloading the Saved Property Graph:

pg_index_chunk.storage_context.persist(persist_dir="./pg_storage")
pg_index_chunk = PropertyGraphIndex.from_existing(
    SimplePropertyGraphStore.from_persist_dir("./pg_storage"),
    show_progress=True
)

3. Retrieving and Deleting Nodes: We retrieve nodes based on properties and attempt to delete them:

delete_from_pg = []
nodes = SimplePropertyGraphStore.from_persist_dir("./pg_storage").get(properties={"file_name":'1_pgs_doc.pdf'})
print(nodes)
for i in nodes:
    delete_from_pg.append(i.id)
print(delete_from_pg)
SimplePropertyGraphStore().delete(ids=delete_from_pg)
pg_index_chunk.storage_context.persist(persist_dir="./pg_storage")

Issue:

We are able to retrieve the nodes successfully using the get function, but the delete function doesn't seem to remove the nodes from the property graph. After attempting to delete the nodes and persisting the context again, the nodes still appear to exist in the property graph index.

Question: How can we successfully delete nodes from the PropertyGraphIndex? Are we missing a step in the retrieval (get) or deletion process?

dosubot[bot] commented 1 month ago

To successfully delete nodes from the PropertyGraphIndex, make sure you retrieve, delete, and persist through the same SimplePropertyGraphStore instance. In your code, SimplePropertyGraphStore().delete(...) creates a new, empty store and deletes from it, so neither the store you loaded with from_persist_dir nor the index's own store changes; pg_index_chunk.storage_context.persist then writes the index's unmodified store back to ./pg_storage. Here's how you can adjust your code:

  1. Retrieve and Delete Nodes Using the Same Instance:
# Use the index's own graph store, so the deletion is saved by the persist call below
graph_store = pg_index_chunk.property_graph_store

# Retrieve nodes based on properties
nodes = graph_store.get(properties={"file_name": '1_pgs_doc.pdf'})
delete_from_pg = [node.id for node in nodes]

# Delete nodes using the same graph store instance
graph_store.delete(ids=delete_from_pg)

# Persist the changes
pg_index_chunk.storage_context.persist(persist_dir="./pg_storage")

By using the same graph_store instance for both retrieval and deletion, you ensure that the changes are applied to the correct graph store. This should resolve the issue of nodes not being deleted as expected.
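A plain-Python sketch of why the original code had no effect, using a hypothetical ToyGraphStore class (not part of LlamaIndex) in place of SimplePropertyGraphStore: each load or constructor call produces an independent in-memory object, so deleting from a new empty instance changes nothing, and persisting a different, unmodified instance writes the old contents back to disk.

```python
import json
import os
import tempfile


class ToyGraphStore:
    """Hypothetical stand-in for an in-memory graph store (not a LlamaIndex class)."""

    def __init__(self, nodes=None):
        # node id -> properties dict, held only in this instance's memory
        self.nodes = dict(nodes or {})

    @classmethod
    def from_file(cls, path):
        # Every load returns a brand-new, independent instance.
        with open(path) as f:
            return cls(json.load(f))

    def persist(self, path):
        # Overwrite the file with this instance's current contents.
        with open(path, "w") as f:
            json.dump(self.nodes, f)

    def get(self, properties):
        # Ids of nodes whose properties match every key/value pair given.
        return [
            node_id
            for node_id, props in self.nodes.items()
            if all(props.get(k) == v for k, v in properties.items())
        ]

    def delete(self, ids):
        for node_id in ids:
            self.nodes.pop(node_id, None)


def demo(path):
    """Return the node ids on disk after the question's pattern and after the fix."""
    index_store = ToyGraphStore(
        {"a": {"file_name": "1_pgs_doc.pdf"}, "b": {"file_name": "other.pdf"}}
    )
    index_store.persist(path)

    # Question's pattern: read ids from one fresh load, delete from a new
    # empty instance, then persist the index's own (unchanged) instance.
    ids = ToyGraphStore.from_file(path).get({"file_name": "1_pgs_doc.pdf"})
    ToyGraphStore().delete(ids)
    index_store.persist(path)
    after_question = sorted(ToyGraphStore.from_file(path).nodes)

    # Fix: retrieve, delete, and persist through one instance.
    store = ToyGraphStore.from_file(path)
    store.delete(store.get({"file_name": "1_pgs_doc.pdf"}))
    store.persist(path)
    after_fix = sorted(ToyGraphStore.from_file(path).nodes)
    return after_question, after_fix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "store.json")))
```

In the first half the file still holds both nodes; only the second half, where one instance is loaded, modified, and persisted, actually removes "a".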

To continue talking to Dosu, mention @dosu.

LikhithRishi commented 1 month ago

@dosu After executing that code, we now get a KeyError. The logs are:

graph_store.delete(ids=delete_from_pg)
  File "/home/likhithrishi/miniconda3/envs/gen/lib/python3.12/site-packages/llama_index/core/graph_stores/simple_labelled.py", line 149, in delete
    triplets = self.get_triplets(
               ^^^^^^^^^^^^^^^^^^
  File "/home/likhithrishi/miniconda3/envs/gen/lib/python3.12/site-packages/llama_index/core/graph_stores/simple_labelled.py", line 70, in get_triplets
    triplets = self.graph.get_triplets()
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/likhithrishi/miniconda3/envs/gen/lib/python3.12/site-packages/llama_index/core/graph_stores/types.py", line 152, in get_triplets
    self.nodes[subj],
    ~~~~~~~~~~^^^^^^
KeyError: 'aa3b866f-f990-466f-9906-ddd0c1bf6584'

dosubot[bot] commented 1 month ago

The KeyError you're encountering suggests that the node with the ID 'aa3b866f-f990-466f-9906-ddd0c1bf6584' is being referenced in a triplet, but it does not exist in the nodes dictionary of the graph. This can happen if the node was deleted or never existed in the first place, but the triplet still references it.
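A plain-Python sketch of this failure mode on toy data (the dictionaries below are hypothetical, not LlamaIndex objects): a triplet whose subject id has no entry in the node mapping makes an unguarded lookup raise KeyError, just as self.nodes[subj] does in the traceback.

```python
# Toy data (hypothetical, not LlamaIndex objects): node ids map to labels and
# triplets are (subject_id, relation, object_id) tuples.
nodes = {"chunk-1": "Chunk 1", "entity-1": "Alice"}
triplets = {
    ("chunk-1", "MENTIONS", "entity-1"),
    # Dangling: "chunk-2" has no entry in `nodes`.
    ("chunk-2", "MENTIONS", "entity-1"),
}


def resolve_triplets(nodes, triplets):
    """Look up both endpoints of every triplet without checking they exist."""
    return [(nodes[subj], rel, nodes[obj]) for subj, rel, obj in triplets]


def missing_key(nodes, triplets):
    """Return the id that makes the unguarded lookup fail, or None."""
    try:
        resolve_triplets(nodes, triplets)
    except KeyError as err:
        return err.args[0]
    return None


if __name__ == "__main__":
    print(missing_key(nodes, triplets))  # chunk-2
```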

To resolve this issue, you can take the following steps:

  1. Check Node Existence: Before attempting deletion, confirm that every ID in delete_from_pg is a key of the graph.nodes dictionary.

  2. Handle Missing Nodes Gracefully: Modify the delete method to handle cases where a node ID might not exist in the nodes dictionary. This can prevent the KeyError from being raised.

Here's an example of how you might adjust the code to check for node existence:

# Use the index's own graph store, so the deletion is saved by the persist call below
graph_store = pg_index_chunk.property_graph_store

# Retrieve nodes based on properties
nodes = graph_store.get(properties={"file_name": '1_pgs_doc.pdf'})
delete_from_pg = [node.id for node in nodes]

# Check for node existence before deletion
existing_ids = set(graph_store.graph.nodes.keys())
delete_from_pg = [node_id for node_id in delete_from_pg if node_id in existing_ids]

# Delete nodes using the same graph store instance
graph_store.delete(ids=delete_from_pg)

# Persist the changes
pg_index_chunk.storage_context.persist(persist_dir="./pg_storage")

By ensuring that you only attempt to delete nodes that exist, you can avoid the KeyError. If the issue persists, you may need to investigate further to ensure that the graph's state is consistent and that no operations are inadvertently leaving orphaned references in the triplets.
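Because delete_from_pg is built from graph_store.get() on the same store, the existence filter above will most likely remove nothing. The traceback shows the KeyError is raised inside get_triplets, which delete calls, so the missing id probably comes from a triplet rather than from the ids passed in. A small diagnostic can list those dangling triplets. This is a sketch over plain Python collections; find_dangling_triplets is a hypothetical helper, not a LlamaIndex API.

```python
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]


def find_dangling_triplets(
    node_ids: Iterable[str], triplets: Iterable[Triplet]
) -> List[Triplet]:
    """Return (sorted) triplets whose subject or object id is not a known node id."""
    known = set(node_ids)
    return sorted(
        (subj, rel, obj)
        for subj, rel, obj in triplets
        if subj not in known or obj not in known
    )
```

On a SimplePropertyGraphStore, the inputs would presumably be graph_store.graph.nodes.keys() and graph_store.graph.triplets (the attribute names used by the code in this thread); verify them against your installed version before relying on this.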

LikhithRishi commented 1 month ago

@dosu Still the same issue:

    triplets = self.get_triplets(
               ^^^^^^^^^^^^^^^^^^
  File "/home/likhithrishi/miniconda3/envs/gen/lib/python3.12/site-packages/llama_index/core/graph_stores/simple_labelled.py", line 70, in get_triplets
    triplets = self.graph.get_triplets()
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/likhithrishi/miniconda3/envs/gen/lib/python3.12/site-packages/llama_index/core/graph_stores/types.py", line 152, in get_triplets
    self.nodes[subj],
    ~~~~~~~~~~^^^^^^
KeyError: 'e7497956-d6be-49f8-aeb4-83978f78efd2'

dosubot[bot] commented 1 month ago

The KeyError you're encountering indicates that the node with the ID 'e7497956-d6be-49f8-aeb4-83978f78efd2' is being referenced in a triplet, but it does not exist in the nodes dictionary. This can occur if the node was deleted or never existed, but the triplet still references it.

To address this issue, you can modify the get_triplets method to handle missing nodes more gracefully. Here's a potential approach:

  1. Filter Out Invalid Triplets: Before accessing self.nodes[subj] and self.nodes[obj], check that both subj and obj exist in self.nodes. If either is missing, skip that triplet.

Here's an example of how you might adjust the get_triplets method in llama_index/core/graph_stores/types.py (the file named in the traceback):

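def get_triplets(self) -> List[Triplet]:
    """Get all triplets, skipping any whose subject or object node is missing."""
    valid_triplets = []
    for subj, rel, obj in self.triplets:
        # Only resolve triplets whose endpoints both exist, instead of
        # raising KeyError on self.nodes[subj] or self.nodes[obj]
        if subj in self.nodes and obj in self.nodes:
            valid_triplets.append((
                self.nodes[subj],
                self.relations[
                    self._get_relation_key(obj_id=obj, subj_id=subj, rel_id=rel)
                ],
                self.nodes[obj],
            ))
    return valid_triplets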

This modification ensures that only triplets with valid node references are processed, preventing KeyError from being raised. If the issue persists, you may need to further investigate the consistency of your graph data to ensure that all triplets have valid node references.
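Instead of editing the installed library, another option is to keep the graph consistent so the unguarded lookup never fails. The sketch below shows a cascading delete on a toy model of plain dicts and sets; delete_with_cascade is a hypothetical helper, not a LlamaIndex API. Removing a node also removes every triplet that touches it, and any triplet already missing an endpoint is dropped along the way.

```python
from typing import Dict, Iterable, Set, Tuple

Triplet = Tuple[str, str, str]


def delete_with_cascade(
    nodes: Dict[str, dict], triplets: Set[Triplet], ids: Iterable[str]
) -> Tuple[Dict[str, dict], Set[Triplet]]:
    """Hypothetical cascading delete on a toy graph.

    Removes the given node ids, then keeps only triplets whose subject and
    object both survive. This drops triplets touching deleted nodes and any
    triplet that was already dangling before the call.
    """
    to_delete = set(ids)
    kept_nodes = {nid: props for nid, props in nodes.items() if nid not in to_delete}
    kept_triplets = {
        (subj, rel, obj)
        for subj, rel, obj in triplets
        if subj in kept_nodes and obj in kept_nodes
    }
    return kept_nodes, kept_triplets


if __name__ == "__main__":
    nodes = {
        "c1": {"file_name": "1_pgs_doc.pdf"},
        "c2": {"file_name": "other.pdf"},
        "e1": {},
    }
    triplets = {
        ("c1", "MENTIONS", "e1"),
        ("c2", "MENTIONS", "e1"),
        ("c3", "MENTIONS", "e1"),  # already dangling: "c3" is not a node
    }
    # Select ids by property, similar to get(properties=...) in the question
    ids = [nid for nid, p in nodes.items() if p.get("file_name") == "1_pgs_doc.pdf"]
    new_nodes, new_triplets = delete_with_cascade(nodes, triplets, ids)
    print(sorted(new_nodes), sorted(new_triplets))
```

The real graph also keeps a relations mapping (used by the patched method above), which a cascading delete would presumably need to prune as well; this toy omits it.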
