Closed BalasubramanyamEvani closed 3 weeks ago
Hello @BalasubramanyamEvani! 👋 I'm here to help you with any bugs, questions, or contributions. Let's tackle this together!
To delete nodes from a ChromaVectorStore index, use the delete_nodes method provided by the ChromaVectorStore class, which deletes nodes by their node_ids. Removing the same node IDs from docstore.json when the storage context is persisted is a separate step, handled at the index level.
Here is an example of how to use the delete_nodes method:
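from typing import List

from llama_index.core.schema import TextNode
from llama_index.vector_stores.chroma import ChromaVectorStore


def test_delete_nodes(
    vector_store: ChromaVectorStore, node_embeddings: List[TextNode]
) -> None:
    # Insert the fixture nodes, then delete two of them by ID.
    vector_store.add(node_embeddings)
    vector_store.delete_nodes(
        node_ids=[
            "c330d77f-90bd-4c51-9ed2-57d8d693b3b0",
            "c3d1e1dd-8fb4-4b8f-b7ea-7fa96038d39d",
        ]
    )
    # Query for the two deleted IDs plus one that should still exist.
    res = vector_store.get_nodes(
        node_ids=[
            "c330d77f-90bd-4c51-9ed2-57d8d693b3b0",
            "c3d1e1dd-8fb4-4b8f-b7ea-7fa96038d39d",
            "c3ew11cd-8fb4-4b8f-b7ea-7fa96038d39d",
        ]
    )
    # Only the node that was not deleted is returned.
    assert len(res) == 1
    assert res[0].get_content() == "lorem ipsum"
    assert res[0].id_ == "c3ew11cd-8fb4-4b8f-b7ea-7fa96038d39d"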
This method removes the nodes associated with the given node_ids from the ChromaDB collection. On its own it does not touch the docstore, so those node IDs remain in docstore.json when the storage context is persisted [1][2][3].
Additionally, the VectorStoreIndex class in the LlamaIndex framework provides a delete_nodes method of its own. It removes nodes from the vector store and, when delete_from_docstore is set, from the document store as well:
def delete_nodes(
    self,
    node_ids: List[str],
    delete_from_docstore: bool = False,
    **delete_kwargs: Any,
) -> None:
    """Delete a list of nodes from the index.

    Args:
        node_ids (List[str]): A list of node_ids from the nodes to delete
    """
    # delete nodes from vector store
    self._vector_store.delete_nodes(node_ids, **delete_kwargs)
    # delete from docstore only if needed
    if (
        not self._vector_store.stores_text or self._store_nodes_override
    ) and delete_from_docstore:
        for node_id in node_ids:
            self._docstore.delete_document(node_id, raise_error=False)
This keeps the vector store and the document store consistent in the storage context, provided that delete_from_docstore=True is passed and either the vector store does not store text or _store_nodes_override is set [4][5].
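The guard in the delete_nodes method quoted above can be isolated to see which flag combinations actually reach the docstore. This is only a minimal sketch: removes_docstore_entry is a hypothetical helper, not part of LlamaIndex, and the cases assume a vector store that reports stores_text=True, which is assumed here to be the case for a Chroma-backed store.

```python
def removes_docstore_entry(
    stores_text: bool,
    store_nodes_override: bool,
    delete_from_docstore: bool,
) -> bool:
    """Mirror the condition guarding the docstore deletion loop."""
    return (not stores_text or store_nodes_override) and delete_from_docstore


# Assumption: the vector store keeps node text itself (stores_text=True).
cases = {
    "default flags": removes_docstore_entry(True, False, False),
    "delete_from_docstore only": removes_docstore_entry(True, False, True),
    "override + delete_from_docstore": removes_docstore_entry(True, True, True),
}
for label, result in cases.items():
    print(f"{label}: {result}")
```

Under that assumption, only the last combination evaluates to True, so passing delete_from_docstore=True alone would leave the docstore entries in place.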
@dosu but index._store_nodes_override has to be set in order to remove entries from the docstore.
Question
Hello,
My use case is as follows: When I create a vector store index (ChromaVectorStore) and persist the storage context, how should one go about deleting the nodes later? Let's say I want to delete all the nodes.
I can probably get all the IDs in the ChromaDB collection and then delete them, but as far as I understand, the deleted node IDs should be removed from the docstore.json when the storage context is persisted. How should one handle this? Or is there a way LlamaIndex handles it automatically?
Any pointers regarding this would be helpful.
Thank you