Closed daveckw closed 1 year ago
Hi, I ran into the same issue. I wonder if you were using GPTVectorStoreIndex, because I found that the method GPTVectorStoreIndex._delete only deletes the doc_id from its index_struct and vector_store:
def _delete(self, doc_id: str, **delete_kwargs: Any) -> None:
    """Delete a document."""
    self._index_struct.delete(doc_id)
    self._vector_store.delete(doc_id)
I think it should also delete the document from the docstore:
def _delete(self, doc_id: str, **delete_kwargs: Any) -> None:
    """Delete a document."""
    self._docstore.delete_document(doc_id)
    self._index_struct.delete(doc_id)
    self._vector_store.delete(doc_id)
Am I right? Should I make a PR about this?
This should be fixed now. See this page for a detailed usage guide:
https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/document_management.html
https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/document_management.html
I get 404 Not found for this page.
The docs were recently refactored, here's the updated link:
https://gpt-index.readthedocs.io/en/latest/how_to/index/document_management.html
Thanks for the updated link @bhanson-techempower
The documentation describes an option to delete from the docstore as well. It is false by default because many indexes can share the same docstore. Even when it is false, delete stops the document from being used in queries, since the doc_id is removed from the index_struct. I'm going to close this for now, feel free to reopen though!
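To illustrate the reasoning about shared docstores, here is a minimal toy model. It is not the library's code: ToyDocstore, ToyIndex, and the flag name delete_from_docstore are hypothetical stand-ins chosen for this sketch. It shows that deleting with the flag off hides the document from one index while leaving it available to another index that shares the same docstore.

```python
from typing import Dict, List


class ToyDocstore:
    """Hypothetical stand-in for a docstore: node_id -> text."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}


class ToyIndex:
    """Hypothetical stand-in for an index: doc_id -> list of node ids."""

    def __init__(self, docstore: ToyDocstore) -> None:
        self.docstore = docstore
        self.doc_id_dict: Dict[str, List[str]] = {}

    def insert(self, doc_id: str, node_id: str, text: str) -> None:
        # Store the text once in the (possibly shared) docstore.
        self.docstore.docs[node_id] = text
        self.doc_id_dict.setdefault(doc_id, []).append(node_id)

    def delete(self, doc_id: str, delete_from_docstore: bool = False) -> None:
        # Always drop the index entry, so queries no longer see the document.
        node_ids = self.doc_id_dict.pop(doc_id, [])
        # Only touch the docstore when explicitly asked to (assumed flag name).
        if delete_from_docstore:
            for node_id in node_ids:
                self.docstore.docs.pop(node_id, None)

    def visible_texts(self) -> List[str]:
        # Texts this index can still reach through its own doc_id_dict.
        return [
            self.docstore.docs[node_id]
            for node_ids in self.doc_id_dict.values()
            for node_id in node_ids
            if node_id in self.docstore.docs
        ]


shared = ToyDocstore()
index_a = ToyIndex(shared)
index_b = ToyIndex(shared)

index_a.insert("doc_1", "node_1", "hello")
index_b.doc_id_dict["doc_1"] = ["node_1"]  # index_b references the same node

index_a.delete("doc_1")  # flag left at its default of False
```

After the delete, index_a no longer returns the text, index_b still does, and the node remains in the shared docstore. Passing delete_from_docstore=True would remove it for every index that shares the store, which is presumably why the default is conservative.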
index.delete(doc_id) only deletes the entry in "doc_id_dict"; the actual document is not deleted.
{ "index_struct": { "__type__": "simple_dict", "__data__": { "index_id": "e8e73055-3108-404b-af48-ce36988caaca", "summary": null, "nodes_dict": { "076604ba-4cfb-4f39-83e8-980934292b47": "076604ba-4cfb-4f39-83e8-980934292b47", "26b07426-fd7d-4d02-be40-5659672d790b": "26b07426-fd7d-4d02-be40-5659672d790b", "a329d897-5577-4603-bd63-fb949b7c8312": "a329d897-5577-4603-bd63-fb949b7c8312" }, "doc_id_dict": { "doc_id_Hugoz Project2.txt": [ "076604ba-4cfb-4f39-83e8-980934292b47" ], "doc_id_IQI Exsim Policy2.txt": [ "26b07426-fd7d-4d02-be40-5659672d790b" ], "doc_id_Dave Chong2.txt": [ "a329d897-5577-4603-bd63-fb949b7c8312" ] }, "embeddings_dict": {} } }, "docstore": { "docs": { "076604ba-4cfb-4f39-83e8-980934292b47": { "text": "Hugoz KLCC Project Information:\n\nLaunch Date: APDL expected Q1 2023\nLand Area: 0.867 acres\nNumber of Blocks: 1 Tower\n\nTotal : 674 units\nNumber of Units\nNon HDA units : 354 units\nHDA units : 320 units\n\nNumber of Floors: 46 levels\nFreehold / Leasehold: Freehold\n\nCompletion Date: 48 months from A
For example, when I call index.delete("doc_id_Hugoz Project2.txt"), the doc 076604ba-4cfb-4f39-83e8-980934292b47 is still there.
May I know how to delete all the related docs from the docstore? Thank you.
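One possible workaround, sketched under assumptions: if the index has been saved as a plain dict shaped like the dump above (index_struct.__data__ with nodes_dict, doc_id_dict, and embeddings_dict, plus docstore.docs), the related nodes can be removed by hand. The function purge_document is a hypothetical helper, not part of the library, and the sample data is abbreviated with placeholder text. The docstore option described in the documentation is probably the better route when it is available.

```python
from typing import Any, Dict, List


def purge_document(persisted: Dict[str, Any], doc_id: str) -> List[str]:
    """Remove doc_id and all of its nodes from a saved index dict.

    Hypothetical helper: assumes the dict layout shown in the dump above.
    Returns the node ids that were removed.
    """
    data = persisted["index_struct"]["__data__"]
    # Look up (and drop) the node ids that belong to this document.
    node_ids = data.get("doc_id_dict", {}).pop(doc_id, [])
    for node_id in node_ids:
        data.get("nodes_dict", {}).pop(node_id, None)
        data.get("embeddings_dict", {}).pop(node_id, None)
        # This is the step that plain index.delete(doc_id) skipped.
        persisted["docstore"]["docs"].pop(node_id, None)
    return node_ids


# Abbreviated sample in the same shape as the dump; texts are placeholders.
persisted = {
    "index_struct": {
        "__type__": "simple_dict",
        "__data__": {
            "nodes_dict": {
                "076604ba-4cfb-4f39-83e8-980934292b47": "076604ba-4cfb-4f39-83e8-980934292b47",
                "26b07426-fd7d-4d02-be40-5659672d790b": "26b07426-fd7d-4d02-be40-5659672d790b",
            },
            "doc_id_dict": {
                "doc_id_Hugoz Project2.txt": ["076604ba-4cfb-4f39-83e8-980934292b47"],
                "doc_id_IQI Exsim Policy2.txt": ["26b07426-fd7d-4d02-be40-5659672d790b"],
            },
            "embeddings_dict": {},
        },
    },
    "docstore": {
        "docs": {
            "076604ba-4cfb-4f39-83e8-980934292b47": {"text": "..."},
            "26b07426-fd7d-4d02-be40-5659672d790b": {"text": "..."},
        }
    },
}

removed = purge_document(persisted, "doc_id_Hugoz Project2.txt")
```

Note that this only edits the saved dict. If embeddings live in a separate vector store, they would need to be removed there as well.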