Closed KarlTheforest closed 2 months ago
We didn't have time to implement individual document deletion yet. As a workaround you can wipe your local_data
folder.
I will keep the issue open, but keep an eye on the releases; we will most likely add this feature soon.
However, if you feel like contributing to the project, you can do so:
- ingest_service.py: implement the delete doc feature (by id).
- ingest_router.py: add a new DELETE route and link it to the ingest_service.
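For anyone picking this up, here is a rough sketch of the shape such a change could take. It is not privateGPT code: `InMemoryIngestService`, `IngestedDoc` and `delete_route` are hypothetical names, the store is a plain dict, and the route is modelled as a function returning an HTTP status code rather than a real router endpoint. A real implementation would presumably also have to remove the document's stored vectors.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IngestedDoc:
    doc_id: str
    file_name: str


@dataclass
class InMemoryIngestService:
    """Hypothetical stand-in for the service in ingest_service.py."""

    _docs: Dict[str, IngestedDoc] = field(default_factory=dict)

    def ingest(self, doc_id: str, file_name: str) -> IngestedDoc:
        doc = IngestedDoc(doc_id=doc_id, file_name=file_name)
        self._docs[doc_id] = doc
        return doc

    def list_ingested(self) -> List[IngestedDoc]:
        return list(self._docs.values())

    def delete(self, doc_id: str) -> bool:
        """Remove one document by id; return False if the id is unknown."""
        return self._docs.pop(doc_id, None) is not None


def delete_route(service: InMemoryIngestService, doc_id: str) -> int:
    """Hypothetical handler for a DELETE route; returns an HTTP status code."""
    # 204 No Content on success, 404 Not Found for an unknown id.
    return 204 if service.delete(doc_id) else 404
```

Returning a boolean from `delete` keeps the service free of HTTP concerns, so the route decides how to report an unknown id.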
Another problem is that if something goes wrong during a folder ingestion (scripts/ingest_folder.py), for example if parsing of an individual document fails, then running ingest_folder.py again does not check for documents already processed and ingests everything again from the beginning (probably the already processed documents are inserted twice).
Good point, that is also in the roadmap. Feel free to propose the improvement as a PR.
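One possible approach for such a PR, sketched under assumptions: keep a small manifest of content hashes next to the data and skip any file whose hash is already recorded. Nothing here is taken from scripts/ingest_folder.py; `ingest_folder_resumable`, the JSON manifest and the `ingest_file` callback are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to recognise already ingested files."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest_folder_resumable(
    folder: Path,
    manifest_path: Path,
    ingest_file: Callable[[Path], None],
) -> List[Path]:
    """Ingest every file not yet in the manifest; return the files that failed."""
    done = set()
    if manifest_path.exists():
        done = set(json.loads(manifest_path.read_text()))
    failed: List[Path] = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if path.resolve() == manifest_path.resolve():
            continue  # the manifest itself is not a document
        digest = file_digest(path)
        if digest in done:
            continue  # processed in an earlier run
        try:
            ingest_file(path)
        except Exception:
            failed.append(path)  # keep going; retry on the next run
            continue
        done.add(digest)
        # Persist after each success so a crash loses no progress.
        manifest_path.write_text(json.dumps(sorted(done)))
    return failed
```

Because the key is the content hash, two files with identical contents count as one document; keying on the path instead would change that trade-off.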
Another problem is that if something goes wrong during a folder ingestion (scripts/ingest_folder.py), for example if parsing of an individual document fails, then running ingest_folder.py again does not check for documents already processed and ingests everything again from the beginning (probably the already processed documents are inserted twice).
The original implementation in langchain is supposed to handle that for you: it only stores a document if the source and the stored vectors are not the same. As far as I know, this leads to keeping out-of-date information, though I'd not be surprised if that's handled too.
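To make the out-of-date concern concrete, here is a sketch of one way to plan a sync from per-source content hashes. I have not checked how langchain does it, so this is only an illustration; `plan_sync` and the hash maps are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_sync(
    stored: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare source -> content-hash maps; return (to_add, to_replace, to_delete)."""
    to_add = sorted(src for src in current if src not in stored)
    to_replace = sorted(
        src for src in current if src in stored and stored[src] != current[src]
    )
    to_delete = sorted(src for src in stored if src not in current)
    return to_add, to_replace, to_delete
```

Sources in `to_replace` are the ones that would otherwise go stale: their old vectors need deleting before the new version is stored, which is another reason delete-by-id is useful.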
@lopagela is working on this at the moment
The PR has been merged: https://github.com/imartinez/privateGPT/pull/1163
Close this, thanks.
Going by the title: how do I remove the PDF? Where is it located? Inside the privateGPT directory?