Hello,
We work in a full GCP environment with Vertex Search (Matching Engine as the vector DB). One drawback of this setup is that we have to keep a side index of what is currently in the DB, and managing updates and deletions without creating duplicate documents is complex.
I feel that implementing a RecordManager for BigQuery would solve this problem and make it easy to track what is in the vector DB:
https://python.langchain.com/docs/how_to/indexing/
https://api.python.langchain.com/en/latest/indexes/langchain.indexes.base.RecordManager.html
The LangChain documentation says that the indexing API:
Only works with LangChain vectorstore's that support:
- document addition by id (add_documents method with ids argument)
- delete by id (delete method with ids argument)
That is the case for Vertex Search (in streaming mode).
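To illustrate why those two vector store methods matter, here is a minimal, standard-library-only sketch of the bookkeeping a record manager enables. It is not LangChain's implementation: `ToyVectorStore`, `ToyRecordManager`, and `sync` are hypothetical names, the run counter stands in for the timestamps a real record manager would use, and the flow only loosely resembles LangChain's incremental cleanup.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyVectorStore:
    """Hypothetical stand-in for a store with add-by-id and delete-by-id."""
    docs: dict = field(default_factory=dict)

    def add_documents(self, documents, ids):
        for doc_id, text in zip(ids, documents):
            self.docs[doc_id] = text  # same id overwrites, so no duplicates

    def delete(self, ids):
        for doc_id in ids:
            self.docs.pop(doc_id, None)


@dataclass
class ToyRecordManager:
    """Hypothetical in-memory side index: key -> (source, last run seen)."""
    records: dict = field(default_factory=dict)
    run: int = 0

    def next_run(self):
        self.run += 1
        return self.run

    def exists(self, keys):
        return [k in self.records for k in keys]

    def update(self, keys, group_ids, at):
        for k, g in zip(keys, group_ids):
            self.records[k] = (g, at)

    def list_keys(self, before, group_ids):
        return [k for k, (g, seen) in self.records.items()
                if g in group_ids and seen < before]

    def delete_keys(self, keys):
        for k in keys:
            self.records.pop(k, None)


def sync(texts_by_source, store, manager):
    """Index texts, skip unchanged ones, delete stale ones per source."""
    run = manager.next_run()
    added = skipped = 0
    for source, texts in texts_by_source.items():
        for text in texts:
            # The key is derived from source and content, so an unchanged
            # document always maps to the same id.
            key = hashlib.sha256(f"{source}\x00{text}".encode()).hexdigest()
            if manager.exists([key])[0]:
                skipped += 1
            else:
                store.add_documents([text], ids=[key])
                added += 1
            manager.update([key], [source], run)
    # Anything from these sources that was not seen in this run is stale.
    stale = manager.list_keys(before=run, group_ids=set(texts_by_source))
    store.delete(ids=stale)
    manager.delete_keys(stale)
    return {"added": added, "skipped": skipped, "deleted": len(stale)}


store, manager = ToyVectorStore(), ToyRecordManager()
first = sync({"a.txt": ["hello", "world"]}, store, manager)
second = sync({"a.txt": ["hello", "planet"]}, store, manager)
```

In the second call, "hello" is skipped, "planet" is added, and "world" is deleted by id, which is exactly the update and deletion handling that otherwise requires a hand-maintained side index.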
There is no RecordManager for GCP yet. A PR for Firestore is in progress (https://github.com/googleapis/langchain-google-firestore-python/pull/90), but I feel BigQuery might be more suitable for this use case.
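To make the BigQuery idea a bit more concrete, here is a rough sketch of the statements such a RecordManager might run for each method of the interface (`create_schema`, `get_time`, `update`, `exists`, `list_keys`, `delete_keys`). Everything here is an assumption for discussion: the `record_manager_sql` helper, the table layout, the column names, and the named parameters (`@namespace`, `@rows`, `@keys`, `@group_ids`, `@now`, `@before`) are hypothetical, and the SQL has not been run against BigQuery.

```python
def record_manager_sql(table: str) -> dict:
    """Return hypothetical BigQuery statements keyed by RecordManager method."""
    return {
        # One row per indexed document key, scoped by namespace.
        "create_schema": f"""
            CREATE TABLE IF NOT EXISTS `{table}` (
              namespace STRING NOT NULL,
              key STRING NOT NULL,
              group_id STRING,
              updated_at FLOAT64 NOT NULL
            )""",
        # Server-side clock, as seconds since the epoch.
        "get_time": "SELECT UNIX_MICROS(CURRENT_TIMESTAMP()) / 1e6 AS now",
        # Upsert; @rows is assumed to be ARRAY<STRUCT<key STRING, group_id STRING>>.
        "update": f"""
            MERGE `{table}` AS t
            USING (SELECT r.key, r.group_id FROM UNNEST(@rows) AS r) AS s
            ON t.namespace = @namespace AND t.key = s.key
            WHEN MATCHED THEN
              UPDATE SET updated_at = @now, group_id = s.group_id
            WHEN NOT MATCHED THEN
              INSERT (namespace, key, group_id, updated_at)
              VALUES (@namespace, s.key, s.group_id, @now)""",
        # Which of the given keys are already recorded.
        "exists": f"""
            SELECT key FROM `{table}`
            WHERE namespace = @namespace AND key IN UNNEST(@keys)""",
        # Keys not refreshed since @before, limited to the given sources.
        "list_keys": f"""
            SELECT key FROM `{table}`
            WHERE namespace = @namespace
              AND updated_at < @before
              AND group_id IN UNNEST(@group_ids)""",
        "delete_keys": f"""
            DELETE FROM `{table}`
            WHERE namespace = @namespace AND key IN UNNEST(@keys)""",
    }


statements = record_manager_sql("my-project.my_dataset.upsertion_record")
```

An actual implementation would bind these parameters through the BigQuery client's query parameter support rather than string formatting. Only the table name is interpolated here.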
Happy to discuss the implementation and suitability :)