langchain-ai / langchain-google

RecordManager Bigquery #549

Open Freezaa9 opened 2 weeks ago

Freezaa9 commented 2 weeks ago

Hello,

We work in a full GCP environment with Vertex Search (Matching Engine as the vector DB). One of the drawbacks of this solution is the need to keep a side index of what is currently in the DB, plus the complexity of managing updates and deletions without creating duplicate documents.

I feel that implementing a RecordManager for BigQuery would solve these problems and make it easy to track what is in the vector DB: https://python.langchain.com/docs/how_to/indexing/ https://api.python.langchain.com/en/latest/indexes/langchain.indexes.base.RecordManager.html
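To make the idea concrete, here is a minimal sketch of the bookkeeping such a record manager would do. It is not an actual BigQuery implementation: `sqlite3` stands in for BigQuery so the example runs anywhere, and the class name, table name, and schema are hypothetical. The method names (`create_schema`, `get_time`, `update`, `exists`, `list_keys`, `delete_keys`) follow the RecordManager interface linked above, with simplified signatures.

```python
from __future__ import annotations

import sqlite3
import time
from typing import Optional, Sequence


class SqlRecordManagerSketch:
    """Hypothetical record manager; sqlite3 stands in for BigQuery."""

    def __init__(self, namespace: str, conn: sqlite3.Connection) -> None:
        self.namespace = namespace
        self.conn = conn

    def create_schema(self) -> None:
        # One row per indexed document key, scoped by namespace.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "namespace TEXT NOT NULL, key TEXT NOT NULL, "
            "group_id TEXT, updated_at REAL NOT NULL, "
            "PRIMARY KEY (namespace, key))"
        )

    def get_time(self) -> float:
        # A real backend would preferably use the server clock.
        return time.time()

    def update(
        self, keys: Sequence[str], group_ids: Optional[Sequence[Optional[str]]] = None
    ) -> None:
        if group_ids is None:
            group_ids = [None] * len(keys)
        if len(keys) != len(group_ids):
            raise ValueError("keys and group_ids must have the same length")
        now = self.get_time()
        # Upsert: re-indexing an existing key only refreshes its timestamp.
        self.conn.executemany(
            "INSERT OR REPLACE INTO records (namespace, key, group_id, updated_at) "
            "VALUES (?, ?, ?, ?)",
            [(self.namespace, k, g, now) for k, g in zip(keys, group_ids)],
        )

    def exists(self, keys: Sequence[str]) -> list[bool]:
        if not keys:
            return []
        placeholders = ",".join("?" for _ in keys)
        rows = self.conn.execute(
            f"SELECT key FROM records WHERE namespace = ? AND key IN ({placeholders})",
            [self.namespace, *keys],
        ).fetchall()
        found = {row[0] for row in rows}
        return [k in found for k in keys]

    def list_keys(self, before: Optional[float] = None) -> list[str]:
        query = "SELECT key FROM records WHERE namespace = ?"
        params: list = [self.namespace]
        if before is not None:
            # Keys not touched since `before` are candidates for cleanup.
            query += " AND updated_at < ?"
            params.append(before)
        return [row[0] for row in self.conn.execute(query + " ORDER BY key", params)]

    def delete_keys(self, keys: Sequence[str]) -> None:
        self.conn.executemany(
            "DELETE FROM records WHERE namespace = ? AND key = ?",
            [(self.namespace, k) for k in keys],
        )


conn = sqlite3.connect(":memory:")
manager = SqlRecordManagerSketch("vertex/my-index", conn)
manager.create_schema()
manager.update(["doc-1", "doc-2"], group_ids=["a.pdf", "a.pdf"])
print(manager.exists(["doc-1", "doc-3"]))  # [True, False]
manager.delete_keys(["doc-1"])
print(manager.list_keys())  # ['doc-2']
```

A BigQuery version would presumably keep the same table shape and swap the SQL dialect and client, but that mapping is an assumption and not something this sketch verifies.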

The LangChain documentation says:

Only works with LangChain vectorstore's that support:
document addition by id (add_documents method with ids argument)
delete by id (delete method with ids argument)

This is the case for Vertex Search (in streaming mode).
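The two requirements quoted above matter because the indexing flow relies on them to avoid duplicates: unchanged documents are skipped, new ones are added under a deterministic id, and stale ones are deleted by id. Below is a rough sketch of that flow under stated assumptions. `ToyVectorStore`, `content_id`, and `sync` are hypothetical names, not LangChain or Vertex APIs, and a plain `set` plays the role of the record manager.

```python
import hashlib


class ToyVectorStore:
    """Hypothetical stand-in exposing only the two required operations."""

    def __init__(self):
        self.docs = {}

    def add_documents(self, texts, ids):
        # Addition by id: writing the same id twice overwrites, never duplicates.
        for doc_id, text in zip(ids, texts):
            self.docs[doc_id] = text

    def delete(self, ids):
        # Deletion by id: unknown ids are ignored.
        for doc_id in ids:
            self.docs.pop(doc_id, None)


def content_id(text):
    # Deterministic id derived from the content, so identical text maps to one id.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync(store, seen_ids, texts):
    """Full-refresh sync; `seen_ids` is the side index of what the store holds."""
    wanted = {content_id(t): t for t in texts}
    new_ids = [i for i in wanted if i not in seen_ids]
    stale_ids = [i for i in seen_ids if i not in wanted]
    store.add_documents([wanted[i] for i in new_ids], ids=new_ids)
    store.delete(ids=stale_ids)
    seen_ids.difference_update(stale_ids)
    seen_ids.update(new_ids)
    return {
        "added": len(new_ids),
        "skipped": len(wanted) - len(new_ids),
        "deleted": len(stale_ids),
    }


store, seen = ToyVectorStore(), set()
first = sync(store, seen, ["alpha", "beta"])
second = sync(store, seen, ["alpha", "gamma"])
print(first)   # {'added': 2, 'skipped': 0, 'deleted': 0}
print(second)  # {'added': 1, 'skipped': 1, 'deleted': 1}
```

Without delete-by-id on the store, the stale "beta" entry in the second run could not be removed, which is why both operations are listed as prerequisites.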

There is no RecordManager for GCP yet. A PR for Firestore is ongoing (https://github.com/googleapis/langchain-google-firestore-python/pull/90), but I feel that BigQuery might be more suitable for this use case.

Happy to discuss the implementation and suitability :)

Freezaa9 commented 2 weeks ago

I just noticed that "delete" is not implemented in VectorSearchVectorStore: https://github.com/langchain-ai/langchain-google/pull/331/files

lkuligin commented 2 weeks ago

@eliasecchig