langchain-ai / langchain-google

MIT License

[Feature Request] Passing exclude_from_indexes parameter to VectorSearchVectorStoreDatastore #298

Open faisalron opened 2 weeks ago

faisalron commented 2 weeks ago

Hi Everyone,

I am using VectorSearchVectorStoreDatastore as my vector store for building a RAG system. It turns out that I can't put metadata exceeding 1500 bytes into the vector store. I believe this is due to a limitation of Datastore, which is used as the backend: indexed string properties are capped at 1500 bytes.

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "The value of property "text_content" is longer than 1500 bytes."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:xx:xx:xx:xx {created_time:"2024-06-12T05:25:31.891413694+00:00", grpc_status:3, grpc_message:"The value of property \"text_content\" is longer than 1500 bytes."}"
>
...

InvalidArgument: 400 The value of property "text_content" is longer than 1500 bytes. [detail: "/Datastore.Commit to [2002:ad3:d30e:0:b0:244:243f:63f8]:4001 : APP_ERROR(1) The value of property \"text_content\" is longer than 1500 bytes."

I think we can relax this limitation by passing an exclude_from_indexes argument to the Entity constructor at the following line of code: https://github.com/langchain-ai/langchain-google/blob/main/libs/vertexai/langchain_google_vertexai/vectorstores/document_storage.py#L191

Reference Documentation: https://cloud.google.com/python/docs/reference/datastore/latest/client#entitykeynone-excludefromindexes
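To make the proposal concrete, here is a minimal illustrative sketch of how one could detect which metadata properties hit the 1500-byte indexed-property limit from the error above, and so which names would need to be passed as exclude_from_indexes. The helper name and the pre-check approach are my own illustration, not existing library code; the 1500-byte figure comes from the error message and Datastore's documented limit on indexed string properties.

```python
INDEXED_PROPERTY_LIMIT = 1500  # bytes; Datastore's limit for indexed string properties


def oversized_properties(metadata: dict, limit: int = INDEXED_PROPERTY_LIMIT) -> list:
    """Return the names of string properties whose UTF-8 size exceeds the limit.

    These are the properties that would have to be listed in
    exclude_from_indexes when constructing the Datastore Entity,
    otherwise the commit fails with INVALID_ARGUMENT.
    """
    oversized = []
    for name, value in metadata.items():
        if isinstance(value, str) and len(value.encode("utf-8")) > limit:
            oversized.append(name)
    return oversized


# Example: a long text_content field trips the limit, short fields do not.
metadata = {"text_content": "x" * 2000, "source": "doc-1"}
print(oversized_properties(metadata))  # ['text_content']
```

The returned names could then be forwarded to datastore.Entity(key, exclude_from_indexes=...) as described in the reference documentation above.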

I can ingest the same data with VectorSearchVectorStore, but given my latency constraints Datastore is preferable, so it would be good to add this feature.

lkuligin commented 1 week ago

Good point! Would you like to work on this?

edge7 commented 6 days ago

Hi. I think there is no need to even expose this as an option; we could simply make the property non-indexed by default. At the end of the day, the whole thing works by retrieving documents by id.