langchain-ai / langchain-google

MIT License

[Feature Request] Passing exclude_from_indexes parameter to VectorSearchVectorStoreDatastore #298

Open faisalron opened 2 weeks ago

faisalron commented 2 weeks ago

Hi Everyone,

I am using VectorSearchVectorStoreDatastore as my vector store for building a RAG system. It turns out that I can't put metadata exceeding 1500 bytes into the vector store. I believe this is due to a limitation of Datastore, which is used as the backend: indexed string properties are capped at 1500 bytes.

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "The value of property "text_content" is longer than 1500 bytes."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:xx:xx:xx:xx {created_time:"2024-06-12T05:25:31.891413694+00:00", grpc_status:3, grpc_message:"The value of property \"text_content\" is longer than 1500 bytes."}"
>
...

InvalidArgument: 400 The value of property "text_content" is longer than 1500 bytes. [detail: "/Datastore.Commit to [2002:ad3:d30e:0:b0:244:243f:63f8]:4001 : APP_ERROR(1) The value of property \"text_content\" is longer than 1500 bytes."

I think we can relax this limitation by passing an exclude_from_indexes argument to the Entity constructor at the following line of code: https://github.com/langchain-ai/langchain-google/blob/main/libs/vertexai/langchain_google_vertexai/vectorstores/document_storage.py#L191

Reference Documentation: https://cloud.google.com/python/docs/reference/datastore/latest/client#entitykeynone-excludefromindexes
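To make the proposal concrete, here is a minimal illustrative sketch of how one could detect which metadata properties hit the 1500-byte indexed-property limit from the error above, and so which names would need to be passed as exclude_from_indexes. The helper name and the pre-check approach are my own illustration, not existing library code; the 1500-byte figure comes from the error message and Datastore's documented limit on indexed string properties.

```python
INDEXED_PROPERTY_LIMIT = 1500  # bytes; Datastore's limit for indexed string properties


def oversized_properties(metadata: dict, limit: int = INDEXED_PROPERTY_LIMIT) -> list:
    """Return the names of string properties whose UTF-8 size exceeds the limit.

    These are the properties that would have to be listed in
    exclude_from_indexes when constructing the Datastore Entity,
    otherwise the commit fails with INVALID_ARGUMENT.
    """
    oversized = []
    for name, value in metadata.items():
        if isinstance(value, str) and len(value.encode("utf-8")) > limit:
            oversized.append(name)
    return oversized


# Example: a long text_content field trips the limit, short fields do not.
metadata = {"text_content": "x" * 2000, "source": "doc-1"}
print(oversized_properties(metadata))  # ['text_content']
```

The returned names could then be forwarded to datastore.Entity(key, exclude_from_indexes=...) as described in the reference documentation above.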

I can ingest the same data with VectorSearchVectorStore, but given my latency constraints Datastore is preferable, so it would be good to add this feature.

lkuligin commented 1 week ago

Good point! Would you like to work on this?

edge7 commented 6 days ago

Hi. I think there is no need to even expose this as an option; we could simply make the property non-indexed by default. At the end of the day, the whole thing works by retrieving documents by id.