langchain-ai / langchain-google

MIT License

VertexAISearchRetriever does not support unstructured data stores with chunking enabled. #229

Open akos-sch opened 1 month ago

akos-sch commented 1 month ago

Recently, Google released a layout parser for data stores: The layout parser is available only when using document chunking for RAG. When document chunking is turned on, Vertex AI Search breaks documents up into chunks at ingestion time and can return documents as chunks.

I created an unstructured data store with layout parsing enabled. I tried running the following snippet:

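from langchain.tools.retriever import create_retriever_tool
from langchain_google_community import VertexAISearchRetriever

# Retriever pointed at the unstructured data store (layout parsing + chunking enabled)
retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    location_id=DS_LOCATION_ID,
    data_store_id=DATA_STORE_ID,
    max_documents=10,
    engine_data_type=DATA_STORE_TYPE,  # set to 0 for unstructured data store
)

# Wrap the retriever as a tool
retriever_tool = create_retriever_tool(
    retriever=retriever,
    name=RETRIEVER_NAME,
    description=RETRIEVER_DESCR,
)

# This call raises the InvalidArgument error shown below
docs = retriever_tool.invoke({"query": question})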

I receive the following error:

google.api_core.exceptions.InvalidArgument: 400 extractive_content_spec must be not defined when the datastore is using 'chunking config' This might be due to engine_data_type not set correctly.
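To make the error message concrete, here is a minimal sketch of what the request apparently needs to look like for a chunked data store: the extractive content spec is left out and chunks are requested instead. This is not the retriever's actual code. The helper name build_content_search_spec is hypothetical, and the field names (searchResultMode, extractiveContentSpec, maxExtractiveAnswerCount) are my assumption about the REST JSON form of the search request, so check them against the API reference before relying on them.

```python
import json


def build_content_search_spec(chunking_enabled: bool) -> dict:
    """Build a contentSearchSpec body for a search request (hypothetical helper).

    Assumption: field names follow the REST JSON form of the search API.
    They are not taken from langchain-google-community 1.0.3.
    """
    if chunking_enabled:
        # The 400 error says extractive_content_spec must not be defined
        # when the data store uses a chunking config, so omit it entirely
        # and ask for chunks rather than whole documents.
        return {"searchResultMode": "CHUNKS"}

    # Without chunking, an unstructured data store can still request
    # extractive content alongside whole documents.
    return {
        "searchResultMode": "DOCUMENTS",
        "extractiveContentSpec": {"maxExtractiveAnswerCount": 1},
    }


if __name__ == "__main__":
    # Show the spec that would be sent for a chunked data store.
    print(json.dumps(build_content_search_spec(chunking_enabled=True)))
```

If this assumption holds, the fix on the integration side would be to choose between the two branches based on whether the data store has chunking enabled, instead of always sending an extractive content spec for unstructured data.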

As of 17/05/2024, this feature is brand new, and the corresponding file was last modified 3 weeks ago, so I assume the library has not caught up with these changes yet.

I use version "1.0.3" of langchain-google-community.

Is there something I am missing here, or is my assumption correct?

lkuligin commented 1 month ago

Yes, support is most probably lacking on the integration side. @jzaldi