Recently, Google released a layout parser for Vertex AI Search data stores. The layout parser is available only when using document chunking for RAG. When document chunking is turned on, Vertex AI Search breaks documents up into chunks at ingestion time and can return documents as chunks.
I created an unstructured data store with layout parsing enabled. I tried running the following snippet:
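    from langchain.tools.retriever import create_retriever_tool
    from langchain_google_community import VertexAISearchRetriever

    # Retriever pointing at the unstructured data store (layout parsing enabled).
    retriever = VertexAISearchRetriever(
        project_id=PROJECT_ID,
        location_id=DS_LOCATION_ID,
        data_store_id=DATA_STORE_ID,
        max_documents=10,
        engine_data_type=DATA_STORE_TYPE,  # set to 0 for unstructured data store
    )

    # Wrap the retriever as a tool so it can be invoked with a query.
    retriever_tool = create_retriever_tool(
        retriever=retriever,
        name=RETRIEVER_NAME,
        description=RETRIEVER_DESCR,
    )

    # This call raises the InvalidArgument error.
    docs = retriever_tool.invoke({"query": question})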
I receive the following error:
google.api_core.exceptions.InvalidArgument: 400 extractive_content_spec must be not defined when the datastore is using 'chunking config' This might be due to engine_data_type not set correctly.
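To make my reading of the error concrete, here is a minimal sketch of what I think the search request body would have to look like for a chunked data store: no extractive content spec, and chunk results requested instead. The helper `build_search_body` is hypothetical (it is not part of langchain-google-community), and the JSON field names (`contentSearchSpec`, `searchResultMode`, `extractiveContentSpec`) are my understanding of the Discovery Engine REST API, so please treat them as assumptions rather than confirmed behaviour.

```python
from typing import Any, Dict


def build_search_body(query: str, page_size: int = 10, chunked: bool = True) -> Dict[str, Any]:
    """Build an illustrative JSON body for a Vertex AI Search request.

    Hypothetical helper: field names are assumed from the Discovery Engine
    REST reference and are not taken from the retriever's source code.
    """
    body: Dict[str, Any] = {"query": query, "pageSize": page_size}
    if chunked:
        # Chunked data store: omit extractiveContentSpec entirely and ask
        # for chunks as the result mode (assumed field and enum names).
        body["contentSearchSpec"] = {"searchResultMode": "CHUNKS"}
    else:
        # Non-chunked unstructured data store: extractive answers are allowed.
        body["contentSearchSpec"] = {
            "extractiveContentSpec": {"maxExtractiveAnswerCount": 1},
        }
    return body
```

If this is right, the retriever would need a way to skip the extractive spec when the data store uses a chunking config.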
As of 17/05/2024, this feature is brand new, and the corresponding file was last modified 3 weeks ago, so I assume the library has not caught up with these changes yet.
I use version "1.0.3" of langchain-google-community.
Is there something I am missing here, or is my assumption correct?