Different output from GoogleVertexAISearchRetriever

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.

Example Code

import time

from langchain_community.retrievers import GoogleVertexAISearchRetriever

PROJECT_ID = "my_project_id"
SEARCH_ENGINE_ID = "my_datastore_id"
LOCATION_ID = "global"

# Retriever backed by a single Vertex AI Search data store.
retriever = GoogleVertexAISearchRetriever(
    project_id=PROJECT_ID,
    data_store_id=SEARCH_ENGINE_ID,
    location_id=LOCATION_ID,
    max_documents=3,
    engine_data_type=1,
)

# Simple interactive loop: read a query, print every returned document.
while True:
    message = input("Type: ")
    print("input message: " + message)
    result = retriever.get_relevant_documents(message)
    for doc in result:
        print(doc)
    time.sleep(1)  # Add a delay between each request

Description

I tried to use GoogleVertexAISearchRetriever for RAG. However, the output of retriever.get_relevant_documents(message) differs from the response shown in the Vertex AI app preview in the GCP console.

In the Vertex AI console I see the expected result, the 5 most relevant documents, but I get no response at all from the LangChain script.

System Info

langchain==0.1.2
langchain-community==0.0.14

Related Components

[X] LLMs/Chat Models
[ ] Embedding Models
[ ] Prompts / Prompt Templates / Prompt Selectors
[ ] Output Parsers
[ ] Document Loaders
[X] Vector Stores / Retrievers
[ ] Memory
[ ] Agents / Agent Executors
[X] Tools / Toolkits
[ ] Chains
[ ] Callbacks/Tracing
[ ] Async

🤖

Hey @mhyeonsoo, good to see you again. Hope you've been coding some cool stuff since our last chat! Let's dive into this issue you're experiencing.

The discrepancy between the output of get_relevant_documents(message) on GoogleVertexAISearchRetriever and the response in the GCP console's Vertex AI app preview could have a few causes:

Different Query Parameters: The retriever builds its search request from several settings: filter, get_extractive_answers, max_documents, max_extractive_answer_count, max_extractive_segment_count, query_expansion_condition, and spell_correction_mode. If these are set differently in the LangChain script than in the console preview, the two can return different results.
Different Data Processing: get_relevant_documents(message) converts the search results into a list of LangChain documents, extracting specific fields and reshaping them. If the console preview processes the same results differently, what you see will differ.
Different Error Handling: get_relevant_documents(message) raises an exception when a request fails. If the console preview handles errors differently, the visible outcome can differ as well.
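To narrow down which of these applies, it can help to check whether the script is raising an error or simply getting an empty list back. The following is a minimal sketch, not part of LangChain: probe_retriever is a hypothetical helper name, and it only assumes the object passed in has the get_relevant_documents(query) method used in the script above.

```python
from typing import Any, Dict


def probe_retriever(retriever: Any, query: str) -> Dict[str, Any]:
    """Run one query and report whether it failed, came back empty, or returned documents."""
    try:
        docs = retriever.get_relevant_documents(query)
    except Exception as exc:  # broad on purpose: this is a diagnostic helper
        return {
            "status": "error",
            "count": 0,
            "detail": f"{type(exc).__name__}: {exc}",
        }
    if not docs:
        # The request went through, but no document matched.
        return {
            "status": "empty",
            "count": 0,
            "detail": "request succeeded but matched nothing",
        }
    return {"status": "ok", "count": len(docs), "detail": ""}
```

Calling probe_retriever(retriever, message) inside the loop would show "error" for a failed request and "empty" for a successful request with no hits, which are two different problems to chase.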

You can find the relevant code in the GoogleVertexAISearchRetriever class.

These settings are fields of the GoogleVertexAISearchRetriever class rather than arguments to get_relevant_documents(message). Their default values are as follows:

filter: None
get_extractive_answers: False
max_documents: 5
max_extractive_answer_count: 1
max_extractive_segment_count: 1
query_expansion_condition: 1
spell_correction_mode: 2

These defaults are set in the class definition of GoogleVertexAISearchRetriever, in the file listed under Sources.
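One way to see where the script departs from these defaults is to diff its settings against the list above. This is a rough sketch using only the standard library: LISTED_DEFAULTS copies the values quoted in this reply (they are not re-checked against the library source here), and diff_from_defaults is a hypothetical helper, not a LangChain function.

```python
from typing import Any, Dict, List, Tuple

# Defaults exactly as quoted in this reply; treat them as an assumption.
LISTED_DEFAULTS: Dict[str, Any] = {
    "filter": None,
    "get_extractive_answers": False,
    "max_documents": 5,
    "max_extractive_answer_count": 1,
    "max_extractive_segment_count": 1,
    "query_expansion_condition": 1,
    "spell_correction_mode": 2,
}


def diff_from_defaults(
    settings: Dict[str, Any],
) -> Tuple[Dict[str, Tuple[Any, Any]], List[str]]:
    """Return ({name: (default, actual)} for changed settings, names not in the list)."""
    changed = {
        name: (LISTED_DEFAULTS[name], value)
        for name, value in settings.items()
        if name in LISTED_DEFAULTS and value != LISTED_DEFAULTS[name]
    }
    unlisted = sorted(name for name in settings if name not in LISTED_DEFAULTS)
    return changed, unlisted


# The two request-related settings passed in the script from the issue.
script_settings = {"max_documents": 3, "engine_data_type": 1}
changed, unlisted = diff_from_defaults(script_settings)
print(changed)   # {'max_documents': (5, 3)}
print(unlisted)  # ['engine_data_type']
```

Here max_documents=3 alone would explain seeing fewer results than the console's 5, but not an empty response; engine_data_type is not covered by the list above, so its value is worth checking against the type of data actually stored in the data store.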

If you're still experiencing issues, could you please provide more information about the specific differences you're seeing between the LangChain script and the GCP console's Vertex AI app preview? This could help in identifying the cause of the discrepancy.

Sources

libs/community/langchain_community/retrievers/google_vertex_ai_search.py

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
