langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Different output from GoogleVertexAISearchRetriever #16416

Closed: mhyeonsoo closed this issue 2 months ago

mhyeonsoo commented 5 months ago

Checked other resources

Example Code

import time

from langchain_community.retrievers import GoogleVertexAISearchRetriever

PROJECT_ID = "my_project_id"
DATA_STORE_ID = "my_datastore_id"
LOCATION_ID = "global"

retriever = GoogleVertexAISearchRetriever(
    project_id=PROJECT_ID,
    data_store_id=DATA_STORE_ID,
    location_id=LOCATION_ID,
    max_documents=3,     # the console preview showed 5 results
    engine_data_type=1,  # 0 = unstructured, 1 = structured, 2 = website data
)

while True:
    message = input("Type: ")
    print("input message: " + message)
    result = retriever.get_relevant_documents(message)
    for doc in result:
        print(doc)
    time.sleep(1)  # Add a delay between each request

Description

I tried to use GoogleVertexAISearchRetriever for RAG. However, the output of retriever.get_relevant_documents(message) differs from the response shown in the GCP console's Vertex AI app preview.

In the Vertex AI console I see the expected result, the five most relevant documents, but the LangChain script returns no results at all.

System Info

langchain==0.1.2
langchain-community==0.0.14

Related Components

dosubot[bot] commented 5 months ago

🤖

Hey @mhyeonsoo, good to see you again. Hope you've been coding some cool stuff since our last chat! Let's dive into this issue you're experiencing.

The discrepancy between the output of the get_relevant_documents(message) method in the GoogleVertexAISearchRetriever class and the response in the GCP console's Vertex AI app preview could have a few causes:

  1. Different Query Parameters: The get_relevant_documents(message) method in the GoogleVertexAISearchRetriever class builds the search request from several parameters, including filter, get_extractive_answers, max_documents, max_extractive_answer_count, max_extractive_segment_count, query_expansion_condition, and spell_correction_mode. If these are set differently in the LangChain script than in the GCP console's Vertex AI app preview, the responses can differ. For example, the script sets max_documents=3, while the console showed five results. The engine_data_type setting (0 for unstructured, 1 for structured, 2 for website data) also has to match the type of the data store; the script uses 1 (structured).

  2. Different Data Processing: The get_relevant_documents(message) method converts the search results into a list of LangChain documents, extracting specific fields from each result and formatting them into a fixed structure. If the GCP console's Vertex AI app preview processes the search results differently, the responses could be different.

  3. Different Error Handling: The get_relevant_documents(message) method handles errors by raising exceptions. If the GCP console's Vertex AI app preview handles errors differently, the responses could be different.
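To check point 1 above in practice, one option is to write down both sets of settings and diff them. This is a minimal sketch, not LangChain code: `diff_settings` is a hypothetical helper, and only the retriever values (max_documents=3, engine_data_type=1) come from the script in this issue. The console values are assumptions for illustration (five results, and a hypothetical unstructured data store); replace them with what the app actually uses.

```python
from typing import Any, Dict, Tuple

# Settings passed to GoogleVertexAISearchRetriever in the issue's script.
retriever_settings: Dict[str, Any] = {
    "max_documents": 3,
    "engine_data_type": 1,  # 1 = structured data in LangChain's convention
}

# Hypothetical console-side values; the console preview showed five results,
# and the data store type here is an assumption (0 = unstructured).
console_settings: Dict[str, Any] = {
    "max_documents": 5,
    "engine_data_type": 0,
}


def diff_settings(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key missing from one side is reported with None for that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(diff_settings(retriever_settings, console_settings))
# {'engine_data_type': (1, 0), 'max_documents': (3, 5)}
```

Any key that shows up in the diff is a candidate explanation for the differing output; the rest can be ruled out.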
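Point 2 matters because a converter that reads the wrong field can turn a non-empty API response into an empty list without raising anything. The sketch below is a simplified, hypothetical illustration of that failure mode; it is not LangChain's actual conversion code. `Doc` stands in for a LangChain Document, and the raw result dicts and their keys (`snippet`, `extractive_answer`) are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def convert_results(results: List[Dict[str, Any]], content_key: str) -> List[Doc]:
    """Turn raw search results into Doc objects.

    Results that lack `content_key` are skipped silently, so expecting the
    wrong field yields an empty list instead of an error.
    """
    docs: List[Doc] = []
    for result in results:
        content = result.get(content_key)
        if not content:
            continue  # nothing usable in this result under this key
        docs.append(Doc(page_content=str(content), metadata={"id": result.get("id")}))
    return docs


# Placeholder results for illustration only.
raw = [
    {"id": "doc-a", "snippet": "first match"},
    {"id": "doc-b", "snippet": "second match"},
]
print(len(convert_results(raw, "snippet")))            # 2
print(len(convert_results(raw, "extractive_answer")))  # 0: wrong field, all skipped
```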
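Regarding point 3, "no response" can mean either an exception or an empty result list, and the two point to different causes. A minimal sketch for telling them apart, assuming you pass in any callable that takes a query string (with the real class that would be `retriever.get_relevant_documents`). `diagnose` and `failing` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, List, Tuple


def diagnose(retrieve: Callable[[str], List[Any]], query: str) -> Tuple[str, Any]:
    """Call `retrieve(query)` and classify the outcome.

    Returns ("error", exception), ("empty", []) or ("ok", docs), so an
    exception is not confused with a request that matched nothing.
    """
    try:
        docs = retrieve(query)
    except Exception as exc:  # broad on purpose: this is a diagnostic
        return "error", exc
    if not docs:
        return "empty", docs
    return "ok", docs


def failing(query: str) -> List[Any]:
    """Stand-in retriever that simulates an API failure."""
    raise RuntimeError("simulated API error")


print(diagnose(lambda q: [], "test")[0])       # empty
print(diagnose(failing, "test")[0])            # error
print(diagnose(lambda q: ["doc"], "test")[0])  # ok
```

If the real retriever reports "empty" rather than "error", the request itself succeeded and the parameter and data-processing checks above are the more likely leads.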

You can find the relevant code in the GoogleVertexAISearchRetriever class.

The default values for these parameters are defined as fields of the GoogleVertexAISearchRetriever class itself, so any parameter not passed to the constructor falls back to its class-level default.

If you're still experiencing issues, could you please provide more information about the specific differences you're seeing between the LangChain script and the GCP console's Vertex AI app preview? This could help in identifying the cause of the discrepancy.
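One concrete way to describe the differences is to compare document IDs instead of printed text. This is a minimal, hypothetical helper (`compare_result_ids` is not part of LangChain); it assumes you have copied the IDs of the top results from the console preview and pulled IDs from the retriever's documents, for example from `doc.metadata`, whose exact keys may vary by LangChain version and data type. The IDs below are placeholders.

```python
from typing import Dict, List


def compare_result_ids(console_ids: List[str], retriever_ids: List[str]) -> Dict[str, List[str]]:
    """Report which result IDs each side returned, preserving input order."""
    retriever_set = set(retriever_ids)
    console_set = set(console_ids)
    return {
        # IDs present in both outputs, in console ranking order
        "shared": [i for i in console_ids if i in retriever_set],
        # IDs the console preview returned but the retriever did not
        "console_only": [i for i in console_ids if i not in retriever_set],
        # IDs the retriever returned but the console preview did not
        "retriever_only": [i for i in retriever_ids if i not in console_set],
    }


# Placeholder IDs; the empty list mirrors the reported symptom.
console = ["doc-a", "doc-b", "doc-c", "doc-d", "doc-e"]
retrieved: List[str] = []
print(compare_result_ids(console, retrieved))
```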

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
