langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

DOC: WebResearchRetriever can work with DuckDuckGo search? #14762

Closed lfoppiano closed 9 months ago

lfoppiano commented 9 months ago

Issue with current documentation:

I've got this code:

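# Imports assumed for the LangChain version shown in the traceback below;
# newer releases move several of these to langchain_community.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.utilities import DuckDuckGoSearchAPIWrapper
from langchain.vectorstores import Chroma

max_num_results = 5  # assumed value; not defined in the original post

llm = HuggingFaceHub(repo_id="mistralai/Mistral-7B-Instruct-v0.1",
                     model_kwargs={"temperature": 0.01, "max_length": 4096, "max_new_tokens": 2048})
# Vectorstore
vectorstore = Chroma(
    embedding_function=HuggingFaceEmbeddings(
        model_name="all-MiniLM-L6-v2"), persist_directory="./chroma_db_oai"
)

# Search wrapper: DuckDuckGo, restricted to Japan and to the last day
search = DuckDuckGoSearchAPIWrapper(max_results=max_num_results,
                                    region="jp-ja",
                                    time="d")

user_input = "Which are the most demanded jobs for foreigner people that don't speak Japanese?"

# This call raises the ValidationError shown below
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore,
    llm=llm,
    search=search)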

but I get this error:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[44], line 15
     12 search = DuckDuckGoSearchAPIWrapper()
     13 user_input = "Which are the most demanded jobs for foreigner people that don't speak Japanese?"
---> 15 web_research_retriever = WebResearchRetriever.from_llm(
     16     vectorstore=vectorstore,
     17     llm=llm,
     18     search=search)
     20 # Run
     21 docs = web_research_retriever.get_relevant_documents(user_input)

File ~/anaconda3/envs/jobharbor/lib/python3.10/site-packages/langchain/retrievers/web_research.py:130, in WebResearchRetriever.from_llm(cls, vectorstore, llm, search, prompt, num_search_results, text_splitter)
    123 # Use chat model prompt
    124 llm_chain = LLMChain(
    125     llm=llm,
    126     prompt=prompt,
    127     output_parser=QuestionListOutputParser(),
    128 )
--> 130 return cls(
    131     vectorstore=vectorstore,
    132     llm_chain=llm_chain,
    133     search=search,
    134     num_search_results=num_search_results,
    135     text_splitter=text_splitter,
    136 )

File ~/anaconda3/envs/jobharbor/lib/python3.10/site-packages/langchain_core/load/serializable.py:97, in Serializable.__init__(self, **kwargs)
     96 def __init__(self, **kwargs: Any) -> None:
---> 97     super().__init__(**kwargs)
     98     self._lc_kwargs = kwargs

File ~/anaconda3/envs/jobharbor/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 6 validation errors for WebResearchRetriever
search -> backend
  extra fields not permitted (type=value_error.extra)
search -> max_results
  extra fields not permitted (type=value_error.extra)
search -> region
  extra fields not permitted (type=value_error.extra)
search -> safesearch
  extra fields not permitted (type=value_error.extra)
search -> source
  extra fields not permitted (type=value_error.extra)
search -> time
  extra fields not permitted (type=value_error.extra)

I wonder whether WebResearchRetriever also works with non-Google search engines.

Thank you in advance

Idea or request for content:

No response

dosubot[bot] commented 9 months ago

🤖

Hello @lfoppiano! Good to see you again. Hope you're doing well and coding is going smoothly. Let's dive into this issue you're having.

Based on the context provided, it appears that the WebResearchRetriever in LangChain is currently designed to work with the GoogleSearchAPIWrapper as the search engine. This is evident from the method signature of WebResearchRetriever.from_llm:

@classmethod
def from_llm(
    cls,
    vectorstore: VectorStore,
    llm: BaseLLM,
    search: GoogleSearchAPIWrapper,
    prompt: Optional[BasePromptTemplate] = None,
    num_search_results: int = 1,
    text_splitter: RecursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(
        chunk_size=1500, chunk_overlap=150
    ),
) -> "WebResearchRetriever":

Here, search is expected to be an instance of GoogleSearchAPIWrapper. This is likely the reason why you're encountering a ValidationError when trying to use DuckDuckGoSearchAPIWrapper as the search engine.

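LangChain does have a DuckDuckGoSearchAPIWrapper class, but it is not currently compatible with WebResearchRetriever. Your traceback shows where it fails: pydantic validates the search field against GoogleSearchAPIWrapper and rejects the DuckDuckGo wrapper's fields (backend, max_results, region, safesearch, source, time) as extra fields that the Google wrapper does not define.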

If you want to use DuckDuckGo as the search engine with WebResearchRetriever, it might require modifications to the WebResearchRetriever class or the creation of a new retriever class that is compatible with DuckDuckGoSearchAPIWrapper.
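
One way to read that suggestion is a retriever that depends on a small search interface rather than on GoogleSearchAPIWrapper specifically. The sketch below is not LangChain code and uses only the standard library: SearchBackend, StubDuckDuckGoSearch and collect_links are hypothetical names, and the assumption that a search wrapper exposes results(query, num_results) returning dicts with a "link" key should be checked against the version you have installed.

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    """Hypothetical interface: any object with a results(query, num_results) method."""

    def results(self, query: str, num_results: int) -> List[Dict[str, str]]:
        ...


class StubDuckDuckGoSearch:
    """Stand-in for a DuckDuckGo wrapper; returns canned data, no network calls."""

    def results(self, query: str, num_results: int) -> List[Dict[str, str]]:
        canned = [
            {
                "title": f"Result {i} for {query}",
                "link": f"https://example.com/{i}",
                "snippet": "...",
            }
            for i in range(1, 4)
        ]
        return canned[:num_results]


def collect_links(
    search: SearchBackend, queries: List[str], num_results: int = 1
) -> List[str]:
    """Run every query and return the de-duplicated result URLs in first-seen order."""
    seen: Dict[str, None] = {}
    for query in queries:
        for hit in search.results(query, num_results):
            link = hit.get("link")
            if link:
                seen.setdefault(link, None)
    return list(seen)


links = collect_links(
    StubDuckDuckGoSearch(),
    ["jobs in Japan", "English-speaking jobs"],
    num_results=2,
)
print(links)
```

Because collect_links only requires a results method, any wrapper that returns the same shape of data could be passed in without a type-specific validation step.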

I hope this helps clarify the issue. If you have further questions or need more assistance, feel free to ask.
