DOC: Can WebResearchRetriever work with DuckDuckGo search?

langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications

MIT License

92.47k stars, 14.79k forks

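from langchain.llms import HuggingFaceHub
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.utilities import DuckDuckGoSearchAPIWrapper
from langchain.retrievers.web_research import WebResearchRetriever

max_num_results = 5  # not defined in the original snippet; value assumed

llm = HuggingFaceHub(
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",
    model_kwargs={"temperature": 0.01, "max_length": 4096, "max_new_tokens": 2048},
)

# Vectorstore
vectorstore = Chroma(
    embedding_function=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    persist_directory="./chroma_db_oai",
)

# DuckDuckGo search restricted to the Japan region and results from the past day
search = DuckDuckGoSearchAPIWrapper(max_results=max_num_results, region="jp-ja", time="d")
user_input = "Which are the most demanded jobs for foreigner people that don't speak Japanese?"

# This call raises the ValidationError shown below
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore, llm=llm, search=search
)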

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[44], line 15
     12 search = DuckDuckGoSearchAPIWrapper()
     13 user_input = "Which are the most demanded jobs for foreigner people that don't speak Japanese?"
---> 15 web_research_retriever = WebResearchRetriever.from_llm(
     16     vectorstore=vectorstore,
     17     llm=llm,
     18     search=search)
     20 # Run
     21 docs = web_research_retriever.get_relevant_documents(user_input)

File ~/anaconda3/envs/jobharbor/lib/python3.10/site-packages/langchain/retrievers/web_research.py:130, in WebResearchRetriever.from_llm(cls, vectorstore, llm, search, prompt, num_search_results, text_splitter)
    123 # Use chat model prompt
    124 llm_chain = LLMChain(
    125     llm=llm,
    126     prompt=prompt,
    127     output_parser=QuestionListOutputParser(),
    128 )
--> 130 return cls(
    131     vectorstore=vectorstore,
    132     llm_chain=llm_chain,
    133     search=search,
    134     num_search_results=num_search_results,
    135     text_splitter=text_splitter,
    136 )

File ~/anaconda3/envs/jobharbor/lib/python3.10/site-packages/langchain_core/load/serializable.py:97, in Serializable.__init__(self, **kwargs)
     96 def __init__(self, **kwargs: Any) -> None:
---> 97     super().__init__(**kwargs)
     98     self._lc_kwargs = kwargs

File ~/anaconda3/envs/jobharbor/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 6 validation errors for WebResearchRetriever
search -> backend
  extra fields not permitted (type=value_error.extra)
search -> max_results
  extra fields not permitted (type=value_error.extra)
search -> region
  extra fields not permitted (type=value_error.extra)
search -> safesearch
  extra fields not permitted (type=value_error.extra)
search -> source
  extra fields not permitted (type=value_error.extra)
search -> time
  extra fields not permitted (type=value_error.extra)

🤖

Hello @lfoppiano! Good to see you again. Hope you're doing well and coding is going smoothly. Let's dive into this issue you're having.

Based on the context provided, it appears that WebResearchRetriever in LangChain is currently designed to work only with GoogleSearchAPIWrapper as its search engine. The method signature of WebResearchRetriever.from_llm shows this:

@classmethod
def from_llm(
    cls,
    vectorstore: VectorStore,
    llm: BaseLLM,
    search: GoogleSearchAPIWrapper,
    prompt: Optional[BasePromptTemplate] = None,
    num_search_results: int = 1,
    text_splitter: RecursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(
        chunk_size=1500, chunk_overlap=150
    ),
) -> "WebResearchRetriever":

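Here, search is typed as GoogleSearchAPIWrapper. Python does not enforce that annotation on from_llm by itself; the error is raised when from_llm calls cls(...) and pydantic validates the retriever's own search field, which is declared with the same type (see the last frames of your traceback). Pydantic tries to build a GoogleSearchAPIWrapper from the DuckDuckGoSearchAPIWrapper you passed, and each of the six "extra fields not permitted" errors is one of DuckDuckGo's own fields (backend, max_results, region, safesearch, source, time) being rejected. That is also why the call fails even with a bare DuckDuckGoSearchAPIWrapper(), as in your traceback: those fields exist with default values.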

While LangChain does have a DuckDuckGoSearchAPIWrapper class, WebResearchRetriever does not currently accept it. Note that the rejection happens while the retriever is being constructed, before any search is run, so it comes from the search field's declared type rather than from differences in what the two wrappers return.

If you want to use DuckDuckGo as the search engine with WebResearchRetriever, you would likely need to modify the WebResearchRetriever class or create a new retriever class that accepts DuckDuckGoSearchAPIWrapper.
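To make the interface point concrete, here is a minimal, self-contained sketch in plain Python (no LangChain required). It assumes, based on a reading of the LangChain source rather than anything documented, that the retriever's search step only calls search.results(query, n) positionally and keeps each result's "link" value. SearchBackend, collect_links and the two Fake* classes are hypothetical names for illustration; the fakes stand in for GoogleSearchAPIWrapper and DuckDuckGoSearchAPIWrapper, whose count parameters appear to be named num_results and max_results respectively.

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    """Hypothetical interface: the one method the search step is assumed to use."""

    def results(self, query: str, num_results: int, /) -> List[Dict[str, str]]:
        ...


def collect_links(search: SearchBackend, queries: List[str], num_results: int = 1) -> List[str]:
    """Simplified version of the URL-gathering step: run each query, keep unique links."""
    urls: List[str] = []
    for query in queries:
        # Positional call, so wrappers with different parameter names both work.
        for res in search.results(query, num_results):
            link = res.get("link")
            if link and link not in urls:
                urls.append(link)
    return urls


class FakeGoogleSearch:
    """Offline stand-in for GoogleSearchAPIWrapper (hypothetical, no network)."""

    def results(self, query: str, num_results: int) -> List[Dict[str, str]]:
        return [
            {"title": query, "link": f"https://google.example/{i}", "snippet": ""}
            for i in range(num_results)
        ]


class FakeDuckDuckGoSearch:
    """Offline stand-in for DuckDuckGoSearchAPIWrapper (hypothetical, no network)."""

    def results(self, query: str, max_results: int) -> List[Dict[str, str]]:
        return [
            {"title": query, "link": f"https://duckduckgo.example/{i}", "snippet": ""}
            for i in range(max_results)
        ]


if __name__ == "__main__":
    question = "jobs in Japan for non-Japanese speakers"
    print(collect_links(FakeGoogleSearch(), [question], 2))
    # ['https://google.example/0', 'https://google.example/1']
    print(collect_links(FakeDuckDuckGoSearch(), [question], 2))
    # ['https://duckduckgo.example/0', 'https://duckduckgo.example/1']
```

Because the protocol's parameters are positional-only, both fakes satisfy it despite the differing parameter names. Inside LangChain itself, one option you could try (untested here, and possibly version-dependent) is to subclass WebResearchRetriever and redeclare its search field with DuckDuckGoSearchAPIWrapper as the type, so that pydantic validates against that class instead.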

I hope this helps clarify the issue. If you have further questions or need more assistance, feel free to ask.



GitHub issue #14762