langchain-ai / langchain-google

MIT License

ValidationError in GoogleScholarAPIWrapper #468

Closed HamedHaddadi closed 1 month ago

HamedHaddadi commented 1 month ago

Issue: I recently noticed that GoogleScholarAPIWrapper raises a ValidationError (see below). The problem is recent and did not happen before.
The ValidationError is raised by a simple instantiation of GoogleScholarAPIWrapper, either within GoogleScholarQueryRun or as a standalone class.
SERP_API_KEY is already available in the environment variables, as you can see below. It can also be passed directly to GoogleScholarAPIWrapper, as stated here. Both approaches failed.

from dotenv import load_dotenv
from langchain_community.tools.google_scholar import GoogleScholarQueryRun 
from langchain_community.utilities.google_scholar import GoogleScholarAPIWrapper 
configs = '../configs/keys.env'
load_dotenv(configs)
tool = GoogleScholarQueryRun(api_wrapper=GoogleScholarAPIWrapper(hl = 'en', lr = 'lang_en', top_k_results = 10))

Error:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[19], line 1
----> 1 tool = GoogleScholarQueryRun(api_wrapper=GoogleScholarAPIWrapper(hl = 'en', lr = 'lang_en', top_k_results = 10))

File ~/miniforge3/envs/langchain_hh/lib/python3.11/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 2 validation errors for GoogleScholarAPIWrapper
SERP_API_KEY
  extra fields not permitted (type=value_error.extra)
google_scholar_engine
  extra fields not permitted (type=value_error.extra)

lkuligin commented 1 month ago

GoogleScholarAPIWrapper is an unofficial third-party integration that we don't support.

HamedHaddadi commented 1 month ago

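Ok, thanks. The error seems to be caused by the pydantic root validator that runs during class construction. It writes the keys SERP_API_KEY and google_scholar_engine into values, and these are exactly the two fields the ValidationError reports as not permitted: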

    @root_validator(pre=True)
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that api key and python package exists in environment."""
        serp_api_key = get_from_dict_or_env(values, "serp_api_key", "SERP_API_KEY")
        values["SERP_API_KEY"] = serp_api_key

        try:
            from serpapi import GoogleScholarSearch

        except ImportError:
            raise ImportError(
                "google-search-results is not installed. "
                "Please install it with `pip install google-search-results"
                ">=2.4.2`"
            )
        GoogleScholarSearch.SERP_API_KEY = serp_api_key
        values["google_scholar_engine"] = GoogleScholarSearch

        return values

I will try to reproduce the error using serpapi GoogleScholarSearch and post the outcome.
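
As a rough illustration of the failure mode, here is a pure-Python stand-in for a model that forbids undeclared fields. It is not pydantic's actual implementation. The `DECLARED_FIELDS` set and the helpers `pre_validator` and `find_extra_fields` are hypothetical, and the field list is an assumption inferred from the constructor arguments and the error message in this issue; the real field list of GoogleScholarAPIWrapper may differ.

```python
# Hypothetical stand-in for a model that rejects undeclared fields.
# DECLARED_FIELDS is an assumption, not the real field list of the wrapper.
DECLARED_FIELDS = {"top_k_results", "hl", "lr", "serp_api_key"}


def pre_validator(values: dict) -> dict:
    """Mimic the quoted validator: it writes two new keys into values."""
    values = dict(values)
    values["SERP_API_KEY"] = values.get("serp_api_key", "dummy-key")
    values["google_scholar_engine"] = object  # placeholder for GoogleScholarSearch
    return values


def find_extra_fields(values: dict) -> list:
    """Return the keys that match no declared field, sorted."""
    return sorted(key for key in values if key not in DECLARED_FIELDS)


values = pre_validator({"hl": "en", "lr": "lang_en", "top_k_results": 10})
extras = find_extra_fields(values)
print(extras)
```

In this sketch the comparison is case-sensitive, so `SERP_API_KEY` does not match a field named `serp_api_key` and is flagged along with `google_scholar_engine`.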

HamedHaddadi commented 1 month ago

I developed a custom tool by making small changes to the LangChain wrapper and query run. I am posting the answer here. First, develop a wrapper for the Serp API scholar search:

import os
from typing import Any, Dict, Optional

from langchain.pydantic_v1 import BaseModel, Field, root_validator
from serpapi import GoogleScholarSearch


class ScholarSearch(BaseModel):
    """Wrapper around the Serp API Google Scholar search."""

    top_k_results: int = Field(description="top k results obtained by running a query on GoogleScholarSearch")
    serp_api_key: Optional[str] = None
    search_engine: Optional[Any] = None

    @root_validator(pre=True)
    def validate_env(cls, values: Dict) -> Dict:
        """Take the API key from the input or from the SERP_API_KEY environment variable."""
        serp_api_key = values.get("serp_api_key")
        if serp_api_key is None:
            serp_api_key = os.environ["SERP_API_KEY"]
        GoogleScholarSearch.SERP_API_KEY = serp_api_key
        # Write only to a field that is declared on the model.
        values["search_engine"] = GoogleScholarSearch
        return values

    def run(self, query: str) -> str:
        """Run the query and return the results as one formatted string."""
        page = 0
        all_results = []
        # Request pages of up to 20 results each.
        while page < max((self.top_k_results - 20), 1):
            results = (
                self.search_engine({"q": query, "start": page, "hl": "en",
                                    "num": min(self.top_k_results, 20), "lr": "lang_en"})
                .get_dict()
                .get("organic_results", [])
            )
            all_results.extend(results)
            if not results:
                break
            page += 20
        # From the last page we would only need top_k_results % 20 results.
        if self.top_k_results % 20 != 0 and page > 20 and all_results:
            results = (
                self.search_engine({"q": query, "start": page, "hl": "en",
                                    "num": self.top_k_results % 20, "lr": "lang_en"})
                .get_dict()
                .get("organic_results", [])
            )
            all_results.extend(results)
        if not all_results:
            return "No good Google Scholar Result was found"
        docs = [
            f"Title: {result.get('title','')}\n"
            f"Authors: {','.join([author.get('name') for author in result.get('publication_info',{}).get('authors',[])])}\n"  # noqa: E501
            f"Summary: {result.get('publication_info',{}).get('summary','')}\n"
            f"Total-Citations: {result.get('inline_links',{}).get('cited_by',{}).get('total','')}"  # noqa: E501
            for result in all_results
        ]
        return "\n\n".join(docs)
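
To see which requests the pagination loop in `run` would issue for a given `top_k_results`, the loop can be traced without calling the API. The helper `plan_requests` below is hypothetical and only copies the loop conditions from the wrapper. It assumes a page size of 20 and that every request returns at least one result.

```python
def plan_requests(top_k_results: int, page_size: int = 20) -> list:
    """Return the (start, num) pairs the loop in run() would request."""
    requests = []
    page = 0
    # Same condition as the while loop in run().
    while page < max(top_k_results - page_size, 1):
        requests.append((page, min(top_k_results, page_size)))
        page += page_size
    # Same condition as the final partial-page request in run().
    if top_k_results % page_size != 0 and page > page_size:
        requests.append((page, top_k_results % page_size))
    return requests


print(plan_requests(10))  # [(0, 10)]
print(plan_requests(45))  # [(0, 20), (20, 20), (40, 5)]
print(plan_requests(30))  # [(0, 20)]
```

In this trace, `top_k_results = 30` plans a single request of 20 results, so it may be worth checking the loop bounds if more than 20 results are needed.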

Use the ScholarSearch wrapper in the custom tool below:

from langchain_core.tools import BaseTool


class GoogleScholarTool(BaseTool):
    """
    Tool that requires google scholar search API
    """
    name: str = "google_scholar_tool"
    # Adjacent string literals are joined as-is, so each fragment ends with a space.
    description: str = ("A wrapper around Google Scholar Search. "
        "Useful for when you need to get information about "
        "research papers from Google Scholar. "
        "Input should be a search query.")
    api_wrapper: ScholarSearch

    def _run(self, query: str) -> str:
        """
        Use the tool
        """
        return self.api_wrapper.run(query)
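
A note on the multi-line `description`: Python joins adjacent string literals without inserting a separator, which is why each fragment needs its own trailing space. The variable names in this small sketch are made up for illustration.

```python
# Adjacent string literals are concatenated exactly as written.
without_spaces = ("information about"
                  "research papers")
with_spaces = ("information about "
               "research papers")

print(without_spaces)  # information aboutresearch papers
print(with_spaces)     # information about research papers
```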

Here is the result:

api_wrapper = ScholarSearch(top_k_results = 10, serp_api_key = None)
tool = GoogleScholarTool(api_wrapper = api_wrapper)
tool.invoke("find articles on finite inertia")

Results:

'Title: First order phase transition resulting from finite inertia in coupled oscillator systems\nAuthors: \nSummary: HA Tanaka, AJ Lichtenberg, S Oishi - Physical review letters, 1997 - APS\nTotal-Citations: 238\n\nTitle: On the collision rate of small particles in isotropic turbulence. II. Finite inertia case\nAuthors: AS Wexler,LP Wang\nSummary: Y Zhou, AS Wexler, LP Wang - Physics of fluids, 1998 - pubs.aip.org\nTotal-Citations: 163\n\nTitle: Complete synchronization of Kuramoto oscillators with finite inertia\nAuthors: YP Choi,SY Ha,SB Yun\nSummary: YP Choi, SY Ha, SB Yun - Physica D: Nonlinear Phenomena, 2011 - Elsevier\nTotal-Citations: 135\n\nTitle: Numerical study of filament suspensions at finite inertia\nAuthors: ME Rosti,L Brandt\nSummary: AA Banaei, ME Rosti, L Brandt - Journal of Fluid Mechanics, 2020 - cambridge.org\nTotal-Citations: 35\n\nTitle: Microstructure and rheology of finite inertia neutrally buoyant suspensions\nAuthors: H Haddadi,JF Morris\nSummary: H Haddadi, JF Morris - Journal of Fluid Mechanics, 2014 - cambridge.org\nTotal-Citations: 64\n\nTitle: Brownian motion of finite-inertia particles in a simple shear flow\nAuthors: \nSummary: Y Drossinos, MW Reeks - Physical Review E—Statistical, Nonlinear, and Soft …, 2005 - APS\nTotal-Citations: 38\n\nTitle: Motion of a spherical capsule in branched tube flow with finite inertia\nAuthors: Z Wang,Y Sui,AV Salsac,D Barthès-Biesel\nSummary: Z Wang, Y Sui, AV Salsac, D Barthès-Biesel… - Journal of Fluid …, 2016 - cambridge.org\nTotal-Citations: 54\n\nTitle: Pairwise interactions between deformable drops in free shear at finite inertia\nAuthors: RK Singh,K Sarkar\nSummary: PO Olapade, RK Singh, K Sarkar - Physics of Fluids, 2009 - pubs.aip.org\nTotal-Citations: 41\n\nTitle: Electroosmotic flows in microchannels with finite inertial and pressure forces\nAuthors: JG Santiago\nSummary: JG Santiago - Analytical chemistry, 2001 - ACS Publications\nTotal-Citations: 368\n\nTitle: Drop deformation and breakup in a vortex at finite inertia\nAuthors: K Sarkar\nSummary: X Li, K Sarkar - Journal of Fluid Mechanics, 2006 - cambridge.org\nTotal-Citations: 30'
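
Since the tool returns a single string, a caller that needs structured data has to split it again. The sketch below shows one possible way, assuming the layout produced above (records separated by a blank line, one `Key: value` pair per line). The helper `parse_scholar_output` is hypothetical and not part of LangChain or SerpApi; the sample reuses two entries from the output above.

```python
def parse_scholar_output(text: str) -> list:
    """Split the tool output into one dict per result."""
    records = []
    for block in text.split("\n\n"):
        record = {}
        for line in block.split("\n"):
            key, sep, value = line.partition(": ")
            if not sep:
                continue  # skip lines that are not "Key: value" pairs
            record[key] = value
        if record:
            records.append(record)
    return records


sample = (
    "Title: Numerical study of filament suspensions at finite inertia\n"
    "Authors: ME Rosti,L Brandt\n"
    "Summary: AA Banaei, ME Rosti, L Brandt - Journal of Fluid Mechanics, 2020 - cambridge.org\n"
    "Total-Citations: 35\n\n"
    "Title: Drop deformation and breakup in a vortex at finite inertia\n"
    "Authors: K Sarkar\n"
    "Summary: X Li, K Sarkar - Journal of Fluid Mechanics, 2006 - cambridge.org\n"
    "Total-Citations: 30"
)

records = parse_scholar_output(sample)
for record in records:
    print(record["Title"], "|", record["Total-Citations"])
```

The values stay strings; a message such as "No good Google Scholar Result was found" contains no `Key: value` line and yields an empty list.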