langchain-ai / langchain-google


GoogleSearchAPIWrapper errors or times out with batches #274

Open rdvhoorn opened 4 weeks ago

rdvhoorn commented 4 weeks ago

Hi! I was using GoogleSearchAPIWrapper for batched calls to the Google Search API. While prototyping in a Jupyter notebook, I noticed that errors occur when two batched calls are made without rerunning the search tool initialization in between.

Here is a minimal example that returns exactly what you would expect:

from langchain_google_community import GoogleSearchAPIWrapper
from langchain_core.tools import Tool

# Initialize tools to use
search = GoogleSearchAPIWrapper()
google_search_tool = Tool(
    name="google-search",
    description="Search Google for web results.",
    func=search.run,
)

google_search_tool.batch([{'__arg1': 'What is LangChain?'}, {'__arg1': 'What is an LLM?'}])

However, once you move the batch call out of the cell that initializes the tool, i.e. like this:

from langchain_google_community import GoogleSearchAPIWrapper
from langchain_core.tools import Tool

# Initialize tools to use
search = GoogleSearchAPIWrapper()
google_search_tool = Tool(
    name="google-search",
    description="Search Google for web results.",
    func=search.run,
)

and call google_search_tool.batch([{'__arg1': 'What is LangChain?'}, {'__arg1': 'What is an LLM?'}]) in a separate cell, it will run successfully the first time you run that second cell. But if you rerun the second cell without rerunning the tool initialization, you can get any of the following errors:

Option 1:

SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2559)

Option 2:

AttributeError: 'NoneType' object has no attribute 'read'

Option 3:

SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2559)

Option 4:

TimeoutError: The read operation timed out

This becomes highly problematic when you have two search calls in a single chain, such as:

from langchain_core.runnables import RunnableParallel

# Two search steps in one chain, both going through the same tool instance
chain = (
    RunnableParallel({"results": google_search_tool})
    | (lambda x: {'__arg1': "New search call: what is a CNN"})
    | RunnableParallel({"results": google_search_tool})
)

chain.batch([{'__arg1': 'What is LangChain?'}, {'__arg1': 'What is an LLM?'}])

This results in the same errors as mentioned above.

I have absolutely no idea how to fix/work around this issue. So, any help on what the cause of this error is, or how to solve it or work around it would be very helpful.

Versions: Python 3.12.2, langchain_google_community 1.0.3, langchain_core 0.1.45

rdvhoorn commented 3 weeks ago

I just figured out that the following does not error in the same way. It is ugly, but it seems to work:

from langchain_google_community import GoogleSearchAPIWrapper
from langchain_core.runnables import RunnablePassthrough, RunnableParallel, RunnableLambda
from langchain_core.tools import Tool

# Initialize tools to use
search = GoogleSearchAPIWrapper()
google_search_tool = Tool(
    name="google-search",
    description="Search Google for web results.",
    func=search.run,
)

search2 = GoogleSearchAPIWrapper()
google_search_tool2 = Tool(
    name="google-search",
    description="Search Google for web results.",
    func=search2.run,
)

chain = (
    RunnableParallel({
        "search_results": google_search_tool,
        "input": RunnablePassthrough()
    })
    |
    RunnableParallel({
        "search_results": RunnableLambda(lambda x: x['input']) | google_search_tool2
    })
)

chain.batch([{'__arg1': 'What is LangChain?'}, {'__arg1': 'What is an LLM?'}])

Or, more compactly:

# Initialize tools to use
number_of_google_calls_in_chain = 2
wrappers = [GoogleSearchAPIWrapper() for _ in range(number_of_google_calls_in_chain)]
search_tools = [Tool(name="google-search", description="Search Google for web results.", func=wrapper.run) for wrapper in wrappers]

chain = (
    RunnableParallel({
        "search_results": search_tools[0],
        "input": RunnablePassthrough()
    }) 
    | RunnableParallel({
        "search_results": RunnableLambda(lambda x: x['input']) | search_tools[1]
    })
)
chain.batch([{'__arg1': 'What is LangChain?'}, {'__arg1': 'What is an LLM?'}])

lkuligin commented 3 weeks ago

but we probably should fix the wrapper too :)

rdvhoorn commented 3 weeks ago

If you give me an indication of where to look, or what it might be, I would be willing to look into it. But since there are like 5 different, not very descriptive, errors, I'm not burning my hands without some guidance xD