langchain-ai / langgraph

Build resilient language agents as graphs.
https://langchain-ai.github.io/langgraph/
MIT License

[Fix KeyError: 'documents' in web search function] langgraph_rag_agent_llama3_local.ipynb is not working for the web search #1798

Closed inoue0426 closed 1 month ago

inoue0426 commented 1 month ago

Checked other resources

https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb

[Comment, Sep 22 2024:] The sample documentation appears to be correct; only the GitHub version of the notebook is outdated.

https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag_local

Example Code

def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to documents
    """

    print("---WEB SEARCH---")
    question = state["question"]

    ##### HERE #####
    documents = state["documents"]
    ################

    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    if documents is not None:
        documents.append(web_results)
    else:
        documents = [web_results]
    return {"documents": documents, "question": question}
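The failure is plain dict semantics: when the router sends a question straight to web search, the graph state has no "documents" entry yet, so the bracket lookup raises. A minimal reproduction, with a plain dict standing in for the graph state (no LangGraph required):

```python
# State as it arrives when routing goes directly to web search:
# only "question" has been set, never "documents".
state = {"question": "Who are the Bears expected to draft first?"}

try:
    documents = state["documents"]  # raises KeyError: key was never written
except KeyError as exc:
    print(f"KeyError: {exc}")

# .get() with a default sidesteps the exception entirely
documents = state.get("documents", [])
print(documents)  # []
```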

Error Message and Stack Trace (if applicable)

from pprint import pprint

# Compile
app = workflow.compile()
inputs = {"question": "Who are the Bears expected to draft first in the NFL draft?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["generation"])

---ROUTE QUESTION---
Who are the Bears expected to draft first in the NFL draft?
{'datasource': 'web_search'}
web_search
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[16], line 6
      4 app = workflow.compile()
      5 inputs = {"question": "Who are the Bears expected to draft first in the NFL draft?"}
----> 6 for output in app.stream(inputs):
      7     for key, value in output.items():
      8         pprint(f"Finished running: {key}:")

File ~/miniconda3/envs/multi/lib/python3.11/site-packages/langgraph/pregel/__init__.py:1278, in Pregel.stream(self, input, config, stream_mode, output_keys, interrupt_before, interrupt_after, debug, subgraphs)
   1267     # Similarly to Bulk Synchronous Parallel / Pregel model
   1268     # computation proceeds in steps, while there are channel updates
   1269     # channel updates from step N are only visible in step N+1
   1270     # channels are guaranteed to be immutable for the duration of the step,
   1271     # with channel updates applied only at the transition between steps
   1272     while loop.tick(
   1273         input_keys=self.input_channels,
   1274         interrupt_before=interrupt_before_,
   1275         interrupt_after=interrupt_after_,
   1276         manager=run_manager,
   1277     ):
-> 1278         for _ in runner.tick(
   1279             loop.tasks.values(),
   1280             timeout=self.step_timeout,
   1281             retry_policy=self.retry_policy,
   1282             get_waiter=get_waiter,
   1283         ):
   1284             # emit output
   1285             yield from output()
   1286 # emit output

File ~/miniconda3/envs/multi/lib/python3.11/site-packages/langgraph/pregel/runner.py:52, in PregelRunner.tick(self, tasks, reraise, timeout, retry_policy, get_waiter)
     50 t = tasks[0]
     51 try:
---> 52     run_with_retry(t, retry_policy)
     53     self.commit(t, None)
     54 except Exception as exc:

File ~/miniconda3/envs/multi/lib/python3.11/site-packages/langgraph/pregel/retry.py:29, in run_with_retry(task, retry_policy)
     27 task.writes.clear()
     28 # run the task
---> 29 task.proc.invoke(task.input, config)
     30 # if successful, end
     31 break

File ~/miniconda3/envs/multi/lib/python3.11/site-packages/langgraph/utils/runnable.py:385, in RunnableSeq.invoke(self, input, config, **kwargs)
    383 context.run(_set_config_context, config)
    384 if i == 0:
--> 385     input = context.run(step.invoke, input, config, **kwargs)
    386 else:
    387     input = context.run(step.invoke, input, config)

File ~/miniconda3/envs/multi/lib/python3.11/site-packages/langgraph/utils/runnable.py:167, in RunnableCallable.invoke(self, input, config, **kwargs)
    165 else:
    166     context.run(_set_config_context, config)
--> 167     ret = context.run(self.func, input, **kwargs)
    168 if isinstance(ret, Runnable) and self.recurse:
    169     return ret.invoke(input, config)

Cell In[13], line 121, in web_search(state)
    118 print("---WEB SEARCH---")
    120 question = state["question"]
--> 121 documents = state["documents"]
    123 # Web search
    124 docs = web_search_tool.invoke({"query": question})

KeyError: 'documents'

Description

I encountered a KeyError: 'documents' in the web search function of this notebook. The error occurs because the function reads the 'documents' key from the state dictionary, but that key does not exist when the router sends a question directly to web search. Defensive access is needed to make the function robust.

Current Behavior

The function assumes that state["documents"] always exists, leading to a KeyError when it doesn't.

Expected Behavior

The function should handle cases where state["documents"] doesn't exist, initializing it as an empty list if necessary.

Proposed Solution

Modify the code to use state.get("documents", []) instead of directly accessing state["documents"]. This change will return an empty list if the 'documents' key doesn't exist, preventing the KeyError.

Code Changes

# Before
documents = state["documents"]

# After
documents = state.get("documents", [])

# Rest of the function
docs = web_search_tool.invoke({"query": question})
web_results = "\n".join([d["content"] for d in docs])
web_results = Document(page_content=web_results)
documents.append(web_results)

return {"documents": documents, "question": question}
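To sanity-check the patch without compiling the full graph, the function can be exercised against a stubbed search tool and a minimal `Document` stand-in. Both stubs below are assumptions for the sketch, not the real `langchain_core` classes or the Tavily tool:

```python
class Document:
    """Minimal stand-in for langchain_core.documents.Document."""
    def __init__(self, page_content):
        self.page_content = page_content


class FakeSearchTool:
    """Stubbed web search tool returning one canned result."""
    def invoke(self, args):
        return [{"content": f"result for {args['query']}"}]


web_search_tool = FakeSearchTool()


def web_search(state):
    question = state["question"]
    documents = state.get("documents", [])  # the proposed fix
    docs = web_search_tool.invoke({"query": question})
    web_results = Document(page_content="\n".join(d["content"] for d in docs))
    documents.append(web_results)
    return {"documents": documents, "question": question}


# A state without "documents" no longer raises
out = web_search({"question": "NFL draft"})
print(len(out["documents"]))  # 1
```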

Would it be acceptable for me to create a Pull Request with these changes?

System Info

➜ python -m langchain_core.sys_info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 22.3.0: Mon Jan 30 20:39:35 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T8103
Python Version: 3.11.9 (main, Apr 19 2024, 11:43:47) [Clang 14.0.6 ]

Package Information

langchain_core: 0.3.5
langchain: 0.3.0
langchain_community: 0.3.0
langsmith: 0.1.125
langchain_experimental: 0.3.0
langchain_nomic: 0.1.3
langchain_ollama: 0.2.0
langchain_text_splitters: 0.3.0
langchainhub: 0.1.21
langgraph: 0.2.23

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.10.5
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
httpx: 0.27.2
jsonpatch: 1.33
langgraph-checkpoint: 1.0.10
nomic: 3.1.2
numpy: 1.26.4
ollama: 0.3.3
orjson: 3.10.7
packaging: 24.1
pillow: 10.3.0
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.35
tenacity: 8.5.0
types-requests: 2.32.0.20240914
typing-extensions: 4.12.2

isahers1 commented 1 month ago

Hi, thank you for reaching out! The notebook in the examples directory is outdated and should not be used. You should either follow the code in the tutorial (https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag_local) or in this notebook from the repo: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/rag/langgraph_adaptive_rag_local.ipynb. Lmk if you have any other questions!