langchain-ai / langgraph

Build resilient language agents as graphs.
https://langchain-ai.github.io/langgraph/

Local RAG agent with LLaMA3 error: Ollama call failed with status code 400. Details: {"error":"unexpected server status: 1"} #346

Open luca-git opened 2 months ago

luca-git commented 2 months ago

Example Code

The notebook example code in https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb

Error Message and Stack Trace (if applicable)

{'score': 'yes'}
According to the context, agent memory refers to a long-term memory module (external database) that records a comprehensive list of agents' experience in natural language. This memory stream is used by generative agents to enable them to behave conditioned on past experience and interact with other agents.
Traceback (most recent call last):
  File "/home/luca/pymaindir_icos/autocoders/lc_coder/lama3/local_llama3.py", line 139, in <module>
    answer_grader.invoke({"question": question,"generation": generation})
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
    input = step.invoke(
            ^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 158, in invoke
    self.generate_prompt(
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 560, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 421, in generate
    raise e
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 411, in generate
    self._generate_with_cache(
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 632, in _generate_with_cache
    result = self._generate(
             ^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/chat_models/ollama.py", line 259, in _generate
    final_chunk = self._chat_stream_with_aggregation(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/chat_models/ollama.py", line 190, in _chat_stream_with_aggregation
    for stream_resp in self._create_chat_stream(messages, stop, **kwargs):
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/chat_models/ollama.py", line 162, in _create_chat_stream
    yield from self._create_stream(
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/luca/anaconda3/envs/lc_coder/lib/python3.11/site-packages/langchain_community/llms/ollama.py", line 251, in _create_stream
    raise ValueError(
ValueError: Ollama call failed with status code 400. Details: {"error":"unexpected server status: 1"}

Description

Running the example code, I get the above error. This does not happen with Mistral, so I assume my Ollama install is fine. I also get the first "yes" from Llama 3, if I'm not mistaken, so I suspect it's related to something not working here:

from pprint import pprint

# Stream the graph, printing each node's name as it finishes.
inputs = {"question": "What are the types of agent memory?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")

# After the loop, `value` holds the final node's state.
pprint(value["generation"])
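
For what it's worth, the failure also reproduces when I call the grader directly, outside the graph, so it doesn't look specific to app.stream. A minimal isolation check (assuming answer_grader, question, and generation are already defined as in the notebook):

# Call the grader step directly to see whether the 400 is intermittent
# or tied to graph execution; langchain_community surfaces Ollama's
# HTTP 400 as a ValueError, per the traceback above.
try:
    result = answer_grader.invoke({"question": question, "generation": generation})
    print(result)
except ValueError as e:
    print(f"Ollama call failed: {e}")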

System Info

Ubuntu 22.04.4 LTS, Anaconda, and VS Code.

Gyarados commented 2 months ago

I have been experiencing the same issue. It seems to happen at random when Ollama is called. This thread suggests configuring a retry method, which seems to work, but it would be nice to get an official fix: https://github.com/langchain-ai/langchain/issues/20773#issuecomment-2072117003
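
In case it saves someone a click, the workaround from that thread amounts to wrapping the Ollama model with LangChain's built-in retry. A rough sketch, not an official fix; the exception type, the attempt count, and the placeholder grader prompt are my assumptions:

from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

llm = ChatOllama(model="llama3", format="json", temperature=0)

# The intermittent 400 surfaces as a ValueError from langchain_community,
# so retry on that with exponential backoff before giving up.
llm_with_retry = llm.with_retry(
    retry_if_exception_type=(ValueError,),
    wait_exponential_jitter=True,
    stop_after_attempt=3,
)

# Placeholder for the grader prompt from the notebook.
prompt = PromptTemplate.from_template(
    "Question: {question}\nAnswer: {generation}\n"
    "Is the answer useful? Reply as JSON with a single key 'score'."
)

answer_grader = prompt | llm_with_retry | JsonOutputParser()

This masks the error rather than fixing it, but it keeps the graph running through the intermittent failures.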

matrodge commented 2 months ago

I'm seeing the same issue. The retry referenced here seems to help at times, but not 100%: https://github.com/langchain-ai/langchain/issues/20773#issuecomment-2072117003

MichlF commented 2 months ago

I can second this exactly:

I'm seeing the same issue, the retry referenced here seems to help at times, but not 100%: langchain-ai/langchain#20773 (comment)

luca-git commented 2 months ago

The workaround works (sometimes) after updating Ollama to 0.1.9.

wdonno commented 2 months ago

Still an issue, not really functional on Ollama 0.1.32.

EDIT: resolved. Solved for me by ensuring other Ollama instances on the system (other Ubuntu instances under WSL, or on the host Windows machine) were off or uninstalled (for the Windows version). Ollama may have a bug related to stopping the server.
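
If anyone else suspects duplicate instances, a quick sanity check is to ask the server on the default port which build is answering; when Ollama runs both under WSL and on the host Windows machine, the one bound to 11434 may not be the one you expect. A small sketch, assuming a default install on port 11434 and the requests package:

import requests

# Ask the Ollama server on the default port for its version, to confirm
# which instance is actually answering requests.
resp = requests.get("http://localhost:11434/api/version", timeout=5)
print(resp.json())  # e.g. {"version": "0.1.32"}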

hinthornw commented 2 months ago

This seems to be more of an Ollama issue in this case? Or is there something specific to this notebook that you want fixed?

luca-git commented 2 months ago

It's a terrific notebook and I'd love to see it working with Ollama and Llama 3. I believe the issue affects every Llama 3 implementation, so fixing it would help greatly.