qingwli opened this issue 1 week ago
A similar problem.
Since it is not fixed yet, the issue can be worked around by using a different tool and agent initialization approach:
```python
from langchain_ollama import ChatOllama
from langchain_community.tools import StructuredTool
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema import HumanMessage, AIMessage
from pydantic import BaseModel, Field
from typing import List
import asyncio


class TextFileReaderArgs(BaseModel):
    file_list: List[int] = Field(description="List of file IDs")


class AddToolArgs(BaseModel):
    a: int = Field(description="First number")
    b: int = Field(description="Second number")


async def text_file_reader(file_list: List[int]) -> str:
    # Your asynchronous code to read text files
    print(f"# text_file_reader request.\ntype: {type(file_list)}\nfile_list: {file_list}")
    return "These files contain these numbers: '298376837456\n658498465213546'"


async def add_tool(a: int, b: int) -> int:
    print(f"add_tool request: a: {a}, b: {b}")
    return a + b


async def conversation():
    add_tool_object = StructuredTool.from_function(
        coroutine=add_tool,
        name="add_two_numbers",
        description="Add two numbers",
        args_schema=AddToolArgs,
    )
    text_file_reader_tool = StructuredTool.from_function(
        coroutine=text_file_reader,
        name="read_text_file",
        # description='Read files from list of ids in format "[id1, id2, ...]\n\n". Input should end with 2 new lines.',
        description='Read files from list of ids in format "[id1, id2, ...]".',
        args_schema=TextFileReaderArgs,
    )
    tools = [add_tool_object, text_file_reader_tool]

    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "{system_prompt}"),
            ("placeholder", "{chat_history}"),
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}"),
        ]
    )
    llm = ChatOllama(model="qwen2.5-coder:7b-instruct")
    agent = create_tool_calling_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

    system_prompt = "You are a helpful assistant."
    chat_history = []

    input = 'Please read files with ids: 12, 37 and tell what they contain'
    result = await agent_executor.ainvoke(
        {
            "input": input,
            "chat_history": chat_history,
            "system_prompt": system_prompt,
        }
    )
    chat_history.append(HumanMessage(content=input))
    print(f"<< {result['output']}")
    chat_history.append(AIMessage(content=result["output"]))

    input = "Now please, add these numbers"
    result = await agent_executor.ainvoke(
        {
            "input": input,
            "chat_history": chat_history,
            "system_prompt": system_prompt,
        }
    )
    chat_history.append(HumanMessage(content=input))
    print(f"<< {result['output']}")
    chat_history.append(AIMessage(content=result["output"]))
    # 65,879,684,205,1002 or 658796842051002


async def main():
    await conversation()


asyncio.run(main())
```
You can also try adding "Input should end with 2 new lines" in the tool description.
Thanks @avyrodov @format37 @hippopond
Here is my update:
```
# Use the following format
Use the following format (and put in 2 newlines after Action Input):

# Action Input: the input to the action
Action Input: the input to the action, should end with (" ")
```
(The commented lines are the original prompt text; the lines below them are the modified versions.)
It's better, but sometimes the (" ") gets added before the action input instead. I don't know why.
```
> Entering new AgentExecutor chain...
Thought: Need to query how many records are in the table wap_meeting_summary_daily first.
Action: sql_db_query_checker
Action Input: SELECT COUNT(*) FROM wap_meeting_summary_daily;
Observation: The provided SQL query is:
SELECT COUNT(*) FROM wap_meeting_summary_daily;
```
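For reference, this kind of format-instruction tweak doesn't require editing the chain's prompt by hand; it can be passed through `agent_kwargs`. A minimal sketch, assuming the classic `initialize_agent` API and that `FORMAT_INSTRUCTIONS` still lives in `langchain.agents.mrkl.prompt` in your LangChain version:
```python
from langchain.agents import AgentType, initialize_agent
from langchain.agents.mrkl.prompt import FORMAT_INSTRUCTIONS
from langchain.tools import Tool
from langchain_ollama import ChatOllama

# Patch only the "Action Input" line of the stock ReAct format instructions.
patched_instructions = FORMAT_INSTRUCTIONS.replace(
    "Action Input: the input to the action",
    'Action Input: the input to the action, should end with ("  ")',
)

llm = ChatOllama(model="qwen2.5-coder:7b-instruct", base_url="http://localhost:11434", temperature=0)
echo_tool = Tool(name="Echo", func=lambda text: text, description="Repeats the input text.")

agent = initialize_agent(
    tools=[echo_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    agent_kwargs={"format_instructions": patched_instructions},
    handle_parsing_errors=True,
    verbose=True,
)
print(agent.invoke({"input": "echo back with (customers)"}))
```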
Thanks @format37 @qingwli
1) Agreed that modifying the prompt is not ideal. Sometimes I have also seen it add " 2 " after the line - which is not wanted.
2) I'm going to withdraw my PR - since it's not ideal and does not address the root cause.
3) Currently in my investigation: When I use ChatOllama directly, it works fine. When I use AgentExecutor on top of ChatOllama - that's when it drops the last character if it is not alphanumeric.
4) I am trying to disable streaming:
```python
llm = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
    disable_streaming=True,
)
```
But it still streams. I have also changed disable_streaming=True to streaming=False; it still streams.
I suspect the streaming is missing the last chunk.
Any suggestion on how to turn off streaming will help me. Thanks everyone !
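One way to confirm whether tokens are actually streamed, and whether the final one ever arrives, is to attach a callback that logs every chunk. A minimal sketch, assuming the standard BaseCallbackHandler interface from langchain_core:
```python
from langchain_core.callbacks import BaseCallbackHandler

class TokenLogger(BaseCallbackHandler):
    """Print every streamed token so a missing trailing chunk is easy to spot."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"token: {token!r}")

    def on_llm_end(self, response, **kwargs) -> None:
        print("llm call finished")

# Attach it to either the plain model call or the agent run and compare:
# llm.invoke(user_input, config={"callbacks": [TokenLogger()]})
# simple_agent.invoke(user_input, config={"callbacks": [TokenLogger()]})
```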
Here is my investigation so far:
```python
from langchain_ollama import ChatOllama

llm_x = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

user_input = "echo back with (customers)\n"

response = llm_x.invoke(user_input)
print(response.content)
```
Output:
```
(customers)
```
################################################################################
```python
from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType
from langchain_ollama import ChatOllama

llm_y = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

def echo(text):
    """Echoes the input text."""
    return text

echo_tool = Tool(
    name="Echo",
    func=echo,
    description="Repeats the input text.",
)

simple_agent = initialize_agent(
    tools=[echo_tool],
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm_y,
    handle_parsing_errors=True,
    verbose=True,
)

user_input = "echo back with (customers)\n"

response = simple_agent.invoke(user_input)
```
Output:
```
Entering new AgentExecutor chain...
Question: echo back with (customers)

Thought: I need to use the Echo tool to repeat the text "(customers)"
Action: Echo
Action Input: (customers
Observation: (customers
Thought: Here are the steps:
..........
```
The results from ChatOllama are fine.
The results from the AgentExecutor on top of ChatOllama lose the last character if that character is non-alphanumeric. See the line for Action Input.
Both code paths wind up in:
line 544 of langchain/libs/partners/ollama/langchain_ollama/chat_models.py
where the chunk for the last non-alphanumeric character is missed.
Hi @hippopond
It looks like we can 100% reproduce this issue and try to fix it in the source code. I'm new to Python. Are you referring to this line?
Yes.
There are 2 Paths to reach that line 544.
Here's the output of pdb's "where" command for the first path:
```
-> response = llm_x.invoke(user_input)
/langchain/libs/core/langchain_core/language_models/chat_models.py(286)invoke()
-> self.generate_prompt(
/langchain/libs/core/langchain_core/language_models/chat_models.py(786)generate_prompt()
-> return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
/langchain/libs/core/langchain_core/language_models/chat_models.py(633)generate()
-> self._generate_with_cache(
/langchain/libs/core/langchain_core/language_models/chat_models.py(851)_generate_with_cache()
-> result = self._generate(
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(644)_generate()
-> final_chunk = self._chat_stream_with_aggregation(
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(544)_chat_stream_with_aggregation()
-> final_chunk = None
```
And for the second path:
```
-> response = simple_agent.invoke(user_input)
/langchain/libs/langchain/langchain/chains/base.py(160)invoke()
-> self._call(inputs, run_manager=run_manager)
/langchain/libs/langchain/langchain/agents/agent.py(1629)_call()
-> next_step_output = self._take_next_step(
/langchain/libs/langchain/langchain/agents/agent.py(1335)_take_next_step()
-> [
/langchain/libs/langchain/langchain/agents/agent.py(1335)
```
Both paths lead to the same code.
So the thinking is: the extra steps taken by the AgentExecutor BEFORE reaching ChatOllama cause ChatOllama not to get the last chunk.
But I'm not sure how to find the bug that causes it.
@hippopond, Hi, I did some tests; they show that the problem is probably not in the LangChain implementation of Ollama. If you listen to the response from Ollama to a query (tcpflow -i any -c 'not port 11434'), you can see that the response already arrives clipped.
Thank you @avyrodov
That's a good debugging step. (now I can use tcpflow in my toolkit in the future - Thanks ! ).
Your investigation aligns with what I see. The data, even the last chunk of "`" has arrived at LangChain.
Here's a small screenshot of what I see in LangSmith:
ONLY ChatOllama - The results are good:
LLMChain on top of ChatOllama - As confirmed by @avyrodov - the last chunk is in, but not going through code
Cleaning up the paths:
```
-> response = llm_x.invoke(user_input)
/langchain/libs/core/langchain_core/language_models/chat_models.py(286)invoke()
/langchain/libs/core/langchain_core/language_models/chat_models.py(786)generate_prompt()
/langchain/libs/core/langchain_core/language_models/chat_models.py(633)generate()
/langchain/libs/core/langchain_core/language_models/chat_models.py(851)_generate_with_cache()
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(644)_generate()
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(544)_chat_stream_with_aggregation()
```
```
-> response = simple_agent.invoke(user_input)
/langchain/libs/langchain/langchain/chains/base.py(160)invoke()
/langchain/libs/langchain/langchain/agents/agent.py(1629)_call()
/langchain/libs/langchain/langchain/agents/agent.py(1335)_take_next_step()
/langchain/libs/langchain/langchain/agents/agent.py(1335)
```
They both meet at:
```
/langchain/libs/core/langchain_core/language_models/chat_models.py(786)generate_prompt()
```
My next investigation step is to put a breakpoint at that point and see the difference between all the variables.
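For anyone reproducing this, one way to stop at that shared generate_prompt frame is a plain pdb trace just before the call; a minimal sketch (the line numbers in the stacks above are from a specific checkout and will drift between versions):
```python
import pdb

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1", base_url="http://localhost:11434", temperature=0)

# Step ("s") down into generate_prompt(), then use "where" and "p <name>" to compare locals
# between the direct-invoke path and the agent path.
pdb.set_trace()
response = llm.invoke("echo back with (customers)\n")
print(response.content)
```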
What I found so far:
1) There was no difference except the prompt at the breakpoint.
2) However, when I ran tcpflow:
- The last chunk "'" was returned by the Ollama server when the client was a simple ChatOllama object.
- The last chunk was NOT returned by the Ollama server when the client was an AgentExecutor on top of a ChatOllama object.
3) I think the Ollama server was initialized differently when the AgentExecutor got initialized.
When I try to change the LLM model (NOTE: llama3.4 does not exist):
```python
llm_x = ChatOllama(
    model="llama3.4",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

user_input = "echo back with (customers)\n"
response = llm_x.invoke(user_input)
```
I get the error:
ResponseError: model "llama3.4" not found, try pulling it first
This is NOT captured by: tcpflow -i any -c 'not port 11434'
Which other port could Ollama be listening on?
I have a question. Ollama is running on 11434, so why do we need to run
tcpflow -i any -c 'not port 11434'
That means show all TCP traffic except port 11434, but we want the Ollama traffic, right? Why don't we use
tcpflow -i any -c 'port 11434'
```python
from langchain.agents import initialize_agent, AgentType
from langchain.tools import Tool
from langchain_ollama import ChatOllama

llm_y = ChatOllama(
    model="qwen2.5:7b",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

def echo(text):
    """Echoes the input text."""
    return text

echo_tool = Tool(
    name="Echo",
    func=echo,
    description="Repeats the input text.",
)

simple_agent = initialize_agent(
    tools=[echo_tool],
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm_y,
    handle_parsing_errors=True,
    verbose=True,
)

user_input = "echo back with (cus.tomers,)"
response = simple_agent.invoke(user_input)
print(response)
```
Here is my test code, almost the same as above. I added a `,` after cus.tomers. I don't know why the agent thinks () is not part of the word, so I added a `,` at the end.
Here is the output:
```
> Entering new AgentExecutor chain...
I need to repeat the text provided in the parentheses.
Action: Echo
Action Input: cus.tomers
Observation: cus.tomers
Thought: I now know the final answer.
Final Answer: cus.tomers

> Finished chain.
{'input': 'echo back with (cus.tomers,)', 'output': 'cus.tomers'}
```
And here is the tcp log.
127.000.000.001.62184-127.000.000.001.11434: POST /api/chat HTTP/1.1
Host: localhost:11434
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Type: application/json
Accept: application/json
User-Agent: ollama-python/0.3.3 (arm64 darwin) Python/3.10.11
Content-Length: 1220
127.000.000.001.51274-127.000.000.001.11434: {"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Answer the following questions as best you can. You have access to the following tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Echo]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers,)\nThought:", "images": []}], "tools": [], "stream": true, "format": "", "options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}, "keep_alive": null}
We can see the response:
127.000.000.001.11434-127.000.000.001.51274: 83
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.609284Z","message":{"role":"assistant","content":"\nAction"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 83
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.609284Z","message":{"role":"assistant","content":"\nAction"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 81
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.647547Z","message":{"role":"assistant","content":" Input"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 81
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.647547Z","message":{"role":"assistant","content":" Input"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7c
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.686764Z","message":{"role":"assistant","content":":"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7c
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.686764Z","message":{"role":"assistant","content":":"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7f
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.725001Z","message":{"role":"assistant","content":" cus"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7f
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.725001Z","message":{"role":"assistant","content":" cus"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7d
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.763603Z","message":{"role":"assistant","content":".t"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7d
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.763603Z","message":{"role":"assistant","content":".t"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 80
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.801778Z","message":{"role":"assistant","content":"omers"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 80
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.801778Z","message":{"role":"assistant","content":"omers"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 128
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.954253Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":3194240333,"load_duration":35662750,"prompt_eval_count":165,"prompt_eval_duration":2267979000,"eval_count":24,"eval_duration":885610000}
127.000.000.001.11434-127.000.000.001.51274: 128
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.954253Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":3194240333,"load_duration":35662750,"prompt_eval_count":165,"prompt_eval_duration":2267979000,"eval_count":24,"eval_duration":885610000}
There is no trailing , in the response.
Now we can see the API request information. I will just use this same message to chat with Ollama directly.
╰─$ ollama run qwen2.5:7b
>>>
... Answer the following questions as best you can. You have access to the follo
... wing tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following for
... mat:\n\nQuestion: the input question you must answer\nThought: you should al
... ways think about what to do\nAction: the action to take, should be one of [E
... cho]\nAction Input: the input to the action\nObservation: the result of the
... action\n... (this Thought/Action/Action Input/Observation can repeat N times
... )\nThought: I now know the final answer\nFinal Answer: the final answer to t
... he original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers
... ,)\nThought:
Thought: The task is clear. I need to use the Echo tool to repeat the
given text exactly as it is.
Action: Echo
Action Input: cus.tomers,
Observation: cus.tomers,
Thought: I have completed the action successfully.
Final Answer: cus.tomers,
Looks like the trailing , is there.
So let's see whether the two API requests are any different.
From python agent
{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Answer the following questions as best you can. You have access to the following tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Echo]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers,)\nThought:", "images": []}], "tools": [], "stream": true, "format": "", "options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}, "keep_alive": null}
From iTerm Ollama
{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Answer the following questions as best you can. You have access to the following tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Echo]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers,)\nThought:", "images": []}], "tools": [], "stream": true, "format": "", "options": {}, "keep_alive": null}
Looks like the messages are all the same. The only difference is options.
From python agent
"options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}
From iTerm Ollama
"options": {}
I will use Postman to send the two requests and see what happens.
Very clear. It looks like that when calling the chat API, some option affects the trailing ,.
I will test these options to find which one is the key:
"options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}
Very simple: just delete the items in options one by one and test. Now I know which key affects the trailing ,.
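A minimal sketch of that elimination test against Ollama's /api/chat endpoint, assuming the requests package is installed; the prompt string is a placeholder to paste from the tcpflow capture, and each candidate options dict is sent with the same prompt so a missing trailing character stands out:
```python
import json

import requests

# Placeholder: paste the full ReAct prompt captured by tcpflow here.
PROMPT = "Answer the following questions as best you can. ... Question: echo back with (cus.tomers,)\nThought:"

candidate_options = [
    {},                                                 # what `ollama run` effectively sends
    {"temperature": 0.0},
    {"stop": ["\nObservation:", "\n\tObservation:"]},   # what the agent sends
]

for options in candidate_options:
    body = {
        "model": "qwen2.5:7b",
        "messages": [{"role": "user", "content": PROMPT}],
        "stream": False,   # one aggregated response per request is enough for this test
        "options": options,
    }
    resp = requests.post("http://localhost:11434/api/chat", json=body, timeout=300)
    content = resp.json()["message"]["content"]
    print(json.dumps(options), "->", repr(content))
```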
Interesting.
FYI: when I run tcpflow -i any -c 'not port 11434', I only get the lines related to port 11434.
Currently none of these options are controlled by the application developer; they are handled at a lower level.
I don't know if we have a way to control them more finely......
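For what it's worth, the stop option at least can be overridden per call at the LangChain level; a minimal sketch comparing the agent-style stop sequences with plain ones (prompt_text is a placeholder for the captured ReAct prompt):
```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen2.5:7b", base_url="http://localhost:11434", temperature=0)

# Placeholder: paste the agent-built prompt here (e.g. from the verbose output or tcpflow).
prompt_text = "...the captured ReAct prompt ending with (cus.tomers,)\nThought:"

# The newline-prefixed stop sequences are what the agent injects.
resp_a = llm.invoke(prompt_text, stop=["\nObservation:", "\n\tObservation:"])
resp_b = llm.invoke(prompt_text, stop=["Observation:"])
print(repr(resp_a.content))
print(repr(resp_b.content))
```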
When we instantiate the ChatOllama class, we don't use stop:
```python
llm_y = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)
```
But I think when we add the AgentExecutor on top, it adds in the "stop", because of the complex prompt:
```python
simple_agent = initialize_agent(
    tools=[echo_tool],
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm_y,
    handle_parsing_errors=True,
    verbose=True,
)
```
Yeah, you can change to use AgentExecutor or LLMSingleActionAgent, then add stop=["Observation:"]:
```python
llm_agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["Observation:"],
    allowed_tools=tool_names,
)
```
It will work fine.
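For completeness, here is a sketch of the pieces that snippet assumes (llm_chain, output_parser, tool_names), based on the classic custom-agent pattern with the deprecated LLMSingleActionAgent and LLMChain APIs; treat it as an illustration under those assumptions, not the exact code used above:
```python
from langchain.agents import AgentExecutor, LLMSingleActionAgent, Tool
from langchain.agents.output_parsers import ReActSingleInputOutputParser
from langchain.chains import LLMChain
from langchain.prompts import StringPromptTemplate
from langchain_ollama import ChatOllama

TEMPLATE = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

class ReActPromptTemplate(StringPromptTemplate):
    """Renders the intermediate steps into the agent_scratchpad slot."""

    tools: list

    def format(self, **kwargs) -> str:
        steps = kwargs.pop("intermediate_steps")
        scratchpad = ""
        for action, observation in steps:
            scratchpad += action.log + f"\nObservation: {observation}\nThought: "
        kwargs["agent_scratchpad"] = scratchpad
        kwargs["tools"] = "\n".join(f"{t.name}: {t.description}" for t in self.tools)
        kwargs["tool_names"] = ", ".join(t.name for t in self.tools)
        return TEMPLATE.format(**kwargs)

def echo(text: str) -> str:
    return text

tools = [Tool(name="Echo", func=echo, description="Repeats the input text.")]
tool_names = [t.name for t in tools]

llm = ChatOllama(model="qwen2.5:7b", base_url="http://localhost:11434", temperature=0)
prompt = ReActPromptTemplate(tools=tools, input_variables=["input", "intermediate_steps"])
llm_chain = LLMChain(llm=llm, prompt=prompt)
output_parser = ReActSingleInputOutputParser()

llm_agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["Observation:"],  # no leading \n, so the trailing character survives
    allowed_tools=tool_names,
)
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=llm_agent, tools=tools, verbose=True, handle_parsing_errors=True
)
print(agent_executor.invoke({"input": "echo back with (cus.tomers,)"}))
```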
But I don't know why adding \n to the Observation stop sequence loses the last chunk, LOL.
LOL. These LLMs - very unpredictable....
I like your workaround: putting in stop=["Observation:"].
Looks like adding stop=["Observation:"] is the only thing we can do. It's better than changing the prompt to make the agent add 2 spaces; that is more unstable, and sometimes, like you said, it will add '2' at the end.
It does not fix this issue completely, but it avoids it, LOL.
FYI @format37 @avyrodov
Just an FYI:
When I run a session, and do a:
netstat -ant | grep -i 127
I get a pair of:
tcp4 0 0 127.0.0.1.11434 127.0.0.1.62591 ESTABLISHED
tcp4 0 0 127.0.0.1.62591 127.0.0.1.11434 ESTABLISHED
I have to do TWO different terminals:
tcpflow -i any -c 'not port 11434'
tcpflow -i any -c 'not port 62591'
The 2 different tcpflow captures:
1) one captures the prompt and response,
2) the other captures the model setup and loading.
Interesting. I ran tcpflow -i any -c 'port 11434' on my Mac and I can see everything.
hmmm ok. Thanks @qingwli !
I can see '\nObservation' is set in the LangChain code, but I think this should be fixed in Ollama. I will raise another issue in the Ollama repo and post the link later.
Thanks @hippopond @format37 for your clues.
Description
You can see in the output for Action Input that the trailing ` is missing.
I also tested other tools. [Screenshots of the tool definitions and the corresponding output omitted.] You can see the Action Input lost the trailing " and }.