qingwli opened this issue 1 week ago
A similar problem.
Since it is not fixed yet, the issue can be worked around by using a different tool and agent initialization approach:
```python
from langchain_ollama import ChatOllama
from langchain_community.tools import StructuredTool
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema import HumanMessage, AIMessage
from pydantic import BaseModel, Field
from typing import List
import asyncio


class TextFileReaderArgs(BaseModel):
    file_list: List[int] = Field(description="List of file IDs")


class AddToolArgs(BaseModel):
    a: int = Field(description="First number")
    b: int = Field(description="Second number")


async def text_file_reader(file_list: List[int]) -> str:
    # Your asynchronous code to read text files
    print(f"# text_file_reader request.\ntype: {type(file_list)}\nfile_list: {file_list}")
    return "These files contain these numbers: '298376837456\n658498465213546'"


async def add_tool(a: int, b: int) -> int:
    print(f"add_tool request: a: {a}, b: {b}")
    return a + b


async def conversation():
    add_tool_object = StructuredTool.from_function(
        coroutine=add_tool,
        name="add_two_numbers",
        description="Add two numbers",
        args_schema=AddToolArgs,
    )
    text_file_reader_tool = StructuredTool.from_function(
        coroutine=text_file_reader,
        name="read_text_file",
        # description='Read files from list of ids in format "[id1, id2, ...]\n\n". Input should end with 2 new lines.',
        description='Read files from list of ids in format "[id1, id2, ...]".',
        args_schema=TextFileReaderArgs,
    )
    tools = [add_tool_object, text_file_reader_tool]

    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "{system_prompt}"),
            ("placeholder", "{chat_history}"),
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}"),
        ]
    )
    llm = ChatOllama(model="qwen2.5-coder:7b-instruct")
    agent = create_tool_calling_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

    system_prompt = "You are a helpful assistant."
    chat_history = []

    input = 'Please read files with ids: 12, 37 and tell what they contain'
    result = await agent_executor.ainvoke(
        {
            "input": input,
            "chat_history": chat_history,
            "system_prompt": system_prompt,
        }
    )
    chat_history.append(HumanMessage(content=input))
    print(f"<< {result['output']}")
    chat_history.append(AIMessage(content=result["output"]))

    input = "Now please, add these numbers"
    result = await agent_executor.ainvoke(
        {
            "input": input,
            "chat_history": chat_history,
            "system_prompt": system_prompt,
        }
    )
    chat_history.append(HumanMessage(content=input))
    print(f"<< {result['output']}")
    chat_history.append(AIMessage(content=result["output"]))
    # 65,879,684,205,1002 or 658796842051002


async def main():
    await conversation()


asyncio.run(main())
```
You can also try adding "Input should end with 2 new lines" in the tool description.
Thanks @avyrodov @format37 @hippopond
Here is my update:
```
# Use the following format
Use the following format (and put in 2 newlines after Action Input):

# Action Input: the input to the action
Action Input: the input to the action, should end with (" ")
```
(The commented lines are the original prompt text; the lines below them are the modified versions.)
It's better, but sometimes the (" ") gets added before the action input instead. I don't know why.
```
> Entering new AgentExecutor chain...
Thought: Need to query how many records are in the table wap_meeting_summary_daily first.
Action: sql_db_query_checker
Action Input: SELECT COUNT(*) FROM wap_meeting_summary_daily;
Observation: The provided SQL query is:
SELECT COUNT(*) FROM wap_meeting_summary_daily;
```
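For reference, this kind of format-instruction tweak doesn't require editing the chain's prompt by hand; it can be passed through `agent_kwargs`. A minimal sketch, assuming the classic `initialize_agent` API and that `FORMAT_INSTRUCTIONS` still lives in `langchain.agents.mrkl.prompt` in your LangChain version:
```python
from langchain.agents import AgentType, initialize_agent
from langchain.agents.mrkl.prompt import FORMAT_INSTRUCTIONS
from langchain.tools import Tool
from langchain_ollama import ChatOllama

# Patch only the "Action Input" line of the stock ReAct format instructions.
patched_instructions = FORMAT_INSTRUCTIONS.replace(
    "Action Input: the input to the action",
    'Action Input: the input to the action, should end with ("  ")',
)

llm = ChatOllama(model="qwen2.5-coder:7b-instruct", base_url="http://localhost:11434", temperature=0)
echo_tool = Tool(name="Echo", func=lambda text: text, description="Repeats the input text.")

agent = initialize_agent(
    tools=[echo_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    agent_kwargs={"format_instructions": patched_instructions},
    handle_parsing_errors=True,
    verbose=True,
)
print(agent.invoke({"input": "echo back with (customers)"}))
```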
Thanks @format37 @qingwli
1) Agreed that modifying the prompt is not ideal. Sometimes I have also seen it add " 2 " after the line - which is not wanted.
2) I'm going to withdraw my PR - since it's not ideal and does not address the root cause.
3) Currently in my investigation: When I use ChatOllama directly, it works fine. When I use AgentExecutor on top of ChatOllama - that's when it drops the last character if it is not alphanumeric.
4) I am trying to disable streaming:
```python
llm = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
    disable_streaming=True,
)
```
But it still streams. I have also changed disable_streaming=True to streaming=False; it still streams.
I suspect the streaming is missing the last chunk.
Any suggestion on how to turn off streaming will help me. Thanks everyone !
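One way to confirm whether tokens are actually streamed, and whether the final one ever arrives, is to attach a callback that logs every chunk. A minimal sketch, assuming the standard BaseCallbackHandler interface from langchain_core:
```python
from langchain_core.callbacks import BaseCallbackHandler

class TokenLogger(BaseCallbackHandler):
    """Print every streamed token so a missing trailing chunk is easy to spot."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"token: {token!r}")

    def on_llm_end(self, response, **kwargs) -> None:
        print("llm call finished")

# Attach it to either the plain model call or the agent run and compare:
# llm.invoke(user_input, config={"callbacks": [TokenLogger()]})
# simple_agent.invoke(user_input, config={"callbacks": [TokenLogger()]})
```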
Here is my investigation so far:
```python
from langchain_ollama import ChatOllama

llm_x = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

user_input = "echo back with (customers)\n"

response = llm_x.invoke(user_input)
print(response.content)
```
Output:
```
(customers)
```
################################################################################
```python
from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType
from langchain_ollama import ChatOllama

llm_y = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

def echo(text):
    """Echoes the input text."""
    return text

echo_tool = Tool(
    name="Echo",
    func=echo,
    description="Repeats the input text.",
)

simple_agent = initialize_agent(
    tools=[echo_tool],
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm_y,
    handle_parsing_errors=True,
    verbose=True,
)

user_input = "echo back with (customers)\n"

response = simple_agent.invoke(user_input)
```
Output:
```
Entering new AgentExecutor chain...
Question: echo back with (customers)

Thought: I need to use the Echo tool to repeat the text "(customers)"
Action: Echo
Action Input: (customers
Observation: (customers
Thought: Here are the steps:
..........
```
The results from ChatOllama are fine.
The results from the AgentExecutor on top of ChatOllama lose the last character if that character is non-alphanumeric. See the line for Action Input.
Both code paths wind up in:
line 544 of langchain/libs/partners/ollama/langchain_ollama/chat_models.py
where the chunk for the last non-alphanumeric character is missed.
Hi @hippopond
It looks like we can 100% reproduce this issue and try to fix it in the source code. I'm new to Python. Are you referring to this line?
Yes.
There are 2 Paths to reach that line 544.
Here's the output of pdb's "where" command for the first path:
```
-> response = llm_x.invoke(user_input)
/langchain/libs/core/langchain_core/language_models/chat_models.py(286)invoke()
-> self.generate_prompt(
/langchain/libs/core/langchain_core/language_models/chat_models.py(786)generate_prompt()
-> return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
/langchain/libs/core/langchain_core/language_models/chat_models.py(633)generate()
-> self._generate_with_cache(
/langchain/libs/core/langchain_core/language_models/chat_models.py(851)_generate_with_cache()
-> result = self._generate(
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(644)_generate()
-> final_chunk = self._chat_stream_with_aggregation(
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(544)_chat_stream_with_aggregation()
-> final_chunk = None
```
And for the second path:
```
-> response = simple_agent.invoke(user_input)
/langchain/libs/langchain/langchain/chains/base.py(160)invoke()
-> self._call(inputs, run_manager=run_manager)
/langchain/libs/langchain/langchain/agents/agent.py(1629)_call()
-> next_step_output = self._take_next_step(
/langchain/libs/langchain/langchain/agents/agent.py(1335)_take_next_step()
-> [
/langchain/libs/langchain/langchain/agents/agent.py(1335)
```
Both paths lead to the same code.
So the thinking is: the extra steps taken by the AgentExecutor BEFORE reaching ChatOllama cause ChatOllama not to get the last chunk.
But I'm not sure how to find the bug that causes it.
@hippopond, Hi, I did some tests; they show that the problem is probably not in the LangChain implementation of Ollama. If you listen to the response from Ollama to a query (tcpflow -i any -c 'not port 11434'), you can see that the response already arrives clipped.
Thank you @avyrodov
That's a good debugging step. (now I can use tcpflow in my toolkit in the future - Thanks ! ).
Your investigation aligns with what I see. The data, even the last chunk of "`" has arrived at LangChain.
Here's a small screenshot of what I see in LangSmith:
ONLY ChatOllama - The results are good:
LLMChain on top of ChatOllama - As confirmed by @avyrodov - the last chunk is in, but not going through code
Cleaning up the paths:
```
-> response = llm_x.invoke(user_input)
/langchain/libs/core/langchain_core/language_models/chat_models.py(286)invoke()
/langchain/libs/core/langchain_core/language_models/chat_models.py(786)generate_prompt()
/langchain/libs/core/langchain_core/language_models/chat_models.py(633)generate()
/langchain/libs/core/langchain_core/language_models/chat_models.py(851)_generate_with_cache()
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(644)_generate()
/langchain/libs/partners/ollama/langchain_ollama/chat_models.py(544)_chat_stream_with_aggregation()
```
```
-> response = simple_agent.invoke(user_input)
/langchain/libs/langchain/langchain/chains/base.py(160)invoke()
/langchain/libs/langchain/langchain/agents/agent.py(1629)_call()
/langchain/libs/langchain/langchain/agents/agent.py(1335)_take_next_step()
/langchain/libs/langchain/langchain/agents/agent.py(1335)
```
They both meet at:
```
/langchain/libs/core/langchain_core/language_models/chat_models.py(786)generate_prompt()
```
My next investigation step is to put a breakpoint at that point and see the difference between all the variables.
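For anyone reproducing this, one way to stop at that shared generate_prompt frame is a plain pdb trace just before the call; a minimal sketch (the line numbers in the stacks above are from a specific checkout and will drift between versions):
```python
import pdb

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1", base_url="http://localhost:11434", temperature=0)

# Step ("s") down into generate_prompt(), then use "where" and "p <name>" to compare locals
# between the direct-invoke path and the agent path.
pdb.set_trace()
response = llm.invoke("echo back with (customers)\n")
print(response.content)
```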
What I found so far:
1) There was no difference except the prompt at the breakpoint.
2) However, when I ran tcpflow:
- The last chunk "'" was returned by the Ollama server when the client was a simple ChatOllama object.
- The last chunk was NOT returned by the Ollama server when the client was an AgentExecutor on top of a ChatOllama object.
3) I think the Ollama server was initialized differently when the AgentExecutor got initialized.
When I try to change the LLM model (NOTE: llama3.4 does not exist):
```python
llm_x = ChatOllama(
    model="llama3.4",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

user_input = "echo back with (customers)\n"
response = llm_x.invoke(user_input)
```
I get the error:
ResponseError: model "llama3.4" not found, try pulling it first
This is NOT captured by: tcpflow -i any -c 'not port 11434'
Which other port could Ollama be listening on?
I have a question. Ollama is running on 11434, so why do we need to run
tcpflow -i any -c 'not port 11434'
That means show all TCP traffic except port 11434, but we want the Ollama traffic, right? Why don't we use
tcpflow -i any -c 'port 11434'
```python
from langchain.agents import initialize_agent, AgentType
from langchain.tools import Tool
from langchain_ollama import ChatOllama

llm_y = ChatOllama(
    model="qwen2.5:7b",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)

def echo(text):
    """Echoes the input text."""
    return text

echo_tool = Tool(
    name="Echo",
    func=echo,
    description="Repeats the input text.",
)

simple_agent = initialize_agent(
    tools=[echo_tool],
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm_y,
    handle_parsing_errors=True,
    verbose=True,
)

user_input = "echo back with (cus.tomers,)"
response = simple_agent.invoke(user_input)
print(response)
```
Here is my test code, almost the same as above. I added a `,` after cus.tomers. I don't know why the agent thinks () is not part of the word, so I added a `,` at the end.
Here is the output:
```
> Entering new AgentExecutor chain...
I need to repeat the text provided in the parentheses.
Action: Echo
Action Input: cus.tomers
Observation: cus.tomers
Thought: I now know the final answer.
Final Answer: cus.tomers

> Finished chain.
{'input': 'echo back with (cus.tomers,)', 'output': 'cus.tomers'}
```
And here is the tcp log.
127.000.000.001.62184-127.000.000.001.11434: POST /api/chat HTTP/1.1
Host: localhost:11434
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Type: application/json
Accept: application/json
User-Agent: ollama-python/0.3.3 (arm64 darwin) Python/3.10.11
Content-Length: 1220
127.000.000.001.51274-127.000.000.001.11434: {"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Answer the following questions as best you can. You have access to the following tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Echo]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers,)\nThought:", "images": []}], "tools": [], "stream": true, "format": "", "options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}, "keep_alive": null}
We can see the response:
127.000.000.001.11434-127.000.000.001.51274: 83
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.609284Z","message":{"role":"assistant","content":"\nAction"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 83
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.609284Z","message":{"role":"assistant","content":"\nAction"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 81
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.647547Z","message":{"role":"assistant","content":" Input"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 81
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.647547Z","message":{"role":"assistant","content":" Input"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7c
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.686764Z","message":{"role":"assistant","content":":"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7c
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.686764Z","message":{"role":"assistant","content":":"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7f
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.725001Z","message":{"role":"assistant","content":" cus"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7f
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.725001Z","message":{"role":"assistant","content":" cus"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7d
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.763603Z","message":{"role":"assistant","content":".t"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 7d
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.763603Z","message":{"role":"assistant","content":".t"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 80
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.801778Z","message":{"role":"assistant","content":"omers"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 80
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.801778Z","message":{"role":"assistant","content":"omers"},"done":false}
127.000.000.001.11434-127.000.000.001.51274: 128
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.954253Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":3194240333,"load_duration":35662750,"prompt_eval_count":165,"prompt_eval_duration":2267979000,"eval_count":24,"eval_duration":885610000}
127.000.000.001.11434-127.000.000.001.51274: 128
{"model":"qwen2.5:7b","created_at":"2024-11-07T07:20:36.954253Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":3194240333,"load_duration":35662750,"prompt_eval_count":165,"prompt_eval_duration":2267979000,"eval_count":24,"eval_duration":885610000}
There is no trailing , in the response.
Now we can see the API request information. I will just use this same message to chat with Ollama directly.
╰─$ ollama run qwen2.5:7b
>>>
... Answer the following questions as best you can. You have access to the follo
... wing tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following for
... mat:\n\nQuestion: the input question you must answer\nThought: you should al
... ways think about what to do\nAction: the action to take, should be one of [E
... cho]\nAction Input: the input to the action\nObservation: the result of the
... action\n... (this Thought/Action/Action Input/Observation can repeat N times
... )\nThought: I now know the final answer\nFinal Answer: the final answer to t
... he original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers
... ,)\nThought:
Thought: The task is clear. I need to use the Echo tool to repeat the
given text exactly as it is.
Action: Echo
Action Input: cus.tomers,
Observation: cus.tomers,
Thought: I have completed the action successfully.
Final Answer: cus.tomers,
Looks like the trailing , is there.
So let's see whether the two API requests are any different.
From python agent
{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Answer the following questions as best you can. You have access to the following tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Echo]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers,)\nThought:", "images": []}], "tools": [], "stream": true, "format": "", "options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}, "keep_alive": null}
From iTerm Ollama
{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Answer the following questions as best you can. You have access to the following tools:\n\nEcho(text) - Repeats the input text.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Echo]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: echo back with (cus.tomers,)\nThought:", "images": []}], "tools": [], "stream": true, "format": "", "options": {}, "keep_alive": null}
Looks like the messages are all the same. The only difference is options.
From python agent
"options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}
From iTerm Ollama
"options": {}
I will use Postman to send the two requests and see what happens.
Very clear. It looks like that when calling the chat API, some option affects the trailing ,.
I will test these options to find which one is the key:
"options": {"mirostat": null, "mirostat_eta": null, "mirostat_tau": null, "num_ctx": null, "num_gpu": null, "num_thread": null, "num_predict": null, "repeat_last_n": null, "repeat_penalty": null, "temperature": 0.0, "seed": null, "stop": ["\nObservation:", "\n\tObservation:"], "tfs_z": null, "top_k": null, "top_p": null}
Very simple: just delete the items in options one by one and test. Now I know which key affects the trailing ,.
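A minimal sketch of that elimination test against Ollama's /api/chat endpoint, assuming the requests package is installed; the prompt string is a placeholder to paste from the tcpflow capture, and each candidate options dict is sent with the same prompt so a missing trailing character stands out:
```python
import json

import requests

# Placeholder: paste the full ReAct prompt captured by tcpflow here.
PROMPT = "Answer the following questions as best you can. ... Question: echo back with (cus.tomers,)\nThought:"

candidate_options = [
    {},                                                 # what `ollama run` effectively sends
    {"temperature": 0.0},
    {"stop": ["\nObservation:", "\n\tObservation:"]},   # what the agent sends
]

for options in candidate_options:
    body = {
        "model": "qwen2.5:7b",
        "messages": [{"role": "user", "content": PROMPT}],
        "stream": False,   # one aggregated response per request is enough for this test
        "options": options,
    }
    resp = requests.post("http://localhost:11434/api/chat", json=body, timeout=300)
    content = resp.json()["message"]["content"]
    print(json.dumps(options), "->", repr(content))
```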
Interesting.
FYI: when I run tcpflow -i any -c 'not port 11434', I only get the lines related to port 11434.
Currently none of these options are controlled by the application developer; they are handled at a lower level.
I don't know if we have a way to control them more finely......
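For what it's worth, the stop option at least can be overridden per call at the LangChain level; a minimal sketch comparing the agent-style stop sequences with plain ones (prompt_text is a placeholder for the captured ReAct prompt):
```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen2.5:7b", base_url="http://localhost:11434", temperature=0)

# Placeholder: paste the agent-built prompt here (e.g. from the verbose output or tcpflow).
prompt_text = "...the captured ReAct prompt ending with (cus.tomers,)\nThought:"

# The newline-prefixed stop sequences are what the agent injects.
resp_a = llm.invoke(prompt_text, stop=["\nObservation:", "\n\tObservation:"])
resp_b = llm.invoke(prompt_text, stop=["Observation:"])
print(repr(resp_a.content))
print(repr(resp_b.content))
```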
When we instantiate the ChatOllama class, we don't use stop:
```python
llm_y = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434",
    timeout=300,
    temperature=0,
)
```
But I think when we add the AgentExecutor on top, it adds in the "stop", because of the complex prompt:
```python
simple_agent = initialize_agent(
    tools=[echo_tool],
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm_y,
    handle_parsing_errors=True,
    verbose=True,
)
```
Yeah, you can change to use AgentExecutor or LLMSingleActionAgent, then add stop=["Observation:"]:
```python
llm_agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["Observation:"],
    allowed_tools=tool_names,
)
```
It will work fine.
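For completeness, here is a sketch of the pieces that snippet assumes (llm_chain, output_parser, tool_names), based on the classic custom-agent pattern with the deprecated LLMSingleActionAgent and LLMChain APIs; treat it as an illustration under those assumptions, not the exact code used above:
```python
from langchain.agents import AgentExecutor, LLMSingleActionAgent, Tool
from langchain.agents.output_parsers import ReActSingleInputOutputParser
from langchain.chains import LLMChain
from langchain.prompts import StringPromptTemplate
from langchain_ollama import ChatOllama

TEMPLATE = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

class ReActPromptTemplate(StringPromptTemplate):
    """Renders the intermediate steps into the agent_scratchpad slot."""

    tools: list

    def format(self, **kwargs) -> str:
        steps = kwargs.pop("intermediate_steps")
        scratchpad = ""
        for action, observation in steps:
            scratchpad += action.log + f"\nObservation: {observation}\nThought: "
        kwargs["agent_scratchpad"] = scratchpad
        kwargs["tools"] = "\n".join(f"{t.name}: {t.description}" for t in self.tools)
        kwargs["tool_names"] = ", ".join(t.name for t in self.tools)
        return TEMPLATE.format(**kwargs)

def echo(text: str) -> str:
    return text

tools = [Tool(name="Echo", func=echo, description="Repeats the input text.")]
tool_names = [t.name for t in tools]

llm = ChatOllama(model="qwen2.5:7b", base_url="http://localhost:11434", temperature=0)
prompt = ReActPromptTemplate(tools=tools, input_variables=["input", "intermediate_steps"])
llm_chain = LLMChain(llm=llm, prompt=prompt)
output_parser = ReActSingleInputOutputParser()

llm_agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["Observation:"],  # no leading \n, so the trailing character survives
    allowed_tools=tool_names,
)
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=llm_agent, tools=tools, verbose=True, handle_parsing_errors=True
)
print(agent_executor.invoke({"input": "echo back with (cus.tomers,)"}))
```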
But I don't know why adding \n to the Observation stop sequence loses the last chunk, LOL.
LOL. These LLMs - very unpredictable....
I like your workaround: putting in stop=["Observation:"].
Looks like adding stop=["Observation:"] is the only thing we can do. It's better than changing the prompt to make the agent add 2 spaces; that is more unstable, and sometimes, like you said, it will add '2' at the end.
It does not fix this issue completely, but it avoids it, LOL.
FYI @format37 @avyrodov
Just an FYI:
When I run a session, and do a:
netstat -ant | grep -i 127
I get a pair of:
tcp4 0 0 127.0.0.1.11434 127.0.0.1.62591 ESTABLISHED
tcp4 0 0 127.0.0.1.62591 127.0.0.1.11434 ESTABLISHED
I have to do TWO different terminals:
tcpflow -i any -c 'not port 11434'
tcpflow -i any -c 'not port 62591'
The 2 different tcpflow captures:
1) one captures the prompt and response,
2) the other captures the model setup and loading.
Interesting. I ran tcpflow -i any -c 'port 11434' on my Mac and I can see everything.
hmmm ok. Thanks @qingwli !
I can see '\nObservation' is set in the LangChain code, but I think this should be fixed in Ollama. I will raise another issue in the Ollama repo and post the link later.
Thanks @hippopond @format37 for your clues.
Description
You can see in the output for Action Input that the trailing ` is missing.
I also tested other tools. [Screenshots of the tool definitions and the corresponding output omitted.] You can see the Action Input lost the trailing " and }.