Thanks for the comment. Local models are high on the list of priorities right now.
Looking at the documentation for `ChatOllama`, it does support tool calling. The method for binding tools also looks identical to what we're doing in ROSA.

Would you be willing to share your code, logs, output, or error messages so we can help diagnose?
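As a quick diagnostic, here is a minimal, standalone sketch (the model tag and prompt are placeholders) that checks whether the model emits tool calls through `ChatOllama.bind_tools`:

```python
from langchain_ollama import ChatOllama
from langchain_core.tools import tool


@tool
def move_forward(distance: float) -> str:
    """Move the robot forward by the specified distance."""
    return f"Moving forward by {distance} units."


llm = ChatOllama(model="llama3.1", temperature=0.0)
llm_with_tools = llm.bind_tools([move_forward])

# If the model supports tool calling, tool_calls should be non-empty,
# e.g. [{"name": "move_forward", "args": {"distance": 2.0}, ...}].
response = llm_with_tools.invoke("Move the robot forward 2 units.")
print(response.tool_calls)
```

If `tool_calls` comes back empty and the answer is plain prose, the model itself is the likely culprit rather than the binding.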
Note: tried Ollama with Llama3.1 8B, with very poor results (see below). Going to test the 70B model soon, hopefully with better results.

Interestingly, Llama3.1 8B does get close to working (see below). This might indicate that, while the model does choose the correct tool and parameters, it isn't responding with the correct output (e.g. the format required by the LangChain `AgentExecutor`).
Thanks for the test. I was testing Llama 3.1 8b, but with custom tools different from the turtle example. I just noticed that, while with OpenAI the tools are correctly called, with Llama 3.1 it answers with an explanation without calling the tools. I think the problem is the model, and not `ChatOllama`, which should support tools.
```python
from rosa import ROSA, RobotSystemPrompts
from langchain_ollama import ChatOllama
from langchain.agents import tool
from rich.console import Console
from rich.markdown import Markdown
from rich.prompt import Prompt
from rich.text import Text

llm = ChatOllama(model="llama3.1", temperature=0.0)


@tool
def move_forward(distance: float) -> str:
    """
    Move the robot forward by the specified distance.

    :param distance: The distance to move the robot forward.
    """
    # Your code here ...
    print("I'm moving by", distance)
    return f"Moving forward by {distance} units."


prompts = RobotSystemPrompts(
    embodiment_and_persona="You are a cool robot that can move forward."
)

rosa = ROSA(ros_version=2, llm=llm, tools=[move_forward], prompts=prompts)

console = Console()
greeting = Text(
    "\nHi! I'm the ROSA agent 🤖. How can I help you today?\n"
)
greeting.stylize("frame bold blue")
console.print(greeting)

while True:
    user_input = Prompt.ask("Agent Chat")
    if user_input == "exit":
        console.print("Bye Bye")
        break
    output = rosa.invoke(user_input)
    console.print(Markdown(output))
```
This is the code I'm currently testing, along with the output from a sample request.
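To capture a full trace of what the agent is doing (including whether the model ever emits a tool call or just free text), LangChain's debug flag can be enabled; a minimal sketch, assuming the `rosa` instance from the code above:

```python
from langchain.globals import set_debug

# Print every chain step, LLM prompt, and tool invocation to stdout,
# making it easy to see whether tool calls are ever emitted.
set_debug(True)

output = rosa.invoke("Move forward 2 units")  # `rosa` from the snippet above
```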
Thanks for the feedback.
I had a feeling that the number of base tools was too much for an 8b model, and I was correct. If you remove all but 4-5 tools, it actually works with the Llama3.1 8b (I tried with `ros2 node list`, `ros2 topic list`, and a custom `move_forward` tool).
However, 4 or 5 tools is clearly not enough for a general purpose agent. I am currently testing Llama3.1 70b and will report back soon.
Either way, we'll provide a new, more intuitive interface for model selection, which will include Ollama.
Update: ROSA is working with Llama 3.1 8b!

Not sure why I didn't catch this the first time around, but the default behavior for Ollama with `llama3.1:8b` is to set the context size to 2K. Well, the number of tokens for base ROSA is closer to 4K.

It turns out that you can set the context size like so:
```python
llm = ChatOllama(
    model="llama3.1",
    num_ctx=8192,
    temperature=0.0,
)
```
Doing it this way, we do not need to remove any of the tools or make any modifications to the agent whatsoever. In addition, I tested Llama 3.1 70b, and it also works for ROSA. That said, inference time is significantly higher.
I wanted to compare performance between the two models, both for latency and quality of results. I tried this on a relatively capable machine with an RTX 4060. The discrepancy is very likely due to the fact that the GPU only has 8GB of memory, so the 70b model experiences extremely high memory-transfer bottlenecks. All tests were performed using the TurtleSim demo (caveat: the TurtleSim demo has a very small list of topics, nodes, etc., so YMMV).
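For context, here is a sketch of one way the "1st Tool Call" and "Final Response" timings below could be captured for the movement queries; the timing hook inside the tool is illustrative, not part of ROSA:

```python
import time

from langchain.agents import tool
from langchain_ollama import ChatOllama
from rosa import ROSA

first_tool_call_at = None  # wall-clock time of the first tool invocation


@tool
def move_forward(distance: float) -> str:
    """Move the robot forward by the specified distance."""
    global first_tool_call_at
    if first_tool_call_at is None:
        first_tool_call_at = time.perf_counter()
    return f"Moving forward by {distance} units."


llm = ChatOllama(model="llama3.1", num_ctx=8192, temperature=0.0)
rosa = ROSA(ros_version=2, llm=llm, tools=[move_forward])

t0 = time.perf_counter()
rosa.invoke("Move forward 1 unit")
t1 = time.perf_counter()

if first_tool_call_at is not None:
    print(f"1st tool call:  {first_tool_call_at - t0:.2f}s")
print(f"Final response: {t1 - t0:.2f}s")
```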
Query: Reset the sim
Difficulty: Very Low

| Model | 1st Tool Call | Final Response | Result |
|---|---|---|---|
| `llama3.1:8b` | 10s | 11.34s | Success |
| `llama3.1:70b` | 1m55s | 2m52s | Success |
Query: Give me a list of nodes
Difficulty: Low

| Model | 1st Tool Call | Final Response | Result |
|---|---|---|---|
| `llama3.1:8b` | 5.8s | 7s | Success |
| `llama3.1:70b` | 2m16s | 2m52s | Success |
Query: Give me a list of nodes, topics, services, params, and log files
Difficulty: Medium

| Model | 1st Tool Call | Final Response | Result |
|---|---|---|---|
| `llama3.1:8b` | 12.3s | 18.2s | Success |
| `llama3.1:70b` | 1m56s | 11m32s | Success |
Query: Move forward 1 unit, then turn left 45 degrees, then move forward 2 units
Difficulty: High

| Model | 1st Tool Call | Final Response | Result | Reason |
|---|---|---|---|---|
| `llama3.1:8b` | 8.24s | 10.37s | Fail | Failed to convert degrees to radians before turning |
| `llama3.1:70b` | 5m18s | 6m24s | Fail | Incorrect sequence of tool calls |
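For reference on the 8b failure above: per the failure reason, the turn tool expects radians, so the 45 degrees in the query must be converted first. The conversion the model skipped is a one-liner:

```python
import math

# The query asks for a 45-degree left turn, but the turn tool takes
# radians, so the agent needed to convert before calling it.
angle_rad = math.radians(45)  # pi/4, approximately 0.785 rad
```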
Query: Draw a 5-point star
Difficulty: Very High

| Model | 1st Tool Call | Final Response | Result | Reason |
|---|---|---|---|---|
| `llama3.1:8b` | 6.8s | 18.8s | Fail | Incorrect parameters used |
| `llama3.1:70b` | 2m46s | 7m48s | Fail | Incorrect sequence of tool calls |
Note: This particular query results in several intermediate steps, each of which must happen in the correct order.
Note: there is also a limitation with `ChatOllama` when using it for agents; this is a LangChain limitation.

For most applications (especially when using only the core `ROSA` class without custom tools), the 8b model is likely to be sufficient. If you need higher accuracy, you can use the 70b model, but be prepared for significantly higher latency. If you need to use the 70b model, you may need to consider a more powerful GPU (A6000 or higher) or a dedicated device with unified memory (e.g. Jetson AGX Orin).
> [!IMPORTANT]
> Make sure you set `temperature=0.0` and `num_ctx >= 8192` for both models.
Hi @RobRoyce, thank you so much for the update. Have you planned to compare with other models to understand whether the performance will be better than Llama? I'm asking because it could be interesting to understand if there is a model that performs well with the ROSA base tools plus other tools that a user could add. Just to understand if there is a limitation on the number of tools a user can add, and whether it is related to the model.
Hi, I'm trying to use Ollama with local LLMs with ROSA. I already tested ROSA with OpenAI and it works very well. With Ollama, it seems that it does not call the tools. Do you have plans to integrate ROSA with Ollama and local LLMs?