nasa-jpl / rosa

ROSA 🤖 is an AI Agent designed to interact with ROS1- and ROS2-based robotics systems using natural language queries. ROSA helps robot developers inspect, diagnose, understand, and operate robots.
https://github.com/nasa-jpl/rosa/wiki
Apache License 2.0

[New Feature]: Support for Ollama and local models #18

Open Cavalletta98 opened 2 weeks ago

Cavalletta98 commented 2 weeks ago

Have you checked for duplicate issue tickets?

Yes - I've already checked

Have you considered alternative solutions to your feature request?

Yes - and alternatives don't suffice

Is your feature request related to any problems? Please help us understand if so, including linking to any other issue tickets.

No

A clear and concise description of your request.

Hi, I'm trying to use Ollama with local LLMs in ROSA. I've already tested ROSA with OpenAI and it works very well. With Ollama, it seems that it does not call the tools. Do you have plans to integrate ROSA with Ollama and local LLMs?

RobRoyce commented 2 weeks ago

Thanks for the comment. Local models are high on the list of priorities right now.

Looking at the documentation for ChatOllama, it does support tool calling. The method for binding tools also looks identical to what we're doing in ROSA.
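
For reference, a minimal sketch (not ROSA's actual code) of the binding pattern in question, which is identical for ChatOllama and other LangChain chat models:

from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def echo(text: str) -> str:
    """Echo the input text back."""
    return text

# ChatOllama exposes the same bind_tools interface that ROSA already
# relies on, so tool binding itself should not require agent changes.
llm = ChatOllama(model="llama3.1", temperature=0.0)
llm_with_tools = llm.bind_tools([echo])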

Would you be willing to share your code, logs, output, or error messages so we can help diagnose?

RobRoyce commented 2 weeks ago

Note: I tried Ollama with Llama 3.1 8B, with very poor results (see below). I'm going to test the 70B model soon, hopefully with better results.

[screenshot: Llama 3.1 8B run with very poor results]
RobRoyce commented 2 weeks ago

Interestingly, Llama 3.1 8B does get close to working (see below). This might indicate that, while the model does choose the correct tool and parameters, it isn't responding with output in the format required by the LangChain AgentExecutor.

[screenshot: Llama 3.1 8B run that nearly works, with correct tool and parameters]
Cavalletta98 commented 2 weeks ago

Thanks for the test. I was testing Llama 3.1 8b, but with custom tools different from the turtle example. I just noticed that, while with OpenAI the tools are correctly called, with Llama 3.1 it answers with an explanation without calling the tools. I think the problem is the model and not ChatOllama, which should support tools.
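
One quick way to isolate this (a rough sketch, reusing the move_forward tool from the script below) is to bind a single tool directly to ChatOllama and inspect the raw response:

from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def move_forward(distance: float) -> str:
    """Move the robot forward by the specified distance."""
    return f"Moving forward by {distance} units."

llm = ChatOllama(model="llama3.1", temperature=0.0).bind_tools([move_forward])
response = llm.invoke("Move forward 2 units.")

# If tool_calls is empty and content is a prose explanation, the model
# itself is skipping the tool call, pointing at the model rather than
# at ChatOllama or ROSA.
print(response.tool_calls)
print(response.content)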

Cavalletta98 commented 2 weeks ago
from rosa import ROSA, RobotSystemPrompts
from langchain_ollama import ChatOllama
from langchain.agents import tool
from rich.console import Console
from rich.markdown import Markdown
from rich.prompt import Prompt
from rich.text import Text

llm = ChatOllama(model="llama3.1", temperature=0.0)

@tool
def move_forward(distance: float) -> str:
    """
    Move the robot forward by the specified distance.

    :param distance: The distance to move the robot forward.
    """
    # Your code here ...
    print("I'm moving of ",distance)
    return f"Moving forward by {distance} units."

prompts = RobotSystemPrompts(
    embodiment_and_persona="You are a cool robot that can move forward."
)

rosa = ROSA(ros_version=2, llm=llm, tools=[move_forward], prompts=prompts)

console = Console()
greeting = Text("\nHi! I'm the ROSA agent 🤖. How can I help you today?\n")
greeting.stylize("frame bold blue")

console.print(greeting)
while True:
    user_input = Prompt.ask("Agent Chat")
    if user_input == "exit":
        console.print("Bye Bye")
        break
    output = rosa.invoke(user_input)
    console.print(Markdown(output))

This is the piece of code that I'm currently testing, and here is the output for one request: [screenshot: the model replies with an explanation instead of calling the tool]

RobRoyce commented 2 weeks ago

Thanks for the feedback.

I had a feeling that the number of base tools was too much for an 8b model, and I was correct. If you remove all but 4-5 tools, it actually works with Llama 3.1 8b (I tried with ros2 node list, ros2 topic list, and a custom move_forward tool).

However, 4 or 5 tools is clearly not enough for a general purpose agent. I am currently testing Llama3.1 70b and will report back soon.

Either way, we'll provide a new, more intuitive interface for model selection, which will include Ollama.

RobRoyce commented 1 day ago

Update: ROSA is working with Llama 3.1 8b!

Not sure why I didn't catch this the first time around, but the default context size for Ollama with llama3.1:8b is 2K tokens, while the base ROSA prompt comes closer to 4K tokens.

It turns out that you can set the context size like so:

llm = ChatOllama(
    model="llama3.1",
    num_ctx=8192,  # default is 2048, too small for the ~4K-token base ROSA prompt
    temperature=0.0,
)

Doing it this way, we do not need to remove any of the tools or make any modifications to the agent whatsoever. In addition, I tested Llama 3.1 70b, and it also works for ROSA. That said, inference time is significantly higher.
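
If you want to sanity-check whether a given prompt fits in the context window, LangChain can produce a rough token count (its default counter uses a GPT-2 tokenizer rather than Llama's, so treat the result as a ballpark figure):

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1", num_ctx=8192, temperature=0.0)

# Approximate token count for a rendered prompt; requires the
# `transformers` package and is only an estimate for Llama models.
prompt_text = "..."  # substitute the full system prompt + tool schemas
print(llm.get_num_tokens(prompt_text))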


Benchmarks

I wanted to compare performance between the two models, both for latency and quality of results. I ran the tests on a relatively capable machine with an RTX 4060. The discrepancy is very likely due to the fact that the GPU only has 8GB of memory, so the 70b model suffers extreme memory-transfer bottlenecks. All tests were performed using the TurtleSim demo (caveat: the TurtleSim demo has a very small list of topics, nodes, etc., so YMMV).
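
For anyone who wants to reproduce the final-response timings, here is a rough sketch (not the exact harness used for the numbers below; timing the first tool call would additionally require a LangChain callback handler):

import time

from langchain_ollama import ChatOllama
from rosa import ROSA

llm = ChatOllama(model="llama3.1", num_ctx=8192, temperature=0.0)
rosa = ROSA(ros_version=2, llm=llm)

for query in ["Reset the sim", "Give me a list of nodes"]:
    start = time.perf_counter()
    rosa.invoke(query)
    # Wall-clock time from query submission to final response.
    print(f"{query!r}: {time.perf_counter() - start:.2f}s")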

Query: Reset the sim
Difficulty: Very Low

| Model | 1st Tool Call | Final Response | Result |
| --- | --- | --- | --- |
| llama3.1:8b | 10s | 11.34s | Success |
| llama3.1:70b | 1m55s | 2m52s | Success |

Query: Give me a list of nodes
Difficulty: Low

| Model | 1st Tool Call | Final Response | Result |
| --- | --- | --- | --- |
| llama3.1:8b | 5.8s | 7s | Success |
| llama3.1:70b | 2m16s | 2m52s | Success |

Query: Give me a list of nodes, topics, services, params, and log files
Difficulty: Medium

| Model | 1st Tool Call | Final Response | Result |
| --- | --- | --- | --- |
| llama3.1:8b | 12.3s | 18.2s | Success |
| llama3.1:70b | 1m56s | 11m32s | Success |

Query: Move forward 1 unit, then turn left 45 degrees, then move forward 2 units
Difficulty: High

| Model | 1st Tool Call | Final Response | Result | Reason |
| --- | --- | --- | --- | --- |
| llama3.1:8b | 8.24s | 10.37s | Fail | Failed to convert degrees to radians before turning |
| llama3.1:70b | 5m18s | 6m24s | Fail | Incorrect sequence of tool calls |
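
For context on that 8b failure: turtlesim's rotation commands expect radians, so the 45-degree turn needed a conversion first, e.g.:

import math

# 45 degrees in radians, the unit turtlesim's angular commands expect.
print(math.radians(45))  # ~0.7854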

Query: Draw a 5-point star
Difficulty: Very High

| Model | 1st Tool Call | Final Response | Result | Reason |
| --- | --- | --- | --- | --- |
| llama3.1:8b | 6.8s | 18.8s | Fail | Incorrect parameters used |
| llama3.1:70b | 2m46s | 7m48s | Fail | Incorrect sequence of tool calls |

Note: This particular query results in several intermediate steps, each of which must happen in the correct order.


Conclusion

For most applications (especially those using only the core ROSA class without custom tools), the 8b model is likely to be sufficient. If you need higher accuracy, you can use the 70b model, but be prepared for significantly higher latency. If you do need the 70b model, consider a more powerful GPU (A6000 or higher) or a dedicated device with unified memory (e.g. Jetson AGX Orin).

> [!IMPORTANT]
> Make sure you set `temperature=0.0` and `num_ctx >= 8192` for both models.