ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/

How to implement path validity checking with ML models (GPT, LLMs, etc.) #4484

Closed: AndyZe closed this issue 6 days ago

AndyZe commented 1 week ago

Clearly an ML model could be quite useful for detecting a mud puddle, a cord, or a pit of hot lava that a robot shouldn't drive into, whereas lidar and point clouds cannot distinguish any of those obstacles. Here's a quick example from ChatGPT: it correctly identifies the cord obstacle, then the lack of an obstacle. Many roboticists over the years would have killed to have functionality like this. (I think there are lots of other applications for ML when it comes to navigation, too.)

[Screenshot from 2024-06-24: ChatGPT identifying the cord obstacle in one image and the absence of an obstacle in another]

Related to PR #4483: it was suggested there to move this check into the path validity server.

There are some details to work out. I think only a few meters in front of the robot should be checked for validity. It's not clear how to translate image coordinates into physical dimensions, so this will be more of a "fuzzy check." It would also be nice to allow different ML models to be swapped in and out.

However, I've worked with the ChatGPT API before, and from that perspective I think it will be quite easy.

AndyZe commented 1 week ago

@SteveMacenski you linked me to nav2_planner/src/planner_server.cpp:

// Existing Nav2 service callback: checks the poses of the requested path
// against the current costmap for lethal obstacles.
void PlannerServer::isPathValid(
  const std::shared_ptr<nav2_msgs::srv::IsPathValid::Request> request,
  std::shared_ptr<nav2_msgs::srv::IsPathValid::Response> response)

Do you want me to add a similar but alternative function for the ML model version of isPathValid(), or ... ?

https://github.com/ros-navigation/navigation2/commit/8cfe20f110cf5dfbeb49e75e220369f9069787e1

AndyZe commented 1 week ago

^Moot question at this point since I changed the PR to focus on a vicinity check. There's no path element involved now.

SteveMacenski commented 1 week ago

Neat. I know there are a lot of spots where Vision-AI and Gen-AI could be thrown into Nav2, but we haven't done any core integrations on the Vision-AI side yet, since it would require retraining pipelines for everyone's individual situations. That's a big effort we just haven't gotten to, though adding AI into Nav2 is easy with the plugin interfaces, given even modest creativity and an understanding of the framework's intent. We have some work on adding AI detector/segmentation models to the costmaps, but those contributors haven't totally gotten it over the line.

Gen-AI is interesting because the model is pretty general in the first place, so as long as the Gen-AI vendor is made generic through interfaces, folks can swap in whatever they like. Maybe not the best for safety-critical use, but definitely worth some tutorial value as a "how-to" and for smaller, more intrinsically safe robot systems.

There are some details to work out. I think only a few meters in front of the robot should be checked for validity. It's not clear how to translate image coordinates into physical dimensions, so this will be more of a "fuzzy check."

What do you suggest? If you do some kind of vicinity model, how do you plan to make that actionable, other than just stopping indefinitely?

I'm thinking that part of the "rules" for use could be that you crop the image you send down to just the area you care about, so that it's restricted in scope. Perhaps you could make some statement about how it should be used, such that the image patch sent represents the area right in front of the robot, so that a "yes" triggers a soft stop. Maybe it just sets a "do not pass" line in the costmap in front of the robot for a replan to handle. It's approximate, but so is the output you're working with.
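
As a rough sketch of what that costmap marking could look like, assuming direct access to the local costmap's Costmap2D and the robot's pose in the costmap frame (the function and its parameters are illustrative, not an existing Nav2 API):

#include <cmath>
#include "nav2_costmap_2d/costmap_2d.hpp"
#include "nav2_costmap_2d/cost_values.hpp"

// Hypothetical helper: paint a short lethal line perpendicular to the
// robot's heading, distance_ahead meters in front of it, so the planner
// has to route around the flagged region on the next replan.
void markDoNotPassLine(
  nav2_costmap_2d::Costmap2D & costmap,
  double robot_x, double robot_y, double robot_yaw,
  double distance_ahead, double half_width)
{
  // Center of the line, straight ahead of the robot
  const double cx = robot_x + distance_ahead * std::cos(robot_yaw);
  const double cy = robot_y + distance_ahead * std::sin(robot_yaw);

  // Walk along the perpendicular direction one cell at a time
  const double step = costmap.getResolution();
  for (double s = -half_width; s <= half_width; s += step) {
    const double wx = cx + s * std::cos(robot_yaw + M_PI_2);
    const double wy = cy + s * std::sin(robot_yaw + M_PI_2);
    unsigned int mx, my;
    if (costmap.worldToMap(wx, wy, mx, my)) {
      costmap.setCost(mx, my, nav2_costmap_2d::LETHAL_OBSTACLE);
    }
  }
}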

We have some thoughts on how to translate into physical dimensions for semantic segmentation: using the known camera frame and a pinhole model, you can raytrace into free space if you assume a flat-world model. Not super accurate, especially when the world isn't flat, but something like this is pretty imprecise to begin with (and a segmentation model would be most appropriate if accuracy were the goal). That's another possible approach.
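
A minimal sketch of that flat-world raytrace, assuming a calibrated pinhole camera and a known camera pose in the map frame (the function name is illustrative; Eigen handles the transform math):

#include <optional>
#include <Eigen/Geometry>

// Cast a ray through pixel (u, v) using intrinsics (fx, fy, cx, cy) and
// intersect it with the z = 0 ground plane (the flat-world assumption).
std::optional<Eigen::Vector3d> pixelToGround(
  double u, double v,
  double fx, double fy, double cx, double cy,
  const Eigen::Isometry3d & T_map_camera)  // camera optical frame in map
{
  // Ray direction in the optical frame (z forward, x right, y down)
  const Eigen::Vector3d ray_cam((u - cx) / fx, (v - cy) / fy, 1.0);

  // Ray origin is the camera position; rotate the direction into map frame
  const Eigen::Vector3d origin = T_map_camera.translation();
  const Eigen::Vector3d dir = T_map_camera.rotation() * ray_cam.normalized();

  // Reject rays that point at or above the horizon
  if (dir.z() >= -1e-6) {
    return std::nullopt;
  }
  const double t = -origin.z() / dir.z();
  return origin + t * dir;  // point on the ground, in the map frame
}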

Do you want me to add a similar but alternative function for the ML model version of isPathValid(), or ... ?

I don't think there's any reason the ML-model version of this needs to be hosted inside the Planner Server. See below:


The PR doesn't seem super well designed. I would ask for a more generally designed solution before including something of this sort in Nav2.

It seems to me that you should probably create a standalone ROS-OpenAI node (or nodes) that can take in text prompts and/or text prompts with images. It should probably expose a service that takes in the prompts (and/or has an image subscription itself) and returns the result for use. Then that service client can be used here, or in any number of other places across the system where LLM/GenAI-based Q&A would be helpful, without having to support N different GenAI models: you just swap out the server with your implementation of choice with the appropriate API.
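
A rough sketch of how such a node could be laid out, assuming a hypothetical ai_msgs/srv/Prompt service (string prompt plus sensor_msgs/Image in, bool out) and an abstract backend so any vendor's model can be dropped in:

#include <memory>
#include <string>
#include <utility>
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"
#include "ai_msgs/srv/prompt.hpp"  // hypothetical service type

// Abstract backend: an OpenAI client, a local model, etc. implement this.
class GenAiBackend
{
public:
  virtual ~GenAiBackend() = default;
  virtual bool query(
    const std::string & prompt, const sensor_msgs::msg::Image & image) = 0;
};

// Generic server node: Nav2 (or anything else) only ever talks to the
// service; the model behind it is an implementation detail.
class GenAiServer : public rclcpp::Node
{
public:
  explicit GenAiServer(std::unique_ptr<GenAiBackend> backend)
  : Node("gen_ai_server"), backend_(std::move(backend))
  {
    service_ = create_service<ai_msgs::srv::Prompt>(
      "query_gen_ai",
      [this](
        const std::shared_ptr<ai_msgs::srv::Prompt::Request> request,
        std::shared_ptr<ai_msgs::srv::Prompt::Response> response)
      {
        // Forward to whichever backend was configured at launch
        response->response = backend_->query(request->prompt, request->image);
      });
  }

private:
  std::unique_ptr<GenAiBackend> backend_;
  rclcpp::Service<ai_msgs::srv::Prompt>::SharedPtr service_;
};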

AndyZe commented 1 week ago

What do you suggest? If you do some kind of vicinity model, how do you plan to make that actionable, other than just stopping indefinitely?

This part is easy: you have a BT action recover however you see fit in your application.

I was aware that this was going to be a difficult conversation. It's not easy to envision how to incorporate many different AI models, and the C++ framework of Nav2 isn't really suited to it.

So would you support a new Nav2 service type that sends a prompt to an external OpenAI server? I'm thinking something like this:

string prompt
sensor_msgs/Image image
---
bool response
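
For illustration, here's a minimal synchronous client showing how a BT node (or anything else) might call such a service, assuming the definition above were built as a hypothetical ai_msgs/srv/Prompt served on /query_gen_ai:

#include <chrono>
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"
#include "ai_msgs/srv/prompt.hpp"  // hypothetical, per the definition above

// Ask the GenAI server whether the area ahead holds an obstacle that
// lidar would miss, given the latest camera image.
bool isAreaAheadSafe(
  rclcpp::Node::SharedPtr node, const sensor_msgs::msg::Image & image)
{
  auto client = node->create_client<ai_msgs::srv::Prompt>("query_gen_ai");
  if (!client->wait_for_service(std::chrono::seconds(2))) {
    return false;  // treat an unreachable server as "not verified safe"
  }

  auto request = std::make_shared<ai_msgs::srv::Prompt::Request>();
  request->prompt =
    "Is there an obstacle here a lidar would miss (cord, puddle, hole)?";
  request->image = image;

  auto future = client->async_send_request(request);
  if (rclcpp::spin_until_future_complete(node, future) !=
    rclcpp::FutureReturnCode::SUCCESS)
  {
    return false;
  }
  // The service answers "is there an obstacle?", so safe == !response
  return !future.get()->response;
}
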
AndyZe commented 1 week ago

Another option would be a new ai_msgs repo to put these new, AI-specific service types in.

SteveMacenski commented 1 week ago

This part is easy: you have a BT action recover however you see fit in your application.

I mean... how are you going to do it, then? It's not clear to me how you handle the situation where the plan says go through the door and the GenAI condition says stop. Some information needs to be conveyed to the planner about the no-go nature of that direction, or something along those lines. To introduce a condition like this, we need to think through the application problems involved.

It's not easy to envision how to incorporate many different AI models, and the C++ framework of Nav2 isn't really suited to it.

I'm not sure what you mean. Costmap layers --> detection/tracking and segmentation models setting costs; controller plugins --> reinforcement-, imitation-, or otherwise-learned trajectory planners; GenAI --> orchestration/deliberation layer creation and/or impacting behavior based on inputs. These all seem like relatively obvious places to insert AI into the framework once you have models you want to use for your system, and I think it's pretty commonly done among Nav2 users. It's a shame that we in Nav2 haven't added more of these as "built-in", but RL and Vision-AI need retraining for specific applications, and I've not been prepared to tackle a nice batteries-included way of doing that yet; no one else has stepped up to do it either. The Gen-AI deliberation-layer work is unique in this regard as something I think we can support more directly, as long as it's done with sufficient generalization, or separated into a "demos/tutorials" package as a "here's how one could do it" for inspiration. The generalization is easy in this case if we use services to call the AI.
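
On the costmap-layer point, a skeleton of what "segmentation setting costs" could look like as a custom layer plugin (package and class names are hypothetical; pluginlib registration and the actual model subscription are omitted):

#include <algorithm>
#include <mutex>
#include <utility>
#include <vector>

#include "nav2_costmap_2d/layer.hpp"
#include "nav2_costmap_2d/costmap_2d.hpp"
#include "nav2_costmap_2d/cost_values.hpp"

namespace ai_costmap  // hypothetical package
{

// Paints lethal cost wherever an upstream AI model flagged an obstacle.
// Filling flagged_points_ (topic, message type) is left open; it would come
// from the detector/segmentation node's output, projected into the map frame.
class AiDetectionLayer : public nav2_costmap_2d::Layer
{
public:
  void onInitialize() override
  {
    current_ = true;
    // Subscribe to the model's output here and fill flagged_points_
  }

  void updateBounds(
    double, double, double,
    double * min_x, double * min_y, double * max_x, double * max_y) override
  {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const auto & [wx, wy] : flagged_points_) {
      *min_x = std::min(*min_x, wx);
      *min_y = std::min(*min_y, wy);
      *max_x = std::max(*max_x, wx);
      *max_y = std::max(*max_y, wy);
    }
  }

  void updateCosts(
    nav2_costmap_2d::Costmap2D & master_grid,
    int min_i, int min_j, int max_i, int max_j) override
  {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const auto & [wx, wy] : flagged_points_) {
      unsigned int mx, my;
      if (master_grid.worldToMap(wx, wy, mx, my) &&
        static_cast<int>(mx) >= min_i && static_cast<int>(mx) < max_i &&
        static_cast<int>(my) >= min_j && static_cast<int>(my) < max_j)
      {
        master_grid.setCost(mx, my, nav2_costmap_2d::LETHAL_OBSTACLE);
      }
    }
  }

  void reset() override
  {
    std::lock_guard<std::mutex> lock(mutex_);
    flagged_points_.clear();
  }

  bool isClearable() override {return true;}

private:
  std::mutex mutex_;
  std::vector<std::pair<double, double>> flagged_points_;  // map-frame (x, y)
};

}  // namespace ai_costmap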

The C++ part adds some level of annoyance, but those should probably be standalone segments of the sensor-processing pipelines anyway, to get zero-copy access to the large, raw data and publish the results out for Nav2 to use. It's also very common for deployed applications to use the C++ interfaces of AI/ML libraries in production, for integration and efficiency reasons; Python is really only for prototyping. I know that what I just said isn't universally true (at best 50/50), but I think that's broadly from a lack of experience within those organizations in optimizing deployed systems and/or unawareness that you can use the C++ interfaces for deployment and the Python interfaces for development.

But anyway... those are some thoughts on how I see it, whether right or wrong.

So would you support a new Nav2 service type that sends a prompt to an external OpenAI server? I'm thinking something like this:

I think there should be a new project repository somewhere that holds these node(s) plus the _msgs package for interacting with them. We can then have a BT node that uses it! I don't think these need to live in Nav2, per se. I suppose they could, but this feels like something the community larger than Nav2 would get value from, so abstracting it from the start might be wise.

Now you're going to ask me where it should live, and I don't know, now that ros-planning is dead, since that would have been the obvious place for MoveIt and Nav2 to collaborate. I suppose ros-perception, but I'm not totally convinced we can get the permissions to open new projects there again. It looks like I'm still a member with repository-creation privileges in ros-perception, but I'd want to check with OSRF first to make sure they aren't putting their foot down on new projects in existing ros-* organizations with even informal relationships with OSRF. We could stick it in ros-navigation as a placeholder if you like :shrug: I'm not overly sensitive about it as long as a few people have permissions on it and there's intention to maintain it (even if just running releases and merging PRs).

We also have navigation2_auxiliary (https://github.com/ros-navigation/navigation2_auxiliary) and could create a nav2 experimental repository where the generalization and quality guidelines would be lower, as "experimental". Or we could even use the nav2 tutorials if this is mostly of tutorial value, but I expect you're raising this topic due to a client project, so it's probably desired for more than tutorials (correct me if I'm wrong and it's a side project).

AndyZe commented 1 week ago

@brettpac was kind enough to set up the ai_msgs repo here and you're invited as an admin. I'll try to get an initial (very simple) LLM prompting PR up for review today!

https://github.com/robosoft-ai/ai_msgs

AndyZe commented 1 week ago

Don't feel obligated. I know you're a busy guy and we can find somebody else if we need to.

SteveMacenski commented 1 week ago

Not at all being curmudgeon-y, just asking the question for informational purposes: is there a reason this BT node should live in Nav2 specifically? I don't have a problem with adding more general-purpose nodes to Nav2's BT library for folks to use, but living there isn't a requirement for being able to use your custom BT nodes (for example).

If we work out the kinks of the workflow for how this would be used in an application, I have zero issues with BT nodes of this nature living in Nav2 (maybe I'll even spend a couple hours writing up a tutorial myself as a demonstration). I'd just want to make sure that if it's included directly in Nav2, then when someone asks how to solve problems with it, we've done some moderate thinking about it and have a plan for them to follow.

AndyZe commented 6 days ago

I'll be focused on getting the OpenAI server up, so I'm no longer planning to make a Nav2 PR.

SteveMacenski commented 6 days ago

OK, closing then! Link back with the work; I'd love to see it and see how we can use it :-)