ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/

Design the World Model #565

Closed orduno closed 4 years ago

orduno commented 5 years ago

Background

What is the world model?

The robot's image or mental model of the world.

Jay Wright Forrester defined a mental model as:

The image of the world around us, which we carry in our head, is just a model. Nobody in his head imagines all the world, government or country. He has only selected concepts, and relationships between them, and uses those to represent the real system.

Navigation decisions and actions are made based on this model.

As shown below, the world model is populated with information coming from sensing, perception, and mapping; and supplies information to the navigation sub-modules.

[diagram: overview]

In order to guide our design decisions on the World Model, let's take a closer look at the various kinds of inputs and outputs.

Inputs to World Model

Let's consider the modules that provide information to the world model.

Perception

The perception module provides input to the world model mainly to account for changes in the environment, from both moving objects and stationary objects with dynamic attributes, e.g. a traffic light.

Currently, moving objects are mostly accounted for by the obstacle layer of costmap_2d, which processes the raw output of a laser scanner.
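To make this concrete, here is a minimal sketch of what the obstacle layer conceptually does with a scan: project each return into the grid and mark the endpoint as occupied. This is not the actual costmap_2d code; the grid type and all names are made up for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a costmap-like grid.
struct SimpleGrid {
  int width, height;           // size in cells
  double resolution;           // meters per cell
  std::vector<uint8_t> cells;  // 0 = free, 255 = occupied

  void mark(double wx, double wy, uint8_t value) {
    int mx = static_cast<int>(wx / resolution);
    int my = static_cast<int>(wy / resolution);
    if (mx >= 0 && mx < width && my >= 0 && my < height) {
      cells[my * width + mx] = value;
    }
  }
};

// Mark the endpoint of every valid range reading as occupied.
// A real obstacle layer would also raytrace between the sensor
// origin and each endpoint to clear freespace.
void insertScan(SimpleGrid & grid, double ox, double oy,
                const std::vector<double> & ranges,
                double angle_min, double angle_increment, double range_max) {
  double angle = angle_min;
  for (double r : ranges) {
    if (r > 0.0 && r < range_max) {
      grid.mark(ox + r * std::cos(angle), oy + r * std::sin(angle), 255);
    }
    angle += angle_increment;
  }
}
```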

Design Improvements

Maps & Map Server

The map server provides a priori information about the environment, mostly of stationary objects, in the form of a map. Maps can also contain dynamic information about some of these objects, e.g. traffic, road closures, etc.

Currently, the map server is only capable of processing and providing grid/cell-based (metric) types of map representations.
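For reference, here is a minimal sketch of a client consuming the map server's grid output. The "map" topic and transient-local QoS follow the usual map_server conventions, but treat the details as illustrative rather than canonical.

```cpp
#include <memory>

#include "nav_msgs/msg/occupancy_grid.hpp"
#include "rclcpp/rclcpp.hpp"

class MapClient : public rclcpp::Node
{
public:
  MapClient() : Node("map_client")
  {
    // The map server publishes the grid with transient-local durability,
    // so late subscribers still receive the last published map.
    sub_ = create_subscription<nav_msgs::msg::OccupancyGrid>(
      "map", rclcpp::QoS(1).transient_local(),
      [this](nav_msgs::msg::OccupancyGrid::SharedPtr map) {
        RCLCPP_INFO(get_logger(), "Got %ux%u grid at %.3f m/cell",
          map->info.width, map->info.height, map->info.resolution);
      });
  }

private:
  rclcpp::Subscription<nav_msgs::msg::OccupancyGrid>::SharedPtr sub_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MapClient>());
  rclcpp::shutdown();
  return 0;
}
```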

Design Improvements

Related issues: #18

Open Questions

Outputs from World Model

Let's consider the consumers of the information contained in the world model, aka the clients.

Clients operate on different length-scales and use different layers or aspects of the world model.

(Global) Path Planning

Can operate on a road network, topology map, or global map (sub-sampled occupancy grid or k-d tree).

These are coupled with the map representation being used.

Currently, only planners that operate on a costmap are supported.

Design Improvements

Open Questions

(Local Path Planning) Obstacle Avoidance and Control

Operates on a higher resolution local map representation, for example, an occupancy grid.

Attempts to follow the global path while correcting for obstacles in a dynamic environment. Provides the control inputs to the robot.

These are planner-dependent.

Currently, nav2 provides a DWA-based controller, nav2_dwb_controller. This has its own internal representation of the world (nav2_costmap_2d) with direct access to raw sensor data.

Design Improvements

Open Questions

Motion Primitives & Recovery

Currently, motion primitives do not interact with the world model. A pull-request (#516) is open that would add collision checking.

In ROS 1, recovery behaviors were passed both the global and local costmap-based representations of the world model.

Design Improvements

Design

Goal

Design a world / environmental model for 2D navigation.

Objectives:

Summarizing the design improvements discussed above:

Proposal

Given the extent of the change, we'll have to implement the design in multiple phases.

In the first phase, we can separate the world model from the clients and make them separate nodes.

[diagram: phase0]

In the second phase, we can define the new modules and port the current costmap-based world model. Below is a high-level diagram; the components are explained below. The main point of this phase is to remove the dependency between the core representation and the type of client. We do this by defining plugins that translate the information of the Core into something useful to the client. Similarly, we also define plugins for the inputs.

[diagram: proposal_overview]

In the following phases, we can extend this by introducing other map formats (beyond grid-based maps) and perception pipelines, as well as supporting multiple internal representations. Eventually, we might have something like this:

[diagram: goal]

Core Representation

The core representation is a module expressive enough to represent the world, at least for the purposes of navigation. In an ideal case, this could be an internal simulator that we can ask anything about the world. By querying this internal simulator, we can build the structure needed by a navigation sub-module.

We might want to experiment with different types of core representations with different levels of expressiveness. We can initially use costmaps but eventually move to scene-graphs that support a semantic interface.

Additionally, multiple representations might be appropriate, e.g. robot-centric and world-centric.
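To make the idea of a queryable core concrete, here is a rough sketch of what such an interface might look like. All names are hypothetical; the point is that clients ask questions instead of reading a specific data structure.

```cpp
#include <optional>
#include <string>

struct Pose2D { double x, y, theta; };

// Hypothetical query interface over the core representation.
class CoreRepresentation
{
public:
  virtual ~CoreRepresentation() = default;

  // Is this pose traversable by the robot?
  virtual bool isFree(const Pose2D & pose) const = 0;

  // Semantic query; only meaningful for richer cores such as
  // scene graphs (a costmap-backed core could return nullopt).
  virtual std::optional<std::string> objectAt(const Pose2D & pose) const = 0;
};
```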

Open Questions

Planner Plugin

The planner plugin extracts information from the core representation to create the structure needed by the planner.

Open Questions

Control / Collision Avoidance Plugin

The control plugin extracts information from the core to create a useful structure for a controller/local planner.

Open Questions

Map to Core Plugin

Gets the map from the server and populates the core.

Open Questions

Sensing to Core Plugin

Gets low-level input (sensor data streams) or high-level input (objects with metadata) and populates/updates the core.

Open Questions

Performance

Concerns

Next Steps

Phase 0:

Phase 1: Grid-based core using costmap_2d.

SteveMacenski commented 5 years ago

On a high level I don't see anything here that's a show stopper, but it also seems like this design is trying to bite off a lot of things to be general and extendable for everybody. I might think about reducing scope to get something out there that works and is relatable to existing technologies to build off of.

Only bit of input I'd add is that in the "core representation" I believe there is no single correct way of representing the space (costmap, grid map, traversability map, etc.); it should instead be a combination of all of them, such that the costmap is really a vector of these items stacked on top of each other. Different applications can utilize different representations as they like, but the data filled in populates the costmap, and the traversability map, and the ... based on whatever the application warrants.

SteveMacenski commented 5 years ago

From talking about this again this week I had another thought which I'm sure others have thought about but maybe not in words:

We would like to not have to redo the planning and control algorithms for each different representation of the world. Rather than templating everything everywhere and trying to keep everyone happy and the code readable, I'm thinking a good approach would be to have adaptors.

For example, right now we're going costmap -> planner. Now we might go costmap -> adaptor -> planner, and similarly traversability map -> adaptor -> planner. This lets us generalize all the new planning and control algorithms independent of the implementation of the world model (making those is hard enough by itself without having to deal with lots of templates and thinking about ramifications across multiple representations).

Then we have a finite set of adaptors we need to build to represent the different models we'd like (costmaps, traversability, elevation, etc.), whose job is to take the value of the neighboring cells and apply the vehicle kinematics and dynamics to give back a response: "ok", "not ok", "unknown", or other options. The example adaptor for a costmap would change the 0-255 values into those return types. An elevation map would apply the vehicle dynamics or maximum gradients to return the same type of information.

Now all planners or controllers that work for one will work for all, as long as they can run with "ok", "not ok", "unknown", or other options. This is easily extendable for sampling-based planners by creating methods in the adaptor to get the value at a certain location in global-frame coordinates, which the adaptor could project into a costmap cell, voxel grid pose, or elevation gradient.
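To make that concrete, here is a rough sketch of what such an adaptor interface could look like. The names and types are just illustrative, not an existing nav2 API.

```cpp
#include <utility>
#include <vector>

// The small, shared set of finite states all planners would consume.
enum class Traversal { Ok, NotOk, Unknown };

struct Coordinates { double x, y, z; };

// Per-representation adaptor that applies the robot's kinematics and
// dynamics when translating raw values into the shared states.
class RepresentationAdaptor
{
public:
  virtual ~RepresentationAdaptor() = default;

  // Evaluate a single global-frame position; a costmap adaptor would
  // project it to an X-Y cell, an elevation map to a height sample.
  virtual Traversal valueAt(const Coordinates & point) const = 0;

  // Return each neighbor of a point with its traversal state; a costmap
  // returns the surrounding cells, an elevation map the 3D
  // 8-directional neighborhood.
  virtual std::vector<std::pair<Coordinates, Traversal>>
  neighbors(const Coordinates & point) const = 0;
};

// A costmap adaptor, for example, would map the 0-255 cost values onto
// the enum (lethal cost -> NotOk, no-information cost -> Unknown, etc.).
```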

crdelsey commented 5 years ago

@SteveMacenski

adaptors ... whose job is to take the value of the neighboring cells and apply the vehicle kinematics and dynamics to give back a response: "ok", "not ok", "unknown"

Are you envisioning the output of the adaptor in this example as a sort of simplified costmap where each cell is filled with an ok, not ok, or unknown value? Or are you thinking more that the client feeds a trajectory to the adaptor and the adaptor returns the evaluation of the trajectory as an ok, not ok, or unknown value?

This pushes the knowledge of the vehicle dynamics to the world model (adaptors) instead of the algorithms. How confident are you that this is the right place for that knowledge?

I had thought of having input and output adaptors as well, but I was thinking it could result in N^2 adaptors since we theoretically could want to convert from any representation to any other. I expect it wouldn't be so bad in practice since many conversions are probably not useful.

Based on your feedback, I was imagining something like the diagram below. We have a collection of representations, adaptors to convert incoming data to those representations as needed, and clients that either get data directly from the representation they need or from an adaptor if there is a mismatch between the representation provided and the type they need.

[diagram]

SteveMacenski commented 5 years ago

Thanks for making that diagram, I'm certainly a pretty lazy guy when it comes to visualizations :)

First off, I'm not sure why you have a separate pipeline for laser scans; that defeats the purpose of having a general-purpose world model that takes in arbitrary sensors to generate a view of the world. I'd recommend completely scrapping that; I don't see anything special about a laser scanner requiring that type of pipeline within the world model. If you'd like to use it as a safety sensor with zones, that should be upstream of this, since that's not generalized and many robots today don't use them anymore or have a different sensor suite.

What I think should be happening is similar to how it's done today. There's a set of optional plugins to buffer arbitrary sensor information (scans, images, depth maps, radar, sonar, etc.), each of which does the processing of interest for that sensor and inserts the result into the map. I'm not thinking too deeply about how to generalize those plugins across representations; in practice, I don't think you'll find a way to generalize that. A costmap binning a depth map for collision avoidance and an elevation map using depth maps over time to generate an ellipsoid curve or something are very different operations on the same sensor data. Those plugins for buffering and inserting data into their representation are probably representation-specific.

Looking at the other side, however, we have our [representation A] which needs to be utilized by the local/global planners to navigate. In the case of the elevation map, the Z coordinate becomes meaningful, so we can't just talk about 2D X-Y coordinates anymore. The planner or controller will say "give me the neighbors", and it's then up to the adaptor to say "I'm an elevation map, therefore my neighbors lie along 3D 8-directional curves" or "I'm a costmap, I just need to give the cells on either side of me", and return that information to the planner or controller to do their will with, asking for more things as needed.

This lets us do 2D or 3D representations but allows the same algorithms for local and global planning to operate (and moreover, would work for drones as well). When the adaptors return their neighbors, it's up to the adaptor's knowledge of the robot dynamics and mechanics to assign them some set of finite states that can be generalized across all representations. That might include OK, not OK, unknown, not recommended, etc., but as long as all of the algorithms are built to work with those same finite states, then whatever the representation is, it'll work. And the adaptors can use the Robot class to get those relevant dynamics. For a diff-drive robot, it's just all OK like in navigation1, but for an Ackermann car or an elevation map on a legged robot, that may not be valid.

orduno commented 5 years ago

I think what you are referring to as an adaptor is what I have as a plugin below. We can later discuss the best pattern, but the idea is the same: this block's goal is to translate the information available in the Core Representation and express it as something the client needs.

[diagram: proposal_overview]

SteveMacenski commented 5 years ago

Yes, that's what I'm thinking, with the only exception that "Sensing and Perception" isn't one plugin but a series of plugins; I think that may have just been represented that way for brevity.

Each representation will have an associated plugin that converts its values into the limited-enum types and is responsible for answering the controller/planner queries "give me the neighbors" and "give me the value at (x,y,z)".

crdelsey commented 5 years ago

I'm not sure why you have a separate pipeline for laser scans

It was meant to represent a possibility. If a sensor could output data directly in representation format, it could talk directly to the model. But that's a stupid idea in context. I was playing with the idea that everything could be a ROS node; each representation could be a node and each adaptor as well.

I think what you are referring to as an adaptor is what I have as a plugin below.

Do we want to be able to chain adaptors/plugins? As in, there is a plugin that provides data from the representation, but there is a second plugin that grabs the output of the first plugin and provides it in a different way.

Each representation will have an associated plugin that converts its values into the limited-enum types and is responsible for answering the controller/planner queries "give me the neighbors" and "give me the value at (x,y,z)".

So we'd need to figure out the queries that can be used by many algorithms. We'd then end up with an output plugin per class of algorithm, where a class could be graph search algorithms like A*, Dijkstra, D*, etc.

SteveMacenski commented 5 years ago

Well, at the end of the day all the graph search algorithms are going to ask for neighbors, and the sampling-based planners will ask for the result at a certain position. I'm not totally certain what optimization-based planners will ask for, but that's overkill for 2D navigation as far as I know. If not, we can find out what they thematically ask for.

I think a plugin API implementing neighbors, position in global frame (it should then internally find the value in its representation, i.e. a costmap would look up the X-Y cell index and then the location in the array), and random sampling is a good start. We can always extend it if you also think this is the way that makes sense. It was just a suggestion.
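For illustration, here is a rough sketch of how a graph-search expansion could consume that API, reusing the hypothetical RepresentationAdaptor / Traversal / Coordinates types sketched earlier in this thread.

```cpp
#include <queue>

// Expand a node by asking the adaptor for its neighbors; the planner
// never inspects costs or elevations directly, it only reasons over the
// shared finite states, so the same code works for any representation.
void expandNode(const RepresentationAdaptor & adaptor,
                const Coordinates & current,
                std::queue<Coordinates> & frontier)
{
  for (const auto & [neighbor, state] : adaptor.neighbors(current)) {
    if (state == Traversal::Ok) {
      frontier.push(neighbor);
    }
  }
}
```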

SteveMacenski commented 5 years ago

(More thoughts) Grid Map exists and does a number of things. I haven't done too much work with it myself, but from what I've been reading it looks like everything costmap_2d does, this does as well. We might want to consider it as an option to completely replace costmap_2d in a ground-up rebuild. Link for reference: https://github.com/ANYbotics/grid_map

orduno commented 4 years ago

Closing the issue, it seems we're sticking with the current costmap design for now.