Background

What is the world model?

The robot's image or mental model of the world.

Jay Wright Forrested defined a mental model as:

The image of the world around us, which we carry in our head, is just a model. Nobody in his head imagines all the world, government or country. He has only selected concepts, and relationships between them, and uses those to represent the real system.

Navigation decisions and actions are made based on this model.

As shown below, the world model is populated with information coming from sensing, perception, and mapping; and supplies information to the navigation sub-modules.

In order to guide our design decisions on the World Model, let's take a closer look at what are the various kinds of inputs and outputs.

Inputs to World Model

Let's consider the modules that provide information to the world model

Perception

The perception module provides input to the world model mainly to account for changes in the environment from both moving objects but also stationary objects with dynamic attributes, i.e. a traffic light.

Currently, moving objects are mostly accounted for by the obstacle layer of costmap_2d which process the raw output of a laser scanner.

Design Improvements

Ideally, we would encapsulate all sensor processing in the perception module.
World model would in some cases only receive higher-level descriptions of objects.

Maps & Map Server

The map server provides a priori information of the environment, mostly of stationary objects, in the form of a map. Maps can also contain dynamic information about some of these objects, i.e. traffic, road closures, etc.

Currently, the map server is only capable of processing and providing grid/cell-based (metric) types of map representations.

Design Improvements

Support additional map representations:
- Topological
- Multiple graphs for a world
- Robot can travel between nodes (assumption)
- (Virtual) Road network (for lane-based nav)
- Vector-based
- K-d Tree (Quad/Oct trees)
- Hybrid
Enable update of dynamic information
Enable servicing from multiple maps (stitching multiple maps)
Enable servicing sub-sections of maps

Related issues: #18

Open Questions

Map representation is to some extent planner-dependent, therefore compatible map representations and planners might need to be selected at launch time.
For incompatible types, do we support transformation between types?
Some generalization might be possible if we consider maps as graphs and use planners that search on graphs.

Outputs from World Model

Let's consider the consumers of the information contained in the world model, aka the clients.

Clients operate on different length-scales and use different layers or aspects of the world model.

(Global) Path Planning

Can operate on a road network, topology map, or global map (sub-sampled occupancy grid or k-d tree).

These are coupled with the map representation being used.

Currently, only planners that operate on a costmap are supported.

Design Improvements

Extend interface to support a wider range of planning paradigms. Here are some general families of planners:
- Potential Field
- Graph Search (Families of A, D and E*, Wavefront)
- At the lowest level, these use the world representation for obtaining the reachable nodes given the current, for collision checking and getting the cost of a cell (more on this below).
- Combinatorial / Roadmaps generation
- Visibility graph
- Voronoi diagrams
- Exact & Approximate Cell Decomposition
- Sampling-based
- Deterministic / Lattice-based (Hybrid-A, ARA, LARA*)
- Random (R*, PRM, RRT, EST, and variants)
- Optimal Control (LQR/G, iLQR, MPC)

Open Questions

For 2D navigation of a mobile platform, we could consider Combinatorial methods as graph construction techniques which output is then used by a graph search algorithm. We might want to consider this as a capability of the World Model, i.e. produce different types of graphs.
Adding the cost to a cell in the world model is used to encode some type of information, for example, to avoid possible collision given robot footprint or define a region preference. We might, however, want to handle this differently.
See Map Server section for some other questions related to the planner.

(Local Path Planning) Obstacle Avoidance and Control

Operates on a higher resolution local map representation, for example, an occupancy grid.

Attempts to follow the global path while correcting for obstacles in a dynamic environment. Provides the control inputs to the robot.

These are planner-dependent.

Currently, nav2 provides a DWA-based controller, nav2_dwb_controller. This has its own internal representation of the world (nav2_costmap_2d) with direct access to raw sensor data.

Design Improvements

Consolidate the world model. Create an interface in the World Model to support DWA.
- DWA uses the world model to check if a proposed trajectory is free of a collision.
Extend interface to support other local planners (#202).
- 5D Planning -- runs a graph search planner (A*) under the hood.
- Techniques used in Control Theory can be used to generate and follow local trajectories, i.e. Optimal Control (LQR/G, MPC).

Open Questions

Motion Primitives & Recovery

Currently, motion primitives do not interact with the world model. A pull-request (#516) is open that would add collision checking.

In ROS1 recovery, both the global and local costmap based representations were passed to the world model.

Design Improvements

Define the types of services needed by motion primitives, only for collision checking?
Define the types of services needed to support recovery, currently only identified clearing the structure (#406).

Design

Goal

Design a world / environmental model for 2D navigation.

Objectives:

Summarizing the design improvements discussed above:

Support different levels of navigation, i.e. unstructured (no specific rules or reference path / lane), structured (predefined rules), etc.
Add support for different types of planners and controllers.
Consolidate the world model. Define a clean interface. Avoid code replication.
Improve the integration of perception pipelines.
Create a design that can be extended to a multi-robot scenario with distributed parts of the world-model.

Proposal

Given the extension of the change, we'll have to implement the design in multiple phases.

In the first phase, we can separate the world model from the clients and make them separate nodes.

In the second phase, we can define the new modules and port the current costmap based world model. Below is a high-level diagram, the components are explained below. The main point of this phase is to remove the dependency between the core representation and the type of client, we do this by defining some plugins that translate the information of the Core into something useful to the client. Similarly, we also define plugins for the inputs.

On the following phases, we can extend this by introducing other map formats (beyond grid-based maps) and perception pipelines. We also support multiple internal representations. Eventually, we might have something like this:

Core Representation

The core representation is module rich enough to represent the world with enough expressiveness for at least doing navigation. In an ideal case, this could be an internal simulator where we can ask anything about the world. By querying this internal simulator, we can build a structure needed by a navigation sub-module.

We might want to experiment with different types of core representations with different levels of expressiveness. We can initially use costmaps but eventually move to scene-graphs that support a semantic interface.

Additionally, multiple representations might be appropriate i.e. Robot-centric, World-centric.

Open Questions

Single vs Multiple Representations.
- Robot-centric vs World-centric, Map/Client-dependant: costmap, topology, object-based, etc.
Support object-based representations? Such as ED or BRICS.
What other core representations are we interested in eventually supporting? Scene graph, costmap, etc.
Future support for probabilistic modeling?

Planner Plugin

The planner plugin extracts information from the core representation to create the structure needed by the planner.

Open Questions

We need to address the map-planner dependency issue. If the map server provides the map representation on a given format (i.e. elevation map) but the planner uses costmap, what component does the conversion?
Also, we might have different graph generation/representations, perhaps a graph-search planner could be generalized to work independently of the graph. However, there is a dependency on the robot dynamics e.g. when generating a lattice:
- 2D Grid
- Lattice (from motion discretization i.e. primitives)
- Cell Decomposition
- Topology (connectivity of places)
Topological maps. What are the use cases? As a replacement to metric map or addition? Annotated with information on how to navigate from one place to another? Is the navigation goal defined as a node on this map?

Control / Collision Avoidance Plugin

The control plugin extracts information from the core to create a useful structure for a controller/local planner.

Open Questions

What is the interface between the world model client and the controller?

Map to Core Plugin

Gets the map from the server and populates the core.

Open Questions

Map server to update in runtime the dynamic aspects of the map? Traffic given some external traffic monitoring system.
Perception could also update dynamic aspects?
(other questions?)

Sensing to Core Plugin

Gets low level (sensor data streams) or high level (objects with meta-data) and populates / updates the core.

Open Questions

(questions?)

Performance

Concerns

(add here)

Next Steps

Phase 0:

[ ] Extract the world model from the clients and encapsulate into separate nodes. World models will be client dependent. No new additional functionality.

Phase 1: Grid-based core using costmap_2d.

[ ] Stub out the plugins, core, and client:
- [ ] Map-to-Core
- [ ] Sensing-to-Core
- [ ] (Global) Planner plugin
- [ ] (Local Planner) Controller plugin
- [ ] Motion primitives plugin
- [ ] Core and layers
- [ ] Client
[ ] Define the core using a costmap_2d.
[ ] Sensing-to-Core plugin to use laser scan and update obstacle layer in costmap_2d.
[ ] Map-to-Core to use map to fill static layer in costmap_2d.
[ ] Global planner plugin to get master layer of costmap_2d used by navfn_planner.
- [ ] Client to support collision checking and providing cell cost.
- [ ] Update navfn_planner to use new interface.
[ ] Controller pluggin to get master layer.
- [ ] Client to support checking if trajectory is navigable.
- [ ] Update dwb to use new interface.

On a high level I don't see anything here that I see as a show stopper, but it also seems like this design is trying to bite out alot of things to be generally and extendable to everybody. I might think about reducing scope to get something out there that works and is relatable to existing technologies to build off of.

Only bit of input I'd add is that in the "core representation" I believe that there is no single correct way of representing the space (costmap, grid map, traversability map, etc) and should instead be a combination of all of them - such that the costmap is really a vector of of these items stacked on top of each other. Different applications can utilize different representations as they like but the data filled in populates the costmap, and the traversability map, and the ... based on whatever the application warrants.

From talking about this again this week I had another thought which I'm sure others have thought about but maybe not in words:

We would like to not have to redo different planning and controlling algorithms into different ones for different representations of costmaps. Rather than trying to make something like templating everywhere and trying to make everyone happy and readable, I'm thinking a good thought would be to have adaptors.

For example right now we're going costmap -> planner. Now we might go costmap -> adaptor -> planner, similarly with traversability map -> adaptor -> planner. This lets us generalize all the new planning and control algorithms independent of the implementation of the world model (making those by itself is hard without having to deal with lots of templates and thinking about ramifications across multiple representations).

Then we have a finite set of adaptors we need to build to represent the different models we'd like (costmaps, traversability, elevation, etc) who's job is to take the value of the neighboring cells and apply the vehicle kinematics and dynamics to give back a response "ok" "not ok" "unknown" or other options. The example adaptor for a costmap would be to change the 0-255 to those return types. Elevation map would apply the vehicle dynamics or maximum gradients to return the same type of information.

Now all planners or controller that work for 1 will work for all as long as they can run with "ok" "not ok" "unknown" or other options and be extendable easily for sampling based planners by creating methods in the adaptor to get the value at a certain location in global frame coordinates which the adaptor could project into a costmap cell, voxel grid pose, or elevation gradient.

@SteveMacenski

adaptors ... who's job is to take the value of the neighboring cells and apply the vehicle kinematics and dynamics to give back a response "ok" "not ok" "unknown"

Are you envisioning the output of the adaptor in this example as a sort of simplified costmap where each cell is filled with a ok, not ok, or unknown value. Or are you thinking more that the client feeds a trajectory to the adaptor and the adaptor returns the evaluation of the trajectory as a ok, not ok, unknown value?

This pushes the knowledge of the vehicle dynamics to the world model (adaptors) instead of the algorithms. How confident are you that this is the right place for that knowledge?

I had thought of having input and output adaptors as well, but I was thinking it could result N^2 adaptors since we theoretically could want to convert from any representation to any other. I expect it wouldn't be so bad in practice since many conversions are probably not useful.

Based on your feedback, I was imagining something like this below. We have a collection of representations. Adaptors to convert incoming data to those representations as needed. Clients that either get data directly from the representation they need or from an adaptor if there is a mismatch in the representation provided and the type they need.

Thanks for making that diagram, I'm certainly a pretty lazy guy when it comes to visualizations :)

First off, I'm not sure why you have a separate pipeline for laser scans, that defeats the purpose of having a general purpose world model to take in arbitrary sensors to generate a view of the world. I'd recommend completely scrapping that, I don't see anything special about a laser scanner requiring that type of pipeline within the world model. If you'd like to use it as a safety sensor with zones, that should be up stream of this since that's not generalized and many robots today don't use them anymore or have a different sensor suite.

What I think should be happening is a similar way of how it's done today. There's a set of optional plugins to buffer arbitrary sensor information (scans, images, depth maps, radar, sonar, etc) when does stuff of interest for that sensor, and inserts it into the map. I'm not thinking greatly into how to generalize those plugins for different plugins. In practice, I dont think you'll find a way to generalize that. A costmap is binning a depth map for collision avoidance, while an elevation map will use them over time to generate an ellipsoid curve or something are very different operations on the same sensor data. Those plugins for buffering and inserting data into their representation are probably representation specific.

Looking on the other side however, we have our [representation A] which needs to be utilized by the local/global planners to navigate. In the case of the elevation map, the Z coordinate becomes meaningful so we can't just talk about 2D X-Y coordinates anymore. The planner or controller will say "give me the neighbors" and its then up to the adaptor to say "I'm an elevation map, therefore my neighbors are in a 3D 8-directional curves" or "I'm a costmap, I just need to give the cells on either side of me", and return that information to the planner or controller to do their will with an ask for more things as needed.

This lets us do 2D or 3D representations but allow the same algorithms for local and global planning to operate (and moreover, would work for drones as well). When the adaptors return their neighbors, its up to the adaptor's knowledge of the robot dynamics and mechanics to assign them some set of finite states that can be generalized across all representations. That might include OK, not OK, unknown, not recommended, etc but as long as all of the algorithms are built to work with those same finite states, then whatever the representation is, it'll work. And the adaptors can use the Robot class to get those relevant dynamics. For a diff drive robot, its just all OK like in navigation1, but for an ackermann car or an elevation map on a legged robot, that may not be valid

I think what you are refering to as adaptor is what I have as a plugin below. We can later discuss the best pattern but the idea is the same, this block's goal is to translate the information available on the Core Representation and express it in something the client needs.

Yes that's what I'm thinking, with the only exception that the "Sensing and Perception" isn't 1 plugin, but also a series of plugins, but I think that may have just been represented that way for brevity.

Each representation will have an associated plugin that converts its values into the limited-enum types and is responsible for answering the query from the controller/planner "give me the neighbors" and "give me the value at (x,y,z)".

I'm not sure why you have a separate pipeline for laser scans

It was meant to represent a possibility. If a sensor could output data directly in representation format, it could talk directly to the model. But that's a stupid idea in context. I was playing with the idea that everything could be a ROS node; each representation could be a node and each adaptor as well.

I think what you are refering to as adaptor is what I have as a plugin below.

Do we want to be able to chain adaptors/plugins? As in, there is a plugin that provides data from the representation, but there is a second plugin that grabs the output of the first plugin and provides it in a different way

Each representation will have an associated plugin that converts its values into the limited-enum types and is responsible for answering the query from the controller/planner "give me the neighbors" and "give me the value at (x,y,z)".

So we'd need to figure out the queries that can be used by many algorithms. We'd then end up with an output plugin per class of algorithm, where a class could be graph search algorithms like A, Dijkstra, D etc.

Well at the end of the day all the graph search algorithms are going to ask for neighbors, the sampling based planners will ask for the result at a certain position, I'm not totally certain what optimization based planners will ask for, but that's overkill for 2D navigation as far as I know. If not, we can find out what thematically they ask for.

I think plugin API implementing neighbors, position in global frame (then it should internally find the value of its representation i.e. costmap would look for X-Y cell index and then the location in the array), and random sample to start with. We can always extend it if this is the way you also think makes sense. It was just a suggestion

(More thoughts) Grid Maps exists and does a number of things. I haven't done too much work with it myself but from what I've been reading it looks like everything costmap_2d does, this does as well. We might want to consider that as an option to completely replace costmap 2d for a ground up rebuild. Link for reference: https://github.com/ANYbotics/grid_map

Closing the issue, it seems we're sticking with the current costmap design for now.

ros-navigation / navigation2

Design the World Model #565

Background

Inputs to World Model

Perception

Design Improvements

Maps & Map Server

Design Improvements

Open Questions

Outputs from World Model

(Global) Path Planning

Design Improvements

Open Questions

(Local Path Planning) Obstacle Avoidance and Control

Design Improvements

Open Questions

Motion Primitives & Recovery

Design Improvements

Design

Goal

Objectives:

Proposal

Core Representation

Open Questions

Planner Plugin

Open Questions

Control / Collision Avoidance Plugin

Open Questions

Map to Core Plugin

Open Questions

Sensing to Core Plugin

Open Questions

Performance

Concerns

Next Steps