Semantic labeling in maps

ros-navigation / navigation2

ROS 2 Navigation Framework and System

https://nav2.org/

Semantic labeling in maps #1595

Open SteveMacenski opened 4 years ago

SteveMacenski commented 4 years ago

Create a basic demo, and perhaps some standards, around enabling semantic information in the map yaml to label places:

point and size,
pixels,
polygons
others?

With info like:

name string
class type
geometric information
other properties?

Then capabilities to load that information, and some capabilities enabled on top of it.

Ex:

navigate to a label (go to dishwasher)
avoid a label (stay out of conference room)
get distance to label (are you within 10ft of the front door?)
where are you (in Steve’s cubicle)
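
A minimal sketch of how the example capabilities above could sit on top of labeled areas, assuming each label is a named polygon in map coordinates. The `Label` class and all function names are hypothetical and only illustrate the idea; they are not part of Nav2.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Label:
    name: str
    class_type: str
    polygon: list  # [(x, y), ...] vertices in the map frame, metres


def contains(label, x, y):
    """Ray-casting point-in-polygon test."""
    inside = False
    pts = label.polygon
    for i in range(len(pts)):
        (x1, y1), (x2, y2) = pts[i], pts[(i + 1) % len(pts)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(label):
    """Vertex average; a simple stand-in for a 'navigate to label' goal."""
    xs, ys = zip(*label.polygon)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def distance_to(label, x, y):
    """Distance from (x, y) to the label's centroid."""
    cx, cy = centroid(label)
    return hypot(cx - x, cy - y)


def where_am_i(labels, x, y):
    """Names of all labels whose polygon contains (x, y)."""
    return [label.name for label in labels if contains(label, x, y)]


labels = [
    Label("steves_cubicle", "room", [(0, 0), (3, 0), (3, 3), (0, 3)]),
    Label("conference_room", "keepout", [(5, 0), (9, 0), (9, 4), (5, 4)]),
]
```

Avoiding a label would then amount to handing the `keepout` polygons to whatever consumes them (for instance a costmap), which this sketch leaves out.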

As such I propose we discuss the following, in this order

[x] File formats for points, lines, and areas: YAML, XML, OSM-XML, or other
[x] What we want out of masks and how to embed that information: pgm/png, XML, or other
[ ] A couple potential designs for multi-floor mapping to come to the consensus on how it makes most sense to represent the multiple maps and the "gateways" between them. The goal of this will be only to determine if the multi-map work is a consumer of this semantic work, or a direct member of it needing to be fully designed in tandem.
[ ] A couple of potential designs around using the major classes of objects: point, area, mask (e.g. dock or waypoint or elevator; zone or section; speed or keepout) to motivate the design of tools or servers required for this development.
[ ] Tools: server to read like map server, services to get all or some of the semantic information, wrappers for calling to get it and parsing the result for the specific objects it wants, IO tools.
[ ] Once we have some pretty good understandings of how we want the data formatted and how we want them to be used, we can then discuss the GUI element of this. Though I think we're all in agreement here.

shrijitsingh99 commented 4 years ago

Following is our conversation on Slack regarding this issue:

@shrijitsingh99 We have written up our thoughts as well as a rough roadmap in the document attached below. It would be great if we could have some discussion and feedback on it. Semantic Maps.pdf

@SteveMacenski Can you give me some idea on what you're looking for out of semantic maps? I think a good place to start is with a list of things we might want to label or include in our maps to help drive the design. Things like keep outs, speed zones, etc are great and relatively straightforward since they're 1:1 mappings of labels, but I'm also thinking about things like marking doors, elevators, docks, certain common destination points, etc.

My initial thought on that would be to add a new map_name_semantics.yaml file to the existing map.yaml file (like how we embed the pgms) which could contain the labels / types / poses of those objects. I think an interesting question is how to represent not just positions but more general areas like naming a room or "in front of a XYZ". Does the OSM format help with this at all?

I love your idea on making the map editors so that it's more user friendly. I think either a Qt or rviz based solution is best, but I'll leave it to you guys to discuss. I'm open to either (maybe Qt is best?). Any thought into how we can use this information in the stack? I think that's an open question with potentially multiple reasonable solutions.

@sarath18 Here are a few ideas we have regarding the question you had.

Semantic Maps Discussion.pdf

@shrijitsingh99

Does the OSM format help with this at all?

Yeah, it is used for representing such information. So whatever data is currently supported in Google Maps can be represented using the OSM file format.

I love your idea on making the map editors so that it's more user friendly.

That's one of the main aims: to make it as easy to use as possible so that more people use it.

I think either a Qt or rviz based solution is best, but I’ll leave it to you guys to discuss.

More research and discussion is needed before we decide which to use.

Any thought into how we can use this information in the stack?

As you said, this would require more discussion and suggestions from everyone. We have some ideas in mind and will follow up on it soon.

@sarath18 Here's the link to the Layered Semantic Maps in Google Maps in case the link in the pdf is not working.

@SteveMacenski The nice thing about yaml is that someone can modify it relatively easily if they want to add something and not use this particular gui tool (e.g. build their own tool or have something automated to add points). OSM format appears to just be XML (https://wiki.openstreetmap.org/wiki/OSM_XML) which could also be fine. We just use yaml for other things so it makes sense to try to keep things consistent across the toolset. I wouldn't want to have some things in XML, others in YAML, and others in OSM. I want to keep the types of formats that affect maps down to just 1. After thinking a little, I think Qt is the best option. Rviz requires ROS to be running, whereas a Qt application can be run anywhere. Plus, if we make a setup assistant, that will likely be PyQt as well, so this could make it easier to integrate those ideas down the line into a single interface. Alexey has not joined the slack yet, I am encouraging him to do so.

I think before starting this work we should have a couple of concrete ideas in mind about how we use (what I see as) three classes of annotation: points, areas, and masks (points meaning a dock or elevator, a discrete thing; areas meaning a room or a zone; and a mask meaning keepout / speed masks over the map)

@shrijitsingh99 Agreed, only once we have a solid plan should we look into implementation. This requires extensive discussion to cover all use cases.

@sarath18 I agree with you that a unified format will be much easier than having parts in YAML and parts in XML. But these maps can scale really quickly and handling them using YAML might get a little tricky. On the other hand, an OSM-based (XML type) format can provide much better readability and easier parsing. More extensive discussion on the pros and cons of these formats should be done before coming to a conclusion.

Here's a rough example of what the format would look like. https://gist.github.com/shrijitsingh99/8f24584edb11cda02227dbe21b0cb334

SteveMacenski commented 4 years ago

@AlexeyMerzlyakov I think you should join in on this (I've also sent you some Riot messages that it doesn't look like you've seen on this).

They're looking to collaborate to work on this stuff together and they have some really good ideas to add semantic information like the costmap filters and more into the navigation stack for use.

SteveMacenski commented 4 years ago

The map XML looks pretty good. I'm mostly just wanting to keep things consistent - if we use an XML for this, we should consider moving the map.yaml file to an xml as well. I don't want multiple formats floating around (especially multiple formats within the same section of the stack).

This format doesn't look like it can handle masks - how would you propose having masks as well for certain zones like a speed zone?

XML, YAML, etc, that's a decision to be made but do-able either way. Same with how we represent masks. The largest open question from this proposal is how to use this information. I'm hoping that @AlexeyMerzlyakov and I can help with figuring that out. For things like keep outs / speed there's a clearer way of how this is handled. It's less obvious for things like docks or rooms.

An example would be to host a latched map metadata topic like the /map but with this information. A subscriber can get it and pick out the information it wants (a costmap gets keep out polygons, an autonomy stack takes the waypoints to follow, a planner gets the rooms for commands to go somewhere, etc). We should at least be able to make a demo using the large classes of things (docks, rooms, zones, waypoints).

fmrico commented 4 years ago

Another idea could be using the costmap for encoding the semantic info. We could use a .pgm and a .yaml to associate each value in the map to a semantic meaning.

shrijitsingh99 commented 4 years ago

The map XML looks pretty good. I'm mostly just wanting to keep things consistent - if we use an XML for this, we should consider moving the map.yaml file to an xml as well. I don't want multiple formats floating around (especially multiple formats within the same section of the stack).

Yeah, I too support only one format for maps, whether it is XML, YAML, or something else. I will make a list of pros and cons for both and we could discuss more on that then.

Another idea could be using the costmap for encoding the semantic info. We could use a .pgm and a .yaml to associate each value in the map to semantic meaning.

There are lots of limitations to this. One simple roadblock: what if a cell carries more than one piece of semantic information? There are other issues as well, such as not being able to answer queries like "In which room is the fridge?".

This format doesn't look like it can handle masks - how would you propose having masks as well for certain zones like a speed zone?

A mask would be like any other polygon like a room. The difference would be that it would contain an attribute or property tag specifying it is a mask and the mask name as well as other mask properties.

The largest open question from this proposal is how to use this information. An example would be to host a latched map metadata topic like the /map but with this information. A subscriber can get it and pick out the information it wants (a costmap gets keep out polygons, an autonomy stack takes the waypoints to follow, a planner gets the rooms for commands to go somewhere, etc). We should at least be able to make a demo using the large classes of things (docks, rooms, zones, waypoints).

Yeah, so we also had thought of building somewhat of a query server where different entities can request certain data from the map and get the appropriate response. The examples you gave were pretty much what we had in mind.

Having such a query server would essentially cater to the needs of all the different types of applications, whether it's cost maps, normal goal-based planners, or something like a topological planner, and it could even be used to define predefined paths.

We were looking into data structures like R-Tree which would enable response to such queries.

We will try to get a more concrete specification format proposal over the weekend.

SteveMacenski commented 4 years ago

A mask would be like any other polygon like a room. The difference would be that it would contain an attribute or property tag specifying it is a mask and the mask name as well as other mask properties.

But a mask isn't a polygon. The situation I'm thinking of is if you want to have a speed zone mask. The values of the pixels are either an absolute speed (1.4 m/s) or a percentage (43%), and you have gradients over spaces to ease into and out of restrictions. Or having slower speeds closer to static obstacles and faster elsewhere, like an inflation layer applied to speed. In this case, there's no clear polygon because it's a gradient, unless you'd represent each individual pixel as a point. At that point, you'd really just be describing an image and it would make sense to just store it as an image for both compression and visual inspection (and modification).
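
To make the image-based speed mask concrete, here is a rough sketch of looking up a speed limit from a mask aligned with the map. It assumes the mask shares the map's `resolution` and `origin`, that the origin is the lower-left pixel with image rows stored top-down, and that pixel values 0..255 map linearly onto 0..100% of a maximum speed. That encoding is an assumption for illustration, not something the thread has agreed on, and `world_to_pixel` / `speed_limit` are hypothetical names.

```python
def world_to_pixel(x, y, origin, resolution, height):
    """Map-frame metres -> (row, col); origin is the lower-left pixel."""
    col = int((x - origin[0]) / resolution)
    row = height - 1 - int((y - origin[1]) / resolution)
    return row, col


def speed_limit(mask, x, y, origin, resolution, max_speed):
    """Scale max_speed by the mask value (assumed 0..255 -> 0..100 %)."""
    row, col = world_to_pixel(x, y, origin, resolution, len(mask))
    return max_speed * mask[row][col] / 255.0


# 3x4 toy mask: values ramp up from left to right, a gradient that
# no single polygon would describe.
mask = [
    [0, 85, 170, 255],
    [0, 85, 170, 255],
    [0, 85, 170, 255],
]
origin = (-1.0, -1.0)
resolution = 0.5
```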

Yeah, so we also had thought of building somewhat of a query server where different entities can request certain data from the map and get the appropriate response. The examples you gave were pretty much what we had in mind.

I think having some concrete examples of how we can integrate this capability into a behavior tree, planner, or wherever we decide is best should be a top priority to figure out. I don't think it necessarily affects the storage, loading/saving, GUI, or map server changes, but it does affect the usefulness of adding such capability. We're on the same page on the mechanics of "how" (maybe not the exact details yet) to have the information, now's the time for "why".

We were looking into data structures like R-Tree which would enable response to such queries.

That's certainly one direction. If the robot is at point A and you request going to point B, there's no promise or implied relationship that A and B are anywhere near each other topologically. I'm not sure that type of data structure would be particularly well suited for this application unless we were querying in local neighborhoods. We could also just format this like a TF listener where we have some magic topic we use to broadcast this information, and we have an object that listens to it and can be used to get information as needed. I think a hash table would be fine, with lookups via name fields. That's another option; many exist. It depends if we want to have a single XML-reader server (like map server) that broadcasts or each user of that information reads it itself.
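
The TF-listener-style idea with a hash table keyed on name could look roughly like the sketch below. It is plain Python with no ROS dependency: `on_message` stands in for a topic callback, and the class name and annotation fields are hypothetical.

```python
class SemanticListener:
    """Caches the latest broadcast and answers lookups by name."""

    def __init__(self):
        self._by_name = {}

    def on_message(self, annotations):
        # Stand-in for a topic callback: rebuild the hash table.
        self._by_name = {a["name"]: a for a in annotations}

    def lookup(self, name):
        """O(1) lookup by name; None if the label is unknown."""
        return self._by_name.get(name)

    def of_class(self, class_type):
        """All cached annotations of one class (linear scan)."""
        return [a for a in self._by_name.values() if a["class"] == class_type]


listener = SemanticListener()
listener.on_message([
    {"name": "dock_1", "class": "dock", "pose": (1.0, 2.0, 0.0)},
    {"name": "front_door", "class": "waypoint", "pose": (4.0, 0.5, 1.57)},
])
```

Each consumer would hold its own listener object and pick out only what it needs, much like a TF buffer.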

SteveMacenski commented 4 years ago

Also reposting this that Carlos had posted in the topological navigation ticket: https://www.cpp.edu/~ftang/courses/CS521/notes/topological%20path%20planning.pdf

It brings up a good point if we want to structure this as a sparse graph and how we relate annotated points/polygons to the map (associative or relational)

AlexeyMerzlyakov commented 4 years ago

The idea seems very interesting. It is closely related to the zones [#1263] and lanes [#1522] tasks that I am currently working on. It is also close to the multi-maps/multi-floors tasks. So I think I ought to be involved, and we surely need to collaborate on this.

First, I fully agree with Steve's opinion that we need to clearly identify the use cases of the proposal. Many big areas have already been touched on in this ticket, each with its own application. I think it is reasonable to discuss what is required before we move on to the system design discussion. This relates to the question "Why?". For example, I see the following use cases here:

Costmap Filters (zones and lanes)
Multi-floors/multi-maps case
Labeled points for navigation (e.g. we are saying to navigator: GoTo point "B", instead of: GoTo {10,20})

I think we definitely need to specify all of them.

The next step is to understand what format could be used for storing and publishing semantic maps. On one hand, OSM/XML or YAML is good for points, labels and objects, but not suitable for items like zones, lanes and so on. On the other hand, we can't do this task using only costmaps (because of the multiple-objects-per-point limitation). Possibly, it might be required to keep both formats in one map: PGM files for map layers, zones/lanes, etc. and, say, OSM for preserving map metadata. Therefore it is important to understand what we finally want.

The open question here is map publishing. If we have many-layered maps and OSM, we need to think over the ROS2 topic relation. Is it possible to have one OSM master topic and dynamically configured /map_layer_i topics?

Regarding R-Tree queries, I think this is a rather big task that involves the map server, the behavior tree navigator and the topological path planner. In my opinion it would be better moved to a separate ticket. @SteveMacenski what do you think about this point?

About QT/RViz/Web-based application: my opinion that QT or RViz are possibly best solutions. Especially a QT-based map editor GUI. This will allow developers to draw their own maps without running ROS2 infrastructure (AFAIK, ROS2 already requires QT libraries to be installed onboard, so QT is not a problem). Also, I have some experience with window drawing in QT, so I can help here.

Unfortunately, this week I have limited access to the Internet. Starting next Monday I will be able to discuss it in more detail.

shrijitsingh99 commented 4 years ago

This is what I had in mind for the map/query server. The map server would read and parse the XML, then store it in a suitable data structure (which can be discussed at a later stage). The map server then offers a set of endpoints, accessed via ROS services, where a client can request data such as "location of docking station", "location of waypoint B" or "current room robot is in". This service can also be used to alter the state of various regions, such as disabling a zone, marking a room as temporarily closed, marking a docking station as occupied, etc. This dynamic nature will allow modifying the map at run time and leaves a lot of room for growth: one future feature could be defining new zones or waypoints at runtime.
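
As a rough illustration of those endpoints, here is a plain-Python sketch in which each method stands in for one ROS service. Regions are simplified to axis-aligned boxes to keep it short, and every name here (`SemanticMapServer`, `location_of`, `current_region`, `set_enabled`, `add_point`) is hypothetical.

```python
class SemanticMapServer:
    """In-memory stand-in for a query server; methods mimic service calls."""

    def __init__(self, points, regions):
        self.points = dict(points)    # name -> (x, y)
        self.regions = dict(regions)  # name -> {"bounds": (...), "enabled": bool}

    def location_of(self, name):
        """'Location of docking station' style query."""
        return self.points.get(name)

    def current_region(self, x, y):
        """'Current room robot is in', ignoring disabled regions."""
        for name, region in self.regions.items():
            xmin, ymin, xmax, ymax = region["bounds"]
            if region["enabled"] and xmin <= x <= xmax and ymin <= y <= ymax:
                return name
        return None

    def set_enabled(self, name, enabled):
        """Runtime state change, e.g. disabling a zone."""
        self.regions[name]["enabled"] = enabled

    def add_point(self, name, xy):
        """Runtime addition, e.g. defining a new waypoint."""
        self.points[name] = xy


server = SemanticMapServer(
    points={"docking_station": (1.0, 2.0)},
    regions={"kitchen": {"bounds": (0.0, 0.0, 4.0, 4.0), "enabled": True}},
)
```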

Also reposting this that Carlos had posted in the topological navigation ticket: https://www.cpp.edu/~ftang/courses/CS521/notes/topological%20path%20planning.pdf

It brings up a good point if we want to structure this as a sparse graph and how we relate annotated points/polygons to the map (associative or relational)

I have had a quick glance over this, will look into it in more detail. Both methods associative or relational will need more detailed analysis and discussion to decide what would be a better approach.

We're on the same page on the mechanics of "how" (maybe not the exact details yet) to have the information, now's the time for "why".

Yeah, I have listed a few points above but will try and formalize everything into a document so that we can keep track of all the use cases. What do you think? This should help streamline discussions and ideas.

We could also just format this like a TF listener where we have some magic topic we use to broadcast this information, and we have an object that listens to it and can be used to get information as needed. I think a hash table would be fine, with lookups via name fields. That's another option; many exist.

We can even consider something like a database (though it might be overkill) for this. TBH I think even a brute-force search will be sufficiently fast, since the maps we build will be pretty small compared to OSM, which stores a map of the entire world and so has good reason to optimize search time.

It depends if we want to have a single XML-reader server (like map server) that broadcasts or each user of that information reads it itself.

I am more inclined to a single server since individual queries will be pretty small so each user reading the information itself will add unnecessary overhead.

It is closely related to the zones [#1263] and lanes [#1522] tasks that I am currently working on. It is also close to the multi-maps/multi-floors tasks. So I think I ought to be involved, and we surely need to collaborate on this.

Definitely, collaboration is certainly needed.

The open question here is map publishing. If we have many-layered maps and OSM, we need to think over the ROS2 topic relation. Is it possible to have one OSM master topic and dynamically configured /map_layer_i topics?

Having a separate topic for each zone seems like a good idea, but a map server that dynamically generates the map layer and returns it might be more general if we are creating the query-response type of system that I mentioned at the beginning of this post.

About QT/RViz/Web-based application: my opinion that QT or RViz are possibly best solutions.

QT seems like a general consensus.

We can also look into Lanelet2, it is widely used in HD Maps and is built on top of the OSM specification.

Sarath18 commented 4 years ago

Possibly, it might be required to keep both formats in one map:

I agree with you on the point that we might need both types of formats in one map to avoid being limited to one object per point. For multistoried maps, all the data in one level can be encapsulated along with the reference to the .pgm file that describes the map of that particular level.

In this case, there's no clear polygon because it's a gradient, unless you'd represent each individual pixel as a point.

Speaking about masks, having gradient masks for zones seems like a really good idea. I believe adding masks to zones in XML files will be a better option and will have the following advantages over masks defined through pgm files:

Gradients for the masks will be generated during runtime using gradient functions like linear, radial, etc. defined as attributes of the mask property of a zone.
Multiple masks can be associated with the same zone without defining the geometry for the zone again and again. With pgm files, we need to create a separate file for each type of mask.
A pgm file needs to be edited every time we need to tweak the gradient mask.
These gradient generators can be reconfigured dynamically during runtime, which is not possible when loading static gradient maps from pgm files.

For example, gradient masks defining directions will work hand in hand with the DirectedLanes mentioned in #1263. The preferred direction can be changed based on some parameter like time of day, an emergency, crowded places, etc.

<zone id="corridor">
  <!-- Node references defining a zone -->

  <!-- Zone Masks-->
  <mask type="speed" gradient="radial" percentage="0.6"/>
  <mask type="direction" gradient="linear"/>
</zone>
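
A sketch of what runtime gradient generators selected by the `gradient` attribute might look like. The semantics are assumptions made for illustration only: here `radial` yields `percentage` at the zone centre and eases linearly to 1.0 at a given radius, and `linear` ramps along the x axis. None of the parameter names beyond `gradient` and `percentage` come from the thread.

```python
from math import hypot


def radial(cx, cy, radius, percentage):
    """Scale is `percentage` at the centre, easing linearly to 1.0 at `radius`."""
    def scale(x, y):
        t = min(hypot(x - cx, y - cy) / radius, 1.0)
        return percentage + (1.0 - percentage) * t
    return scale


def linear(x0, x1, start, end):
    """Scale ramps from `start` at x0 to `end` at x1 along the x axis."""
    def scale(x, y):
        t = min(max((x - x0) / (x1 - x0), 0.0), 1.0)
        return start + (end - start) * t
    return scale


# Lookup table keyed by the value of the XML `gradient` attribute.
GENERATORS = {"radial": radial, "linear": linear}

corridor_speed = GENERATORS["radial"](cx=0.0, cy=0.0, radius=2.0, percentage=0.6)
ramp = GENERATORS["linear"](x0=0.0, x1=4.0, start=0.2, end=1.0)
```

Because the mask is a function rather than a stored image, changing `percentage` or swapping the generator at runtime only means building a new closure.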

About QT/RViz/Web-based application: my opinion that QT or RViz are possibly best solutions.

From all the above discussion, I think we all agree that Qt is the best option for developing the semantic maps editor. Here are a few resources/examples that will help in the development of the editor.

SteveMacenski commented 4 years ago

I think we need to potentially back up and look at solving one problem at a time, there are too many sub-topics being discussed.

I think we're all on the same page about the annotations for points, areas, and lines (rooms, docks, elevator locations, etc). We have some disagreement on the masks, there's some talk of multi-floor mapping, and the topic of how do we use this information. Let's go one step at a time and build on each other. It makes it easier than these long comments hitting on a bunch of different topics.

As such I propose we discuss the following, in this order

[x] File formats for points, lines, and areas: YAML, XML, OSM-XML, or other
[ ] What we want out of masks and how to embed that information: pgm/png, XML, or other
[ ] A couple potential designs for multi-floor mapping to come to the consensus on how it makes most sense to represent the multiple maps and the "gateways" between them. The goal of this will be only to determine if the multi-map work is a consumer of this semantic work, or a direct member of it needing to be fully designed in tandem.
[ ] A couple of potential designs around using the major classes of objects: point, area, mask (e.g. dock or waypoint or elevator; zone or section; speed or keepout) to motivate the design of tools or servers required for this development.
[ ] Once we have some pretty good understandings of how we want the data formatted and how we want them to be used, we can then discuss the GUI element of this. Though I think we're all in agreement here.

Let's start simple on bullet 1: how do we want to embed information for the lines, points, and areas.

@shrijitsingh99 and @Sarath18 have proposed OSM-XML. Can you give us a brief overview about why you like this format over other XMLs or other open standards? This seems to relate more to the autonomous car space, so I would just like you guys to, for the record, give us the reasons that you came to that decision. Also, does OSM let us create our own object attributes (e.g. can I add a new field whatever-thing as an XML field for a point)? I'm wondering if this standard restricts us from expanding later potentially.

The other alternative is YAML which the existing maps and ROS configuration files are in. YAML is slower to load than XML due to its more complex structure, but that's not overly concerning to me. I've worked with YAML with tens of thousands of annotations on maps without too much of a worry. The current map structure looks like

map.yaml

image: turtlebot3_world.pgm
resolution: 0.050000
origin: [-10.000000, -10.000000, 0.000000]
negate: 0
occupied_thresh: 0.65
free_thresh: 0.196

I assume these would be trivial to move to OSM if we wanted it to be. That would, however, break backwards compatibility completely with existing maps and all ROS1 users.

No matter which format we decide, I think that there should be a new field named annotations, labels, or similar as an entry in the map metadata file (like turtlebot3_world.pgm) and not placed directly into the map metadata file. This is to allow for multiple annotations for the same map to be used (and also backwards compatibility).

shrijitsingh99 commented 4 years ago

@shrijitsingh99 and @Sarath18 have proposed OSM-XML. Can you give us a brief overview about why you like this format over other XMLs or other open standards?

The OSM specification is a very mature and widely used standard; it is primarily used by OSM as well as liblanelet. Our use case will most likely be a subset of the features that OSM offers with some minor additions. It is already used for defining geometries, routing, and multi-level maps.

Each element defines tags which have a key and value:

<tag k="name" v="abc" />
<tag k="description" v="xyz" />

So properties or attributes can be added using this.

Also, does OSM let us create our own object attributes (e.g. can I add a new field whatever-thing as an XML field for a point)? I'm wondering if this standard restricts us from expanding later potentially.

It is designed to be generalizable and highly scalable. New features can be added very easily without modifying the parser or the GUI tool at all, since it doesn't define extra XML tags for new functionality like masks. It uses something called a relation for adding properties to geometric features (i.e. points, lines, polygons etc.).

The example @Sarath18 gave:

<zone id="corridor">
  <!-- Node references defining a zone -->

  <!-- Zone Masks-->
  <mask type="speed" gradient="radial" percentage="0.6"/>
  <mask type="direction" gradient="linear"/>
</zone>

in OSM would look like:

<relation id="56688" user="shrijit" uid="12345" visible="true" version="28" changeset="6947637" timestamp="2011-01-12T14:23:49Z">
  <member type="node" ref="294942404" role=""/>
  ...

  <tag k="name" v="Meeting Room No Go Zone"/>
</relation>

<relation id="56689" user="shrijit" uid="12345" visible="true" version="28" changeset="6947537" timestamp="2011-01-12T14:23:49Z">
  <member type="relation" ref="56688" role=""/>
  <tag k="type" v="mask:speed"/>
  <tag k="gradient" v="radial"/>
  <tag k="percentage" v="60%"/>
</relation>

<relation id="56690" user="shrijit" uid="12345" visible="true" version="28" changeset="6947537" timestamp="2011-01-12T14:23:49Z">
  <member type="relation" ref="56688" role=""/>
  <tag k="type" v="mask:direction"/>
  <tag k="gradient" v="linear"/>
</relation>
</osm>

So adding something like a route at a later stage can be done using a relation with <tag k="type" v="route"/>. This will not require modification of the GUI tool or the parser; you just have to add logic to process this new data.
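
The point about the parser staying unchanged can be sketched with Python's standard `xml.etree.ElementTree`. The parser below turns every relation into a generic dictionary of tags and members; only the consumer decides which `type` values it understands. The trimmed XML, the id `56690`, and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

OSM = """
<osm>
  <relation id="56688">
    <member type="node" ref="294942404" role=""/>
    <tag k="name" v="Meeting Room No Go Zone"/>
  </relation>
  <relation id="56689">
    <member type="relation" ref="56688" role=""/>
    <tag k="type" v="mask:speed"/>
    <tag k="gradient" v="radial"/>
  </relation>
  <relation id="56690">
    <member type="relation" ref="56688" role=""/>
    <tag k="type" v="route"/>
  </relation>
</osm>
"""


def parse_relations(xml_text):
    """Generic parser: every relation becomes {id, tags, members}."""
    out = []
    for rel in ET.fromstring(xml_text).iter("relation"):
        out.append({
            "id": rel.get("id"),
            "tags": {t.get("k"): t.get("v") for t in rel.findall("tag")},
            "members": [m.get("ref") for m in rel.findall("member")],
        })
    return out


def by_type(relations, type_value):
    """Consumer-side logic: pick out the relations it understands."""
    return [r for r in relations if r["tags"].get("type") == type_value]


relations = parse_relations(OSM)
```

Adding the `route` relation needed no change to `parse_relations`; a consumer that cares about routes simply calls `by_type(relations, "route")`.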

OSM currently cannot be directly used for our purpose and would require a minor modification to work in the local cartesian system. It currently works in the geographic coordinate system. Still, even after this modification, we can leverage existing OSM tools like GUI tools, parsers, inter-format conversion tools, etc. without minor modifications to the code.
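
One possible way to bridge the two coordinate systems, offered only as a sketch and not as what the thread settled on: anchor the map frame at an arbitrary latitude/longitude and convert with a small-area equirectangular approximation, so that stock OSM tools can still open the file. The anchor point, the function names, and the choice of approximation are all assumptions.

```python
from math import cos, degrees, radians

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius


def to_geographic(x, y, lat0, lon0):
    """Local metres (x east, y north) -> lat/lon near the anchor (lat0, lon0)."""
    lat = lat0 + degrees(y / EARTH_RADIUS_M)
    lon = lon0 + degrees(x / (EARTH_RADIUS_M * cos(radians(lat0))))
    return lat, lon


def to_local(lat, lon, lat0, lon0):
    """Inverse of to_geographic under the same approximation."""
    x = radians(lon - lon0) * EARTH_RADIUS_M * cos(radians(lat0))
    y = radians(lat - lat0) * EARTH_RADIUS_M
    return x, y
```

The approximation is only reasonable over building-scale distances; the alternative of storing Cartesian coordinates directly would require modifying the tools instead.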

OSM being a specification supports a multitude of formats, OSM-XML being one of them. So if needed we can store data in multiple formats.

The simplicity of the OSM format having only few tags is one major factor that appeals to me. Choosing the OSM spec will allow us to focus on 'why what and how' without being bogged down by discussion on the format every time we add a new feature.

Whether to use YAML, XML or JSON can be discussed but it doesn't really affect the specification. In my view, XML has the advantage of being able to define tag attributes but makes everything look clunky. YAML looks very clean and is already used in ROS, but messed-up indentation can become a pain to fix.

There are already several articles online comparing the different formats in detail so I won't go into that here.

That would, however, break backwards compatibility completely with existing maps and all ROS1 users.

A break might be inevitable since the current format is only for single floor. When we add multi-floor we will have to add multiple pgm files to the YAML file.

This is to allow for multiple annotations for the same map to be used

I don't really see a scenario for this to happen. I am not currently a big fan of having multiple files for maps. map.yaml feels more like a parameter file for map_server and not really related to the map itself, so we should treat it like another configuration file for a node.

AlexeyMerzlyakov commented 4 years ago

A break might be inevitable since the current format is only for single floor. When we add multi-floor we will have to add multiple pgm files to the YAML file.

Backward compatibility might be provided by introducing a new tag in the YAML, e.g. called map. If there is no such tag, the map.yaml is treated as an old-style map; if it exists, we may use the new multi-floor/multi-map model. For example, the map.yaml for a 2-floor configuration might look like:

map:
  FL1:
    image: world_A.pgm
    labels: world_A.osm
    resolution: 0.050000
    origin: [-10.000000, -10.000000, 0.000000]
    negate: 0
    occupied_thresh: 0.65
    free_thresh: 0.196
  FL2:
    image: world_B.pgm
    labels: world_B.osm
    resolution: 0.150000
    origin: [-15.000000, 20.000000, -5.000000]
    negate: 0
    occupied_thresh: 0.7
    free_thresh: 0.1

This map could also have a reference to OSM files (if they exist) containing labels. The main shortcoming of this approach is that it produces a paradigm with many file types: PGM+OSM+YAML, which I think we need to avoid.
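
The compatibility rule described above (no `map` tag means a legacy file) can be sketched as follows. It assumes the YAML has already been parsed into a dictionary by some YAML library and that the `map` key holds a mapping from floor name to settings; the function name and the `"default"` floor name are hypothetical.

```python
def floors_from_metadata(meta):
    """Return {floor_name: settings} for both legacy and multi-map files."""
    if "map" not in meta:
        # Legacy single-map file: wrap it as one floor with a placeholder name.
        return {"default": dict(meta)}
    return {name: dict(settings) for name, settings in meta["map"].items()}


# Parsed form of an existing single-floor map.yaml.
legacy = {
    "image": "turtlebot3_world.pgm",
    "resolution": 0.05,
    "origin": [-10.0, -10.0, 0.0],
}

# Parsed form of the proposed two-floor file.
multi = {
    "map": {
        "FL1": {"image": "world_A.pgm", "labels": "world_A.osm", "resolution": 0.05},
        "FL2": {"image": "world_B.pgm", "labels": "world_B.osm", "resolution": 0.15},
    }
}
```

With this shape, downstream code always iterates over floors and never needs to know which style of file was loaded.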

As an alternative, the map server API could support both YAML and OSM metadata files. YAML would remain for backward compatibility with ROS1 and the current configuration, and OSM would be used for newer types of (multi)maps. I think there is no problem converting a YAML-compatible format into OSM dynamically and vice versa.

If we choose the OSM metadata type, it is not clear how we could subscribe to or publish OSM using a ROS2 topic.

SteveMacenski commented 4 years ago

OSM currently cannot be directly used for our purpose and would require a minor modification to work in the local cartesian system. It currently works in the geographic coordinate system.

How do we overcome that?

Still, even after this modification, we can leverage existing OSM tools like GUI tools, parsers, inter-format conversion tools, etc. without minor modifications to the code.

I assume you mean with minor modification, so we're at least working in cartesian coordinates (and probably some buttons like drop dock / waypoints / etc that are nav specific)

OSM recommends using PBF rather than OSM-XML. Why not that? By the time we're changing formats and if we're going to add a GUI and such, why not make it high performance?

A break might be inevitable since the current format is only for single floor. When we add multi-floor we will have to add multiple pgm files to the YAML file.

That's not correct. If we continue with yaml, we would only need to add more fields for different floor map locations, not removing any information. This is 100% backwards compatible for the case of a single floor. We must provide conversion scripts if we go this route, but I think that messing with this specification could be very detrimental for getting users to move to Nav2 if their entire library of maps are in an incompatible format. We may need to support both, even.

I am not currently a big fan of having multiple files for maps. map.yaml feels more of a parameter file for map_server and not really related to the map actually so we should treat like another configuration file for a node.

From real-world experience, this separation is important. There should be a central file for telling it where to find other stuff if necessary. While I wasn't around in the Willow days when the map.yaml was created, my guess is it's not because they couldn't embed that information into the header of the pgm. The map.yaml is not a configuration for the map_server because it's associating data with hyperparameters required to interpret that data.

@AlexeyMerzlyakov's suggestion looks as reasonable as anything else. I may have preferred a mapping of {floor_id: /path/to/osm} but we don't need to get bogged down in details on that. I mirror his thoughts on backwards compatibility of files and/or conversions and/or backwards support of yaml.

I think per @shrijitsingh99's comment, there is some benefit drawn from OSM (or at least conversions to and from it) in use of GUI tools for annotation, if it's possible to modify them to be useful for our needs. I would be OK with using OSM for the labels / annotations if we can show that we can embed custom fields on points and the necessary annotations we may need for navigation. A few examples:

Point that has type of "Dock" with a name of "Dock1" with coordinates (X, Y, Theta) and priority value 0.52
Point of type Waypoint with name "waypoint 71" with coordinates (X, Y, Theta) and a tolerance of 0.42
Point of type Viapoint with name "Front Door" with coordinates in GPS with tolerance of 0.42 and action type "take photo".
An area with label living room and a vector list of items in the room (e.g. TV, couch, etc)
A region labelled no go zone with name "Chemical Spill"
How to support multiple levels. I imagine we have a base XML with just includes to the other XMLs with a name for "Floor 1", "Floor 3". etc.
Global settings of the map like the path (relative or absolute) to the map image relative/path/to/map.png, time created, location, map frame origin, number of fields in the OSM file, etc.

Is there an example GUI that you think we could use as a starting point to make some GUI editor program if we use this format that can support custom fields like above? I imagine in a simple case we have the map in a Qt window with a side bar of buttons like "drop dock" to drag and drop or set exact coordinates, which then opens a menu to input other metadata about that object. Or for the areas, drag a box or make line segments to create a shape. It would be great if, for basic primitives like dock, waypoint, door, elevator, etc., there were just direct buttons with pre-configured metadata profiles.

If so, then we'd just need to backwards support .yaml files in the code I suppose and probably also provide a conversion script (which would be really simple with only 5-6 entries). My hesitations are making sure we don't have YAML + OSM + PNG like Alexey says, making sure there's a way for existing yaml users to use Nav2 through native support or conversions, making sure that the standard can cleanly support the types of data we require (see above), and that there's some tooling in the ecosystem for that format that makes it valuable to use for our needs. If we satisfy those, I'm OK with using OSM for the points / regions / connections annotations.
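As a rough illustration of how small such a conversion script could be, here is a minimal Python sketch. It assumes the usual map.yaml fields (image, resolution, origin, negate, occupied_thresh, free_thresh) have already been parsed into a dict. The `<map>` root element and the reuse of `<tag k= v=>` pairs for global settings are hypothetical choices for illustration, not an agreed format.

```python
import xml.etree.ElementTree as ET


def map_yaml_to_xml(meta: dict) -> str:
    """Convert parsed map.yaml fields into a <map> element with <tag> children.

    The <map> root and the k/v tag layout are assumptions for illustration.
    """
    root = ET.Element("map")
    for key, value in meta.items():
        # Lists such as origin [x, y, yaw] are flattened to a space-separated string.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        ET.SubElement(root, "tag", k=key, v=str(value))
    return ET.tostring(root, encoding="unicode")


# Fields as a YAML parser would return them for a typical map.yaml.
legacy = {
    "image": "map.pgm",
    "resolution": 0.05,
    "origin": [-10.0, -10.0, 0.0],
    "negate": 0,
    "occupied_thresh": 0.65,
    "free_thresh": 0.196,
}
xml_text = map_yaml_to_xml(legacy)
print(xml_text)
```

Going the other way (XML back to the YAML fields) would be the same loop in reverse, which is why supporting both formats in the map server looks cheap.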

shrijitsingh99 commented 4 years ago

Backward compatibility might be provided by introducing a new tag in YAML, e.g. called map.

Sounds good

As an alternative, the map server API could support both YAML and OSM metadata files.

Yeah, this will be a good approach. Converting between either types will be pretty straightforward.

If we choose the OSM metadata type, it is not clear how we could subscribe/publish OSM using a ROS2 topic.

So this requires further discussion. I had mentioned some stuff here:

This is what I had in mind for the map/query server. So that map server would be reading and parsing the XML then storing it in a suitable data-structure (which can be discussed at a later stage). The map server then offers a set of endpoints that can be accessed by using ROS service for a query where the client can request some data such as "location of docking station", "location of waypoint B" or "current room robot is in". This service can also be used to alter the states of various regions such as disabling a zone, marking room as closed temporarily, marking docking station as occupied, etc. This dynamic nature will allow modifying the map at run time. This will allow huge growth potential in the future, one future feature could be defining new zones or waypoints at runtime.

So building on this, for zone maps we can either internally in the map_server convert to occupancy grids and publish or just publish out the polygon and gradient information in a custom message type.

Something along these lines; I would like to hear opinions regarding this mechanism. I can expand on the above as I might not have been very clear on this.
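To make the query idea a little more concrete, here is a minimal Python sketch of the data a map/query server might hold and the two kinds of calls described above: lookups by type or name, and run-time state changes. The names `Feature`, `SemanticStore`, `find`, and `set_property` are hypothetical, and the ROS service layer is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    kind: str  # e.g. "Dock", "Waypoint", "Zone"
    x: float
    y: float
    props: dict = field(default_factory=dict)


class SemanticStore:
    """In-memory stand-in for what a map/query server would keep after parsing."""

    def __init__(self, features):
        self._by_name = {f.name: f for f in features}

    def find(self, kind=None, name=None):
        """Return features matching the given kind and/or name."""
        return [
            f for f in self._by_name.values()
            if (kind is None or f.kind == kind)
            and (name is None or f.name == name)
        ]

    def set_property(self, name, key, value):
        """Change a property at run time, e.g. mark a dock as occupied."""
        self._by_name[name].props[key] = value


store = SemanticStore([
    Feature("Dock1", "Dock", 1.0, 1.0, {"occupied": False}),
    Feature("waypoint B", "Waypoint", 5.0, 6.0),
])
store.set_property("Dock1", "occupied", True)
docks = store.find(kind="Dock")
```

A service callback would then be a thin wrapper that translates a request into a `find` or `set_property` call and packs the result into a response message.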

How do we overcome that?

Haven't gone deep into this, but from the surface, since it is just a specification, we are free to modify it and add stuff to it, as we will be interpreting the format however we find suitable.

My main point was to use the OSM specification as a base to build our own spec, not to exactly use OSM.

OSM recommends using PBF rather than OSM-XML. Why not that? By the time we're changing formats and if we're going to add a GUI and such, why not make it high performance?

Yeah, it supports multiple formats; we can choose whatever we think fits. I was stressing building off the specification; the file format can be anything, be it PBF, XML, or YAML, since they are all interconvertible.

There should be a central file for telling it where to find other stuff if necessary.

That makes sense.

Point that has type of "Dock" with a name of "Dock1" with coordinates (X, Y, Theta) and priority value 0.52

Point of type Waypoint with name "waypoint 71" with coordinates (X, Y, Theta) and a tolerance of 0.42

Point of type Viapoint with name "Front Door" with coordinates in GPS with tolerance of 0.42 and action type "take photo".

An area with label living room and a vector list of items in the room (e.g. TV, couch, etc)

A region labelled no go zone with name "Chemical Spill"

You can represent the above stuff for sure.

How to support multiple levels. I imagine we have a base XML with just includes to the other XMLs with a name for "Floor 1", "Floor 3". etc.

They do have support for multi-levels, you can see it on their maps, but haven't looked into this in detail. Will look it up.

Global settings of the map like the path (relative or absolute) to the map image relative/path/to/map.png, time created, location, map frame origin, number of fields in the OSM file, etc.

Can be added with custom tags, not sure if they are in the current spec.

I think there are 2 discussions going on and getting mixed up. From my perspective the specification of how you represent the semantic data (like the OSM specification, what tags to have etc.) is separate from which file format we use, because you can use any file format given a spec.

I might have been unclear about this so stressing on it again: I am not saying we use OSM directly, but that we use it as a base, incorporating its core features (how it represents lines, polygons, properties etc.) to build our own specification.

SteveMacenski commented 4 years ago

My main point was to use the OSM specification as a base to build our own spec, not to exactly use OSM.

In that case, aren't we just talking about XML then? If we build from it, then we probably won't have direct access to the annotation tools.

You can represent the above stuff for sure.

Can you provide snippet examples of these in the spec? I'm looking for direct validation that these can be supported.

I think there are 2 discussions going on and getting mixed up. From my perspective the specification of how you represent the semantic data (like the OSM specification, what tags to have etc.) is separate from which file format we use, because you can use any file format given a spec.

OSM is the metadata, the png / pgm / etc are the actual map images. What I was looking for from that line is just the ability to have some global parameters, one of which can be a filepath to the map image. I'm not asking that OSM knows or cares about what this is. We just need to be able to globally embed the same information as in the map.yaml in this map.osm file.

I might have been unclear about this so stressing on it again: I am not saying we use OSM directly, but that we use it as a base, incorporating its core features (how it represents lines, polygons, properties etc.) to build our own specification.

I think that's where you lose me a bit. If we're not using this spec to use the tools that it works with, why use it at all? I can see value in using an existing standard and also then reaping the benefits of tooling available. If we're going to change the standard, then we're really just talking about XML.

AlexeyMerzlyakov commented 4 years ago

Yeah, this will be a good approach. Converting between either types will be pretty straightforward.

Looks like we are now on the same page about backward compatibility. Great.

So this requires further discussion. I had mentioned some stuff here ...

I agree that having a server with targeted service queries is a good practice. However, this does not cover all cases. Let's imagine I am writing a new path planner which takes into account all the features we are discussing here (dock stations, doors, room types, etc.). This planner wants to have all map information, including the metadata, in one place in order to do its job. If the path planner starts sending the map server a bunch of targeted requests for each object (location of docking station, location of the door, etc.) on each path planning iteration (requests have to be repeated because of the dynamic nature of the world, as mentioned above), this may significantly affect overall system performance.

Therefore we may additionally need the ability to share the whole metadata dynamically in one place through the map server. The new imaginary path planner could parse the metadata file itself and select the necessary features from it, but I consider this to be the map server's responsibility. In this case we could continuously share the whole metadata through a ROS2 topic or have aggregating service requests. However, both msg and srv formats are rather restricted to specific types of data. Adding a new type of object to the metadata would require rebuilding the services and/or messages and would break compatibility with previous versions of the messages or requests, which does not suit the flexibility we are looking for.

Another way is to share a hash-table[] via /topic with integer hashes (for objects' keys) and their values. Just a brainstorm. Anyway, I think it is an open question for today.

Another open question is the one @SteveMacenski raised: why not prefer XML over OSM if OSM is not fully suitable for us and we need to adjust/modify the OSM format, along with the tools for the OSM format, for our needs? XML or even YAML looks more straightforward for that.

SteveMacenski commented 4 years ago

@AlexeyMerzlyakov please keep the conversation on topic, we're not yet at discussing how we use the information. See the bulleted list up the thread. We'll be talking forever and never doing if we don't keep focused on one issue at a time. Right now we're discussing the format to save points, areas, and map metadata. (But we could easily have a service on map_server to get all metadata or metadata matching some regex by feature name/type/area.)

Why not prefer XML over OSM if OSM is not fully suitable for us and we need to adjust/modify the OSM format, along with the tools for the OSM format, for our needs? XML or even YAML looks more straightforward for that.

This is the big question for me. OSM as a spec makes sense to use if we can use it and use tools from it, but if we're going to change it and not be able to use the tools, then it's just XML. There's not a problem with that. I'm just trying to make sure we're making a decision with all the facts based on how we'll end up functionally using this. I really don't want to keep on this discussion for another week.

@shrijitsingh99 can you comment on if it's just XML or if OSM means something here we need to consider? With XML we can obviously replicate anything in the existing yaml, so I have no concerns. We'll just need a yaml (for backwards support) and an xml parsing library. If it's just XML and @shrijitsingh99 this is the way we want to go, I approve.

Sarath18 commented 4 years ago

Hello everyone, I have talked to @shrijitsingh99 regarding the file format, and what I believe he wants to convey is that we take the standards specified in these formats and then use them for creating semantic maps. By standards, we mean how semantic information is defined in these formats and the relationships among them. For example:

OSM: nodes, ways/paths, zones, relations
GeoJSON: type, geometry, properties

and other related formats used in mapping applications.

Out of all these standards, we thought OSM was the best to build upon. We can use the maturity of all the standards and build on top of it to create our own semantic information. The following links:

define the basic elements and standards used to represent data and not the file format XML or PBF. Our implementation will boil down to using XML and YAML as file formats (which hopefully everyone at this point agrees on).

By using these standards we can leverage the power of already existing tools and build on top of them to support our use case.

Now by using XML with YAML for semantic maps, I would like to summarize all the key features we have discussed:

Addition of semantic information to existing PGM/image files
Backward compatibility
The tools that will be used (GUI and parsers) are already present in the ecosystem, i.e. YAML is used in map configs and XML in BehaviorTrees
Using these file formats will provide high readability and configuration capabilities. PBF (compressed) files will have no readability.
We can convert the data storage into any format we want (JSON, PBF) and highly compressed file formats like PBF can be used to publish map metadata.

shrijitsingh99 commented 4 years ago

@shrijitsingh99 can you comment on if it's just XML or if OSM means something here we need to consider? With XML we can obviously replicate anything in the existing yaml, so I have no concerns. We'll just need a yaml (for backwards support) and an xml parsing library. If it's just XML and @shrijitsingh99 this is the way we want to go, I approve.

I think we all are nearly on the same page. @Sarath18 summarized what I wanted to say more clearly.

So if by XML you mean creating a new standard which uses some of the concepts of the OSM spec (namely points, ways and relations) then I guess we are on the same page.

Coming to file format, do we want to use XML, YAML, or something compressed? I am fine with any, but like @Sarath18 said, if we use a compressed format there will be no readability.

Backward compatibility is 100% agreed upon, how we are going to make it backward compatible exactly needs one final discussion.

shrijitsingh99 commented 4 years ago

Point that has type of "Dock" with a name of "Dock1" with coordinates (X, Y, Theta) and priority value 0.52

<node id="2" x="1.0" y="1.0" z="0.0" yaw="1.57" >
    <tag k="type" v="Dock" />
    <tag k="name" v="Dock1" />
    <tag k="priority" v="0.52" />
</node>

Point of type Waypoint with name "waypoint 71" with coordinates (X, Y, Theta) and a tolerance of 0.42

<node id="8" x="5.0" y="6.0" z="0.0" yaw="1.57" >
    <tag k="type" v="Waypoint" />
    <tag k="name" v="waypoint 71" />
    <tag k="tolerance" v="0.42" />
</node>

Point of type Viapoint with name "Front Door" with coordinates in GPS with tolerance of 0.42 and action type "take photo".

This one is not very straightforward as you will need to define a GPS reference point in the map.

<reference id="10" lat="54.0889580" lon="12.2487570" x="0" y="0" />

<node id="9" lat="13.74534" lon="14.86546" >
    <tag k="type" v="Viapoint" />
    <tag k="name" v="Front Door" />
    <tag k="tolerance" v="0.42" />
</node>
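For illustration, here is one way a `<reference>` element like the one above could be used to bring a lat/lon node into the local frame. This is a minimal Python sketch using an equirectangular (flat-earth) approximation, which is only reasonable over short distances. It assumes the map's x axis points east and y points north with no rotation between the map frame and true north, which a real map would have to account for. The function name `gps_to_local` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def gps_to_local(lat, lon, ref_lat, ref_lon, ref_x=0.0, ref_y=0.0):
    """Map a GPS point to local (x, y) in meters relative to a reference point.

    Assumes x points east and y points north (no map rotation), and that the
    point is close enough to the reference for a flat-earth approximation.
    """
    d_lat = math.radians(lat - ref_lat)
    d_lon = math.radians(lon - ref_lon)
    x = ref_x + EARTH_RADIUS_M * d_lon * math.cos(math.radians(ref_lat))
    y = ref_y + EARTH_RADIUS_M * d_lat
    return x, y


# Reference taken from the <reference> element; the node is a made-up point
# 0.001 degrees north of it.
ref_lat, ref_lon = 54.0889580, 12.2487570
x, y = gps_to_local(ref_lat + 0.001, ref_lon, ref_lat, ref_lon)
print(round(x, 2), round(y, 2))
```

With a conversion like this, GPS-tagged nodes and cartesian nodes could live in the same file and be resolved to the map frame at load time.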

An area with label living room and a vector list of items in the room (e.g. TV, couch, etc)

<node id="213" x="5.0" y="10.0" />
<node id="214" x="5.0" y="5.0" />
<node id="215" x="10.0" y="5.0" />
<node id="216" x="10.0" y="10.0" />

<node id="2" x="5.0" y="6.0" z="0.0" yaw="1.57" >
    <tag k="name" v="TV" />
</node>
<node id="3" x="7.0" y="6.0" z="0.0" yaw="1.57" >
    <tag k="name" v="Couch" />
</node>
<node id="4" x="8.0" y="6.0" z="0.0" yaw="1.57" >
    <tag k="name" v="Table" />
</node>

<way id="312">
    <nd ref="213" />
    <nd ref="214" />
    <nd ref="215" />
    <nd ref="216" />
    <nd ref="213" />
    <tag k="name" v="Living Room" />
</way>

You get the list of items in the room by querying all the items defined within the room's bounding area, so there is no need for an explicit relation between the contents of the room. Nonetheless, if you do need an explicit relation between these two entities, you can do it as below:

<relation id="416">
 <member type="Node" ref="2" />
 <member type="Node" ref="3" />
 <member type="Node" ref="4" />
 <tag k ="name" v="Living Room Contents" />
</relation>
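The implicit query described above (finding the named nodes that fall inside a room's polygon) could be sketched in Python roughly as follows. The `<map>` wrapper element, the function names, and the slightly adjusted coordinates (so that no item sits exactly on the room boundary, and one sits outside) are assumptions for illustration. The point-in-polygon test is a standard ray-casting check.

```python
import xml.etree.ElementTree as ET

DOC = """
<map>
  <node id="213" x="5.0" y="10.0" />
  <node id="214" x="5.0" y="5.0" />
  <node id="215" x="10.0" y="5.0" />
  <node id="216" x="10.0" y="10.0" />
  <node id="2" x="6.0" y="6.0"><tag k="name" v="TV" /></node>
  <node id="3" x="7.0" y="6.0"><tag k="name" v="Couch" /></node>
  <node id="4" x="12.0" y="6.0"><tag k="name" v="Table" /></node>
  <way id="312">
    <nd ref="213" /><nd ref="214" /><nd ref="215" /><nd ref="216" /><nd ref="213" />
    <tag k="name" v="Living Room" />
  </way>
</map>
"""


def point_in_polygon(px, py, polygon):
    """Ray casting; polygon is a closed list of (x, y) with first == last."""
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:]):
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def items_in_way(root, way_id):
    """Names of tagged nodes lying inside the area outlined by the given way."""
    coords = {n.get("id"): (float(n.get("x")), float(n.get("y")))
              for n in root.iter("node")}
    way = root.find(f"way[@id='{way_id}']")
    refs = [nd.get("ref") for nd in way.findall("nd")]
    polygon = [coords[r] for r in refs]
    names = []
    for node in root.iter("node"):
        name_tag = node.find("tag[@k='name']")
        if name_tag is None or node.get("id") in refs:
            continue  # skip untagged corner nodes of the room itself
        if point_in_polygon(*coords[node.get("id")], polygon):
            names.append(name_tag.get("v"))
    return names


root = ET.fromstring(DOC)
items = items_in_way(root, "312")
print(items)
```

Here the table is deliberately placed outside the room, so only the TV and couch are reported.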

A region labelled no go zone with name "Chemical Spill"

<node id="213" x="5.0" y="10.0" />
<node id="214" x="5.0" y="5.0" />
<node id="215" x="10.0" y="5.0" />
<node id="216" x="10.0" y="10.0" />

<way id="312">
    <nd ref="213" />
    <nd ref="214" />
    <nd ref="215" />
    <nd ref="216" />
</way>

<relation id="415">
    <member type="Way" ref="312" />
    <tag k="type" v="No Go Zone" />
    <tag k="name" v="Chemical Spill" />
</relation>
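As a sketch of how a consumer might resolve such a relation back to geometry, the Python below follows relation to way to node coordinates and collects the polygons for every zone of a given type. The `<map>` wrapper element and the function name `zones_of_type` are hypothetical; the element and attribute names follow the snippet above.

```python
import xml.etree.ElementTree as ET

DOC = """
<map>
  <node id="213" x="5.0" y="10.0" />
  <node id="214" x="5.0" y="5.0" />
  <node id="215" x="10.0" y="5.0" />
  <node id="216" x="10.0" y="10.0" />
  <way id="312">
    <nd ref="213" /><nd ref="214" /><nd ref="215" /><nd ref="216" />
  </way>
  <relation id="415">
    <member type="Way" ref="312" />
    <tag k="type" v="No Go Zone" />
    <tag k="name" v="Chemical Spill" />
  </relation>
</map>
"""


def zones_of_type(root, zone_type):
    """Return {zone name: [(x, y), ...]} for relations tagged with zone_type."""
    coords = {n.get("id"): (float(n.get("x")), float(n.get("y")))
              for n in root.findall("node")}
    ways = {w.get("id"): [coords[nd.get("ref")] for nd in w.findall("nd")]
            for w in root.findall("way")}
    zones = {}
    for rel in root.findall("relation"):
        tags = {t.get("k"): t.get("v") for t in rel.findall("tag")}
        if tags.get("type") != zone_type:
            continue
        for member in rel.findall("member"):
            if member.get("type") == "Way":
                zones[tags.get("name")] = ways[member.get("ref")]
    return zones


root = ET.fromstring(DOC)
no_go = zones_of_type(root, "No Go Zone")
print(no_go)
```

Because the way is referenced rather than copied, a second relation (say, a "Radiation Zone") could point at the same way 312 without duplicating the geometry.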

How to support multiple levels. I imagine we have a base XML with just includes to the other XMLs with a name for "Floor 1", "Floor 3". etc.

There is a way for this, but haven't looked at it. Will go through the docs and update this.

SteveMacenski commented 4 years ago

We can use the maturity of all the standards and build on top of it to create our own semantic information

So to be clear, when you say something like "We want to use OSM standard" that means that you choose the OSM standard and the application will comply with that standard. Ex. "I follow the ISO 26262 standard" doesn't mean that you pick the things you like and build off of it / change the specifications to fit your needs. I think what you mean to say is that you see some OSM standard as an example set of tags and structures to borrow to build either a new standard or a new format based on XML.

We should be clear about that language moving forward. Saying something like "We use the OSM format" would be incorrect and misleading, unless the specification allows us to work completely within it. If the spec allows you to create custom nodes according to some standards and we comply with those standards, then we could say that we are OSM compliant. Else, we are an XML format taking inspiration from OSM.

Let us know if we're taking inspiration from OSM or complying with OSM for the requirements laid out above.

By using these standards we can leverage the power of already existing tools and build on top of them to support our use case.

[Citation needed] Can you give us specific examples of tools that we can use from OSM in using this format with our own custom extensions? For me, the #1 reason to look at OSM was to use their tools, and it's no longer clear to me if what we'd make is compliant enough with the standard to use them.

I don't care one way or another if they're PBF, XML, or YAML. I want there to be an engineering reasoned rationale for the choice. I would prefer XML or YAML for human readability and being able to use general YAML / XML parsing tools for making new tools. But if there's a structured reason that PBF is the best, let's do it. It sounds like, though, we're swaying towards something human readable.

This one is not very straightforward as you will need to define a GPS reference point in the map.

Would you though? If your robot had a GPS fix at Lat1 Long1 and you have a wp in the XML as Lat2 Long2, do you need a reference?

<relation id="416">
 <member type="Node" ref="2" />
 <member type="Node" ref="3" />
 <member type="Node" ref="4" />
 <tag k ="name" v="Living Room Contents" />
</relation>

This I'm curious about - how does this work? So if you had 4 objects (node tags), why define the relationship? Can't you query whatever parses this for all nodes in an area set out by a way, which is what you're using to describe rooms? Do you have to define all internal relationships explicitly? It's also not clear to me why the living room is a way and the chemical spill is a relation. They look to contain the same types of information.

To recap:

Looking for formalization on what the proposed XML-like structure is
Either way, an engineering rationalized argument for it (if OSM and example tools that are helpful, that's a good reason. If just XML then why XML over YAML)
Still on same page for having some GUI tools, whose scope will be discussed later
Supporting Yaml backend for backwards compatibility (map server can load, map saver only does xml, and GUI should also be able to load yaml but export xml)

Once we have that settled, we can write that up in the design doc that we're using XYZ format for ABC reasons, with 123 examples for [I'm out of canonical sequences] types of geometries covering intentional scope of {dock, wp cartesian, wp gps coord, elevator, door, room, area, zone, lane (?), arbitrary object, keep naming important things we need to make sure are covered well}. Then move on to discussing the masks, which I think should be short and to the point.

lane(?): is the way in OSM supposed to be a lane? https://wiki.openstreetmap.org/wiki/Way this makes it seem like it should be a center graph or something. Is there a more accurate tag to use for a closed space rather than a traversing way? Seems like a way should be used to define routes and lanes.

shrijitsingh99 commented 4 years ago

We should be clear about that language moving forward.

Got it, will be more explicit going forward.

Let us know if we're taking inspiration from OSM or complying with OSM for the requirements laid out above.

Inspiration

Complying won't be possible even if we wanted to since it requires lat and long for defining position.

[Citation needed] Can you give us specific examples of tools that we can use from OSM in using this format with our own custom extensions? For me, the #1 reason to look at OSM was to use their tools, and it's no longer clear to me if what we'd make is compliant enough with the standard to use them.

Editors: https://wiki.openstreetmap.org/wiki/Comparison_of_editors JOSM (Java based), Merkaartor (Qt), iD (Web) being the commonly used ones.

Database Tools: https://wiki.openstreetmap.org/wiki/Databases_and_data_access_APIs

Format Conversion Tools: https://wiki.openstreetmap.org/wiki/Converting_map_data_between_formats

Multiple Supported File Formats: https://wiki.openstreetmap.org/wiki/OSM_file_formats

Not sure how useful this is if we go the inspiration route, maybe by modifying some of the tools, especially the GUI ones.

It sounds like though we're swaying towards something human readable.

Yeah, compressed makes more sense if the maps are larger like spanning km. Indoor maps are pretty small.

Would you though? If your robot had a GPS fix at Lat1 Long1 and you have a wp in the XML as Lat2 Long2, do you need a reference?

In that case, we might have to come up with something for this; even adding such points in the GUI won't be as simple as dragging and dropping a point, since it's not on the local XY coordinate system.

This I'm curious about - how does this work? So if you had 4 objects (node tags), why define the relationship? Can't you query whatever parses this for all nodes in an area set out by a way, which is what you're using to describe rooms? Do you have to define all internal relationships explicitly?

You don't need to define explicitly:

You get the list of items in the room by querying all the items defined within the room's bounding area, so there is no need for an explicit relation between the contents of the room. Nonetheless, if you do need an explicit relation between these two entities, you can do it as below:

It's also not clear to me why the living room is a way and the chemical spill is a relation. They look to contain the same types of information.

The living room was just a polygon, so I defined it using a way. You could do this using a relation if you defined a type called room; then a relation would make sense. Without that, "living room" is just a name for a polygon.

Chemical Spill might have multiple properties like No Go Zone, Radiation Zone, etc. associated with it, so you might need to reuse the way for each of them.

Looking for formalization on what the proposed XML-like structure is

Either way, an engineering rationalized argument for it (if OSM and example tools that are helpful, that's a good reason. If just XML then why XML over YAML)

Still on same page for having some GUI tools, whose scope will be discussed later

Supporting Yaml backend for backwards compatibility (map server can load, map saver only does xml, and GUI should also be able to load yaml but export xml)

👍 on all the points

SteveMacenski commented 4 years ago

Not sure how useful this is if we go the inspiration route, maybe by modifying some of the tools, especially the GUI ones.

Ok, that was what I was looking for: confirmation of whether we can use them. The answer is probably not, but they are a good starting point to fork from.

In that case, we might have to come up with something for this; even adding such points in the GUI won't be as simple as dragging and dropping a point, since it's not on the local XY coordinate system.

I'm imagining some of these things being automatically added by other tools than a GUI. We support what we support in the GUI, but want to make sure that the typical navigation2 use-cases can be embedded in the format.

So last point would then be "why OSM-inspired XML over defining our own XML spec or XML". I'll say that YAML is slow to load and these could have a lot of information in it (thousands of waypoints, rooms, etc). It doesn't sound like anyone is strongly championing for YAML, though it would be nice to have everything in the same file format. You did bring up that the BT are XML as well which I didn't think about, so we're already mixed. That resolves a bunch of my issues with that.

@AlexeyMerzlyakov any objections with a OSM-inspired XML given this discussion?

If he's OK with it, I think the next step is to open a WIP PR to make a markdown under docs/design/. Starting with a quick blurb that we can refine later about what is semantic navigation/labeling. What we just finished was the file formats, we should write up a summary of our example support cases (dock, etc) and geometries (points, regions, routes), options of formats considered, why this OSM-inspired XML over the others, and how we will embed our support cases in the file (specific XML examples).

Onto the next line item: masks & non-geometric/discrete annotations (like gradients in a speed zone, an image mask for keep outs, multidirectional lane areas, etc). You guys did the heavy lifting on the XML discussion, we'll do that for this. This discussion is only around the file formats we want to store this stuff in. We'll decide in another bullet on the list how we want to actually use it.

What we propose is using images like the existing maps (pgm, png, anything a typical image loader can load, etc). This allows for visual inspection and modification of these files in common image editing tools, which are currently part of commercial workflows for some of these types of things. It also allows editing in many tools vs forcing users to use specific tools we create that might limit what they want to do with it. Given that folks could want to embed rough percentages, exact values, or odd shapes, I think it would be good to keep this visual and general. This is in line with the design work that Alexey and I have been thinking about and that he has begun to implement with the costmap filter tickets for enabling speed zones and keep out zones. Though it would be good if we were able to include annotating this type of thing in the GUI as well. Future topic.

I think from our discussions on directional lanes, it would make more sense to use the XML format to embed the directional graph data with a way tag (which is actually what it's intended for). For non-directional hard-constraint lanes, the image may also be an option. So this method of embedding information is only used for spatial information that's relational to the map image itself.

The XML would have tags in the root for add ons to load (like <tag keepout file="/path/to/file.png"/>) whatever the tag we use, we'd make it able to be parsed automatically to find these masks and load them onto a specified topic or into memory or whatever the application wants.
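A minimal Python sketch of what "parsed automatically to find these masks" might look like is below. Since the exact tag is still undecided, the `<mask type="..." file="..."/>` element and the `<map>` root used here are purely hypothetical stand-ins, as is the function name `find_masks`.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: mask add-ons declared as direct children of the root.
DOC = """
<map>
  <mask type="keepout" file="/path/to/keepout.png" />
  <mask type="speed" file="/path/to/speed.png" />
  <node id="2" x="1.0" y="1.0" />
</map>
"""


def find_masks(xml_text):
    """Return {mask type: file path} for every <mask> declared in the root."""
    root = ET.fromstring(xml_text)
    return {m.get("type"): m.get("file") for m in root.findall("mask")}


masks = find_masks(DOC)
for mask_type, path in masks.items():
    # A real loader would read the image here and publish it or keep it in memory.
    print(mask_type, "->", path)
```

Keeping the masks as plain file references means the XML never needs to know anything about the image contents.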

SteveMacenski commented 4 years ago

https://github.com/osrf/rmf_demos You should click on the video and watch it. It looks like OSRF has created some annotation tooling, framework, and integrations. I don't know much more on it right now beyond that video. The documentation is limited and I don't see a lot of the code I would have expected to see to make something like this (so either I'm missing something, there are repos not publicly available to recreate, or some of this capability was fudged for a demo).

We may want to consider aligning with this project and making integrations with it if it looks sufficiently mature and this project is going to stick around for a while. Please take a look and give me your thoughts, and I can ping folks at OR with our plans / thoughts and see what they say. There are a few RMF repos under that org, so take a quick glance through them.

https://github.com/osrf/traffic_editor https://github.com/osrf/rmf_schedule_visualizer https://github.com/osrf/rmf_core

AlexeyMerzlyakov commented 4 years ago

@AlexeyMerzlyakov any objections with a OSM-inspired XML given this discussion?

No, there are no objections. XML is a human-readable and widely used format that everyone knows. There are tons of existing XML parsing/generation tools everyone can use - parsers: tinyxml/tinyxml2, libxml2, libexpat, pugixml, rapidxml, xerces?, etc.; GUIs: CAM Editor, BaseX; plugins: VEX for Eclipse, Visual Studio itself has XML tools onboard, and many others for vim and emacs. XML is already widely used in the Navigation2 stack and ROS2 (with tinyxml/tinyxml2). Also, OSM-inspired XML will be more useful than just our homebrew XML because we can utilize an OSM to OSM-inspired XML conversion in the future to import OSM maps into Navigation2. So, I think we are in agreement there.

AlexeyMerzlyakov commented 4 years ago

Regarding "masks & non-geometric/discrete annotations": we have the following use-cases today:

Keep-out zones
Speed limit zones
Preferred (undirected) lanes
Directed lanes
Costmap & robot move forcing gradients

For the first two bullets, and also for any other zone-related filters there may be, the most convenient format to use is any raster graphics format (be it PGM, PNG, BMP or something else). The main advantages of raster images over vector shapes described in XML are:

Ability to make any odd shape of zones
Simplicity and visibility when editing zone masks in any preferred graphics editor (e.g. GIMP)
Algorithmic simplicity for better CPU performance. This point is rather related to the question "How?", which is out of scope of the current bullet.

Regarding speed limit zones - I see no problem in identifying zones by color, with an XML description per color giving its speed limit as a percentage or an absolute value.
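To illustrate the color-plus-description idea, here is a small Python sketch in which each mask pixel value identifies a zone and a lookup table (which would come from the XML descriptions) gives that zone's limit as a percentage of the robot's maximum speed. The table contents, the nested-list mask, and the name `speed_limit_at` are made up for illustration; a real filter would work on the loaded image in costmap coordinates.

```python
# Hypothetical table, as it might be read from per-color XML descriptions:
# mask pixel value -> speed limit as a percentage of maximum speed.
SPEED_LIMIT_PERCENT = {0: 100, 100: 50, 200: 25}


def speed_limit_at(mask, row, col, max_speed):
    """Speed limit in m/s at a mask cell; unknown values mean no restriction."""
    value = mask[row][col]
    percent = SPEED_LIMIT_PERCENT.get(value, 100)
    return max_speed * percent / 100.0


# Tiny 2x3 "image": 0 = unrestricted, 100 = half speed, 200 = quarter speed.
mask = [
    [0, 0, 100],
    [0, 200, 100],
]
print(speed_limit_at(mask, 0, 2, max_speed=0.5))
print(speed_limit_at(mask, 1, 1, max_speed=0.5))
```

Absolute limits would work the same way, with the table storing m/s values instead of percentages.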

For lanes and gradients, raster and vector formats are both suitable. Raster formats have the same advantages as for zones, but vector formats are also OK here as long as we don't enable odd-shaped lanes or gradients.

Summarizing all points: if we keep a unified mask format for all costmap filters (again, to avoid multiple formats for the one "costmap filters" task), it is more reasonable to choose raster images over vector, I think.

shrijitsingh99 commented 4 years ago

So having gone through the repos, here is my takeaway, solely from a semantic map point of view. This is in no way a very thorough analysis, and it would be great to hear more about the direction of RMF from people working on it.

I have highlighted the cons, @Sarath18 mentions the pros:

Focused on fleet management with predefined paths; in contrast, we are more focused on the general navigation problem. We also don't want to bind ourselves to a specific robotics middleware
The format focuses on the elements needed to make a simulation environment and predefined paths; it does not represent complete semantic information
GUI is still pretty basic and early stages of development compared to some of the editors used in OSM
Some features of GUI will not be useful for us and extra stuff needs to be added.
Only predefined objects having respective Gazebo models can be defined
No support for features like masks and zones.
Waypoints, GPS navigation or via point navigation not supported, only navigation support is through predefined paths
Driveable areas are represented as lines with predefined width, cannot specify custom shape for it.
The concept of floors exists but only to define the texture of the flooring, No native support for zones and areas.
We would depend on them for the format and new features; otherwise we will essentially just be building a format on top of their base, which can be done with OSM as well.
No unique ids for identifying elements
Pretty new and not very mature (around 7 months it seems), we don't want to tie semantic maps to a specific implementation of a robotics middleware

As a robotics middleware I can see the potential benefits of it but besides the GUI and parts of format other things are not that useful. But we should find out more about their progress and direction to get a better idea.

One point I do think we should do is add conversion scripts to their format since our format will most likely be a superset of this. In this way we can support their framework too.

Sarath18 commented 4 years ago

How I look at traffic editor is, it has great potential and some of its features do overlap with our current interests. Since it's fairly new we can try combining both the projects or have some kind of cross-compatibility support, which benefits both. Some key features of using traffic_editor include:

Aligns with some of our goals
Has good editor as a base to start with.
Format reuses point definition like OSM
Supports main features which we also want to support like walls, paths, objects, elevators etc.
Similar floorplan idea and multilevel support.
Uses Qt and YAML. These tools are already in the ecosystem.
Provides the ability to create maps by drawing on top of images.
Simulation can be loaded/generated directly from the map data
Integrated with RMF which could be deployed for robot fleets.

I agree with @shrijitsingh99 on a few points. It might not be a good middleware to use for semantic maps but has a great editor that we can build upon. For our use case, I think we need the ability to assign a tag to any element in the map instead of adding predefined object. Hence, the GUI lacks entity description support.

I would like to describe more on the traffic editor GUI when we start our discussion on it.

SteveMacenski commented 4 years ago

So I looked at it from the standpoint of their tooling (GUI) and standards of data formats. The other stuff seems multi-robot specific and potentially we make some integrations there down the road. For now, using some format that another ROS project in mobile-robotics-land also uses could be useful. Especially if they're looking to use the nav stack for real-world demos. Users could use a single file for both sets if they wanted to, that seems powerful to me. Plus aligning in open-source is a force multiplier, more hands and eyes on a thing to debug and develop.

On Cons: focus on fleet management, masks, etc - totally understand. They do some stuff we don't need right now and they don't support stuff we do. But we can work off that or merge in those updates we need. If they can support lines, points, and areas, those are the primitives we need. They use "line"-like things to describe paths to follow for their demos, but that doesn't mean there isn't the concept of a point in their specification we can add names to or other attributes to do navigation. Looking a little past just the demo of capabilities they show, they are defining areas / lines (and points?) to do "stuff" through a GUI. Same with us. What those things are and what they're being used for are different, I agree.

Only predefined objects having respective Gazebo models can be defined

Now that's a real reason to potentially not use it - so you can't just draw shapes on a map, they have to be part of gazebo models? Can we not work with a map or do we have to work with a simulation world?

On pros: it sounds then like they can do points, so that means we have basically the same starting point as OSM for our "OSM-inspired" format.

Provides the ability to create maps by drawing on top of images.

Also sounds like we can work with just image files.

It might not be a good middleware to use for semantic maps but has a great editor that we can build upon.

I'm confused why it's not a good thing to potentially use their standards for file formats / GUI. It sounds like they have a lot of the capabilities we're looking for and enabled in ROS2 already. I'm not suggesting we use their fleet tools, simulation, or multi robot stuff, I'm just looking at their map-editing tools and formats. It seems like they have similar semantic information, just aimed at a different goal.

@codebot (Morgan) I believe is leading this effort out of the OR Singapore office. Maybe he can share some roadmap or his thoughts. I'm also a little confused as to why this is under OSRF and not a ros-* org, but I don't know the history of this project and if this is intended for long-term support / adoption.

shrijitsingh99 commented 4 years ago

Plus aligning in open-source is a force multiplier, more hands and eyes on a thing to debug and develop.

Combining development is one of the most appealing points, this will help make development faster and maintenance easier.

Users could use a single file for both sets if they wanted to, that seems powerful to me.

I agree with this, but we might have to tread carefully here and not tie users to a specific platform only. Currently they define Gazebo models for objects, making this instantly incompatible with any project that uses a simulator besides Gazebo.

If they can support lines, points, and areas, those are the primitives we need. They use "line"-like things to describe paths to follow for their demos, but that doesn't mean there isn't the concept of a point in their specification we can add names to or other attributes to do navigation.

This is a good point. While they do use points, lines, and areas, these are not generalized; the concepts are tied to specific features. Lines do not exist independently; they exist in the form of edges of a graph used to represent paths. Areas don't exist independently either; they are tied into floors, which are used to give textures to flooring.

Looking a little past just the demo of capabilities they show, they are defining areas / lines (and points?) to do "stuff" through a GUI. Same with us. What those things are and what they're being used for are different, I agree.

Defining those things and then interpreting their meaning separately is a good idea. That's what OSM was good at: defining an area or line, then separately defining its semantic meaning. This allowed reuse of geometric properties; an area could be a zone, mask or wall all at the same time. Here, as I see it, besides points, which are independent, lines are tied into paths and areas are tied into floors, with no possible reuse of lines and areas for different purposes. OSM's separation of geometric and semantic information was the major point which appealed to me.
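A small sketch of that separation, using OSM's real element names (`node`, `way`, `nd ref`, `tag k/v`). The `x`/`y` attributes in meters and the tag keys (`zone`, `name`) are assumptions for illustration; real OSM nodes carry `lat`/`lon`, and nothing here is an agreed schema.

```python
import xml.etree.ElementTree as ET

# OSM-style document: geometry lives in nodes/ways, meaning lives in tags.
# x/y attributes and the tag vocabulary are hypothetical.
OSM_LIKE = """
<osm>
  <node id="1" x="0.0" y="0.0" />
  <node id="2" x="4.0" y="0.0" />
  <node id="3" x="4.0" y="3.0" />
  <node id="4" x="0.0" y="3.0" />
  <way id="10">
    <nd ref="1" /><nd ref="2" /><nd ref="3" /><nd ref="4" /><nd ref="1" />
    <tag k="name" v="conference_room" />
    <tag k="zone" v="keepout" />
  </way>
</osm>
"""


def parse(xml_text):
    """Return {way_id: (list of (x, y) points, dict of tags)}."""
    root = ET.fromstring(xml_text)
    nodes = {
        n.get("id"): (float(n.get("x")), float(n.get("y")))
        for n in root.iter("node")
    }
    ways = {}
    for way in root.iter("way"):
        # Geometry: resolve node references into coordinates.
        geometry = [nodes[nd.get("ref")] for nd in way.iter("nd")]
        # Semantics: free-form key/value tags attached to the same geometry.
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        ways[way.get("id")] = (geometry, tags)
    return ways


def ways_with_tag(ways, key, value):
    """Find every way carrying a given tag, regardless of its shape."""
    return [wid for wid, (_, tags) in ways.items() if tags.get(key) == value]
```

The same closed way can carry a name, a zone type, and any later tag without its geometry being touched, which is the reuse being described.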

Now that's a real reason to potentially not use it - so you can't just draw shapes on a map, they have to be part of gazebo models? Can we not work with a map or do we have to work with a simulation world?

From the demos it seems as if you can only draw walls as areas. Objects have to be part of Gazebo models, since only a reference to the model name and its coordinates is stored; no shape information is stored.

On pros: it sounds then like they can do points, so that means we have basically the same starting point as OSM for our "OSM-inspired" format.

Yeah, true for points, but not entirely for other stuff like areas and lines.

It might not be a good middleware to use for semantic maps but has a great editor that we can build upon.

Sorry for the typo, I meant it might be a good middleware with a great editor but lacks a lot of semantic information embedding.

From what I see, the current state of their format is a subset of our needs. I believe starting from a subset as the base is not that good an idea, since some fundamental changes might be required which could break their current implementation or need hacks to stay compatible with it.

If we need to add support for their format we can create a conversion script, which will not be that hard. Or maybe we combine development efforts and migrate their current format to a more generalized format.

@codebot (Morgan) I believe is leading this effort out of the OR Singapore office. Maybe he can share some roadmap or his thoughts.

Sounds good, better to hear their roadmap and stance on this before taking it forward.

codebot commented 4 years ago

Greetings! Wow this is a long discussion and I'm not quite sure how to start responding, so I'll just ramble for a while and hopefully we can drill down to what's most relevant.

In general, osrf/traffic-editor is a new and rapidly-evolving project. Because it's been quite fluid thus far and is still rapidly moving, we didn't want to use a ros-* organization in its early days without first having some notion of "community consensus." The multi-robot and especially the "multi-fleet" worlds are still quite new and fuzzy, and we're iterating a lot in the quest for a reasonable set of abstractions and tools. There is plenty of space for more than one approach. Maybe it could migrate to a ros-* organization someday :man_shrugging: I haven't thought much about that.

I don't think any of us have strong feelings about the file format. We just started using YAML because it's trivial in Python, and it's easy to hand-edit when a particular GUI feature isn't quite working yet. Migrating to XML or GeoJSON or anything else would be totally fine. It's just a structured file of some sort that points out to various PNG images in relative paths. Internally things are represented as vertices and then lists of vertices to form lines and polygons. That has pros (easy to drag a single vertex and have it automatically "stretch" all connected lines/polygons) and cons (more bookkeeping and opportunities for :bug: since vertex-indices are everywhere) and I don't feel strongly about it either way anymore. I think GeoJSON uses explicit coordinates rather than vertex-index lists, and I can see why. Currently, traffic-editor annotations are in pixel units, rather than meters. I realize that is maybe controversial, but it's convenient to annotate the rasterized image and then worry about scale/orientation later. I'm sure many other approaches are workable as well. Pixel units was just the easiest way to get started as we evolve the required set of features and tools.
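A rough sketch of the vertex-index bookkeeping described above, not traffic-editor's actual data model: the class and method names are invented for illustration. Polygons store indices into a shared vertex list, so moving one vertex stretches every shape that uses it, and a GeoJSON-like explicit-coordinate form can be produced by resolving the indices and applying a scale.

```python
from dataclasses import dataclass, field


@dataclass
class Drawing:
    """Hypothetical container: shared vertices in pixels, polygons as index lists."""
    vertices: list = field(default_factory=list)  # [x_px, y_px] pairs
    polygons: list = field(default_factory=list)  # lists of vertex indices

    def polygon_coords(self, index):
        """Resolve one polygon's vertex indices into pixel coordinates."""
        return [tuple(self.vertices[i]) for i in self.polygons[index]]

    def move_vertex(self, index, x_px, y_px):
        """Dragging a vertex updates every polygon that references it."""
        self.vertices[index] = [x_px, y_px]

    def to_explicit(self, meters_per_pixel):
        """Export polygons with explicit coordinates in meters."""
        return [
            [(x * meters_per_pixel, y * meters_per_pixel)
             for x, y in self.polygon_coords(p)]
            for p in range(len(self.polygons))
        ]


# Two rectangles sharing the edge between vertices 1 and 2.
drawing = Drawing(
    vertices=[[0, 0], [10, 0], [10, 10], [0, 10], [20, 0], [20, 10]],
    polygons=[[0, 1, 2, 3], [1, 4, 5, 2]],
)
drawing.move_vertex(1, 10, -5)  # both rectangles follow the shared corner
```

The explicit form loses the sharing: after `to_explicit`, moving one corner would mean editing every polygon that contains it, which is the trade-off mentioned above.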

We needed a graphical editor for the RoMi-H project, formerly known as RMF project, which aims to integrate multi-fleet robot operations in shared spaces in large buildings. To do this, we need a way to reason about the navigation graphs of the robot fleets already operating in the building in the same GUI, as well as a way to export compatible navigation graphs for additional "new" robots to enter the traffic flow coherently. For example, in a particular corridor, we need to agree that robots will drive on the left side, etc. The input to this process is a set of architectural drawings of the building. We rasterize these "golden" drawings into PNG's, and then either import existing navigation graphs (using various scripts to consume navigation graphs from proprietary robot fleet formats) or manually draw compatible new graphs using the lane-drawing tools in the editor.

Summary of the current workflow:

start with a pile of PNG architectural drawings. Assign them to floors with convenient names like L2, B1, etc.
identify features that go through all floors, like corners or pillars. Annotate them as "fiducials" that are used to align and scale each PNG when you click between floors, so clicking the floor name just "teleports" you vertically up and down the building. This is helpful because sometimes the drawings are aligned to different axes, and the source images often have different scales (meters per pixel).
label vertices and optionally name them. For example, a parking spot for a delivery/pickup.
draw traffic lanes between vertices. These can be unidirectional or bidirectional. I realize most robots are capable of free-space navigation, but as the robot and human density increases, having some sort of "guidelines" to the traffic flow becomes important to prevent chaos. Same reason we have lane markings on streets :smile: Currently traffic lanes are only straight-line segments that are 1-meter wide. Obviously there are tons of ways to improve that definition, and we should!
overlay robot-generated maps (PNG images). Unfortunately these have to be localized manually (translate/rotate/scale), but this can help identify where the usually-static obstacles are (tables, desks, chairs, random stuff, etc.) as well as identify any minor differences between the architectural plans and as-built reality.
optionally, for simulation, you can also annotate the drawing to trace the walls and drop in pre-defined Gazebo models. Then a separate "building map generator" (in Python) can "inflate" the project into a Gazebo world, extruding walls, instantiating motorized-door plugins, furniture models, etc. This is super helpful for developing the planning and control software to open/close automatic doors, summon elevators, etc., with all of the usual motivation for why simulation is awesome. For large buildings it's even more important to do simulation testing for many reasons, including 1) there are tons of subsystems involved, everything is more elaborate and time consuming to start/stop/reset 2) real-world testing is physically difficult in large areas since robots drive slowly, etc. 3) the building is in operation, you're probably not going to get exclusive access, and you really really want to have done many many many kilometers of simulation testing first, since debugging with an audience is not fun. The polygon editor is currently only used to define floors for Gazebo, but we're hoping to have it do many more things in the future, such as defining keep-out zones or other types of navigation zones, so that traffic lanes don't have to be only constant-width lines. Lots of ways to do that.
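As an illustration of the fiducial step in the workflow above, the sketch below aligns one floor's drawing to a reference floor from exactly two shared fiducials. This is an assumption about how such an alignment could be done, not traffic-editor's implementation; it solves for a rotation, uniform scale, and translation (no mirroring) by treating 2D points as complex numbers.

```python
def similarity_from_fiducials(src_a, src_b, dst_a, dst_b):
    """Return (scale_rot, offset) such that dst = scale_rot * src + offset.

    Points are (x, y) tuples; scale_rot is a complex number whose magnitude
    is the scale factor and whose angle is the rotation.
    """
    sa, sb = complex(*src_a), complex(*src_b)
    da, db = complex(*dst_a), complex(*dst_b)
    if sa == sb:
        raise ValueError("fiducials must be two distinct points")
    scale_rot = (db - da) / (sb - sa)
    offset = da - scale_rot * sa
    return scale_rot, offset


def apply(transform, point):
    """Map a point from the source drawing into the reference drawing."""
    scale_rot, offset = transform
    z = scale_rot * complex(*point) + offset
    return (z.real, z.imag)


# Illustrative numbers: a pillar and a corner seen in two drawings with
# different scale and orientation.
transform = similarity_from_fiducials(
    src_a=(100, 100), src_b=(300, 100),
    dst_a=(50, 50), dst_b=(50, 150),
)
```

With more than two fiducials a least-squares fit would be the natural extension; two points are the minimum that pins down scale, rotation, and translation.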

Anyway that's just the offline editor. The rmf-core package collection is much more complex; it takes the output of the editor (the "navigation graphs") and data feeds from all the robot fleets. It can dispatch tasks and reason in real-time about impending multi-fleet conflicts, trying to figure out ways they can be mitigated, such as diverting, holding, or temporarily parking robots, and so on. We're working on web-based GUI's for operators and a bunch of other things to extend this further. Because commercial robots often have proprietary fleet managers deployed with them, rmf-core has "adapters" that talk to the proprietary fleet manager API's (many of which are not public) and translates them to/from a common set of messages. We also support "loose" robots via a thing we're calling FreeFleet, which can connect "loose" individual-robot ROS nav stacks to the same higher-level "common fleet messages" that rmf-core uses, using a WiFi-friendly configuration of DDS. The building resources such as automatic doors, elevators, etc., are connected to rmf-core using a similar concept of "adapters," which typically have to be customized/configured for the API of each vendor/model, and are often protected by NDA. By having all of these "adapters" at the periphery of the system, the core can remain open-source and generic, communicating using a common set of open messages.

The ticket traffic on the various rmf_core packages hopefully provides some guidance about current fronts of development for the subsystems. Run-time visualization for building operators is being developed here. Visualization of internal data structures is being developed here.

All of the things I mentioned above are under heavy development. For the offline traffic-editor, we're working towards various ways to define motions and behaviors of the non-robotic aspects of building operations. Of course this depends on the domain, but in many scenarios, there are far far far more humans than robots. Realistic simulation of the multi-fleet traffic in Gazebo would be improved if it had reasonable models of the human traffic that can delay the robot traffic. Due to real-world constraints, this likely needs to be done in closed-source plugins, because many building owners won't want their operations details to be public. So we're creating a minimal "process simulation" or "process emulation" plugin interface (via ignition-plugin) where you can write closed-source plugins that get a tick() call and move the non-robot models around. We also want to be able to annotate many more things, such as speed limits on various traffic-lane segments (often robots go much slower in elevator lobbies, etc.), change the width of traffic lanes for different fleets (for small/large robots), and generally improve the annotation workflow.

Other future work could include dealing more gracefully and explicitly with robot fleets capable of free-space navigation, by adding polygons to define "ok" or "preferred" regions, and so on. The wish-list is sort of endless :smile: it's a matter of prioritizing the "hmm this would be awesome" type of things and matching to development time and resources.

I believe that everything used to generate the videos in rmf_demos should already be available as open-source. If you come across blocks you can't find, let me know, but I think it's already available. Our goal is to have as much of this as possible in the open. As always, documentation lags behind features. "The documentation could be improved" is an always-true statement. :writing_hand:

Well, this monologue got embarrassingly long, but hopefully this provides some context about the origins of traffic_editor. We would love to adapt this tool to make it more useful for more workflows involving map annotation, no matter how many robots are involved! I know this is a big-picture thread, but if there emerge specific questions about traffic-editor, feature wish list, etc., we can fork them into smaller-scope tickets in osrf/traffic_editor

shrijitsingh99 commented 4 years ago

Hey, thanks for the detailed overview of the project. It has clarified a lot of stuff.

Well, this monologue got embarrassingly long

Hahahaha, I think this is in line with all the posts in this thread 😛

From the above post, I can see that a lot of our vision and use cases are aligned, so it will be very beneficial to work on something common. Many of the things you mentioned as planned have also come up earlier in this thread.

Considering all of this, here is my viewpoint. We build an OSM-inspired format as previously discussed, since it will be more general-purpose and not application-specific. We can carefully design it to be as modular and extensible as possible for any project. We then integrate all the required functionality of traffic_editor, such as supporting models, fiducials, etc., into the format too, which would be beneficial to people who decide to go ahead and use RMF as the middleware and Nav2 as the navigation framework. This would also allow us to combine our efforts in maintaining and developing this new standard.

Coming to the GUI editor, whether to fork from traffic_editor or not, we can have the discussion later according to the agenda we discussed.

In this way, we can satisfy the needs of both parties and be extensible to anyone else who wants to use our standard. Any drawbacks with this approach?

Yadunund commented 4 years ago

https://github.com/osrf/rmf_demos You should click on the video and watch it. It looks like OSRF has created some annotation tooling, framework, and integrations. I don't know much more on it right now beyond that video. The documentation is limited and I don't see alot of the code I would have expected to see to make something like this (so either I'm missing something, there are repos not publicly available to recreate, or some of this capability was fudged for a demo).

@SteveMacenski the rmf_demos repository mainly houses assets, maps, gazebo plugins and launch scripts required to demonstrate RMF in simulation. The code powering everything is maintained in rmf_core. The Readme has all the commands to recreate everything seen in the video. But yes, better documentation is needed :smiley:

SteveMacenski commented 4 years ago

ahhhhhhhhh so many words

My brain looking at this.

Currently they define Gazebo models for objects, making this instantly incompatible with any project that uses a simulator besides Gazebo.

If that's actually true, yes, that's a deal breaker, can you show me evidence of that? Morgan's analysis makes me think that this is more having to do with lidar maps.

Greetings

Salutations, fellow mobile robot lover.

The multi-robot and especially the "multi-fleet" worlds are still quite new and fuzzy

Tell me about it. There are a few tickets in here to support tools for multi-robot XYZ tasks. It's not even that those algorithms are particularly challenging, it's just that there's not really a general way that people format multi-robot systems. Having N robots on a given ROS domain (even in ROS2) seems like a pipe dream and a little impractical (e.g. I don't want robot 1 to have traffic taking up bandwidth about robot 39's telemetry, I just want to know its pose, speed, and plan. Further, especially in ROS2, things get a little wacky with >100 participants, which you hit really fast with just a small number of robots). In my experience, having the robots talk to the cloud or a central location and then have some derivative information about their neighbors sent back works much better. Those cloud comms could now be done with just ROS2, but also practically speaking, there are distributed databases or MQs better suited for the task. The question then becomes: how do we generalize this if everyone's using a different / home-built method for robot 39 -> cloud -> robot 1 to share information about the 50 relevant robots around it. I think defining some topics like robots/information that we assume are implemented is a cheap cop-out, but it seems to be the most practical generalization I can think of. Anyhow, this is off topic.

Currently, traffic-editor annotations are in pixel units, rather than meters.

Omph, yeah, not great in my view. This doesn't generalize then to any other method of navigation other than 2D lidar maps. That would be a deal-breaker for me as I'm moving the navigation2 stack further along the technology curve to make use of 3D lidars and visual SLAM. Cartesian coordinates and GPS coordinates are minimum viable product requirements in my view.

I don't think any of us have strong feelings about the file format

Agreed, I'm looking for the format that gets us as close to our goal as possible and then working from there. OSM seems to be a good standard and there are a lot of heritage reasons to use it. However, if you guys are around using something else and you're in our metaphorical backyard, it makes a lot of sense for us to work together on a format that works between our systems. This is all assuming that RMF is "here to stay" and is going to be given attention over the long term. There's a lot of value in interoperability (and for y'all on the RMF side to test using our side / eventual hardware navigation integrations). It would also seem really stupid to me if two important projects in ROS decided to use 2 different systems for embedding more or less the same types of information (lines, points, areas), even if for slightly different motivations.

draw traffic lanes between vertices

This is where RMF loses me a little bit. This seems to be technology available circa 30 years ago. Though I definitely understand that multirobot free space coordination is more of a fun robotics problem and probably not a good strategy for executing functional work under a contract. I do think, though, that doing it this way, you'll hit a wall with the number of robots you can functionally support. I don't think you'll hit it for a health care application because it's not a facility built on robots, but you'd easily hit that in a manufacturing or warehouse environment. To make this generic, I think eventually that will have to go, or be an option.

All of the things I mentioned above are under heavy development

Absolutely. While your end goals vary from ours in the multi-robot aspect, the tooling for annotation and the formats of those annotations are places where we can have overlap. Maybe we support different configurations so that buttons on the screen mean different things (a dock or waypoint vs a door or path, etc). This is something I am very open to if there's cross support from your side to help make that happen. A rising tide lifts all ships.

by adding polygons to define "ok" or "preferred" regions, and so on. The wish-list is sort of endless

See, but that's in our immediate roadmap to add. So there's definitely some overlap here for an active collaboration. I think the big things are to make the formats and GUI supportive of both of our needs and then come up with some reasonable understandings of where different features should live. Our current model for ok / preferred regions, speed reduction zones, etc is to implement them in costmaps. If that's in line with your thinking, or if you had other thoughts, that would be valuable to know. I don't think any of us really mind about the plugins / closed source APIs / etc because those things aren't really our goals. I think things like elevator APIs and how to open doors automatically are "use-case specific" and I don't concern myself with them. I'm more concerned with how to support those use-cases so that people can do those things in our framework (e.g. support the annotations of them, support BT plugins that can call external services to things like a door-opener, support autonomy configurability in a behavior tree to "wait" in front of a door at a distance, doing live and well documented demos of that tech so that there's no excuses for thinking otherwise, etc). All of those things could be done through a central agent, but behavior trees are extremely well suited to making concerns like those distributed to the agents themselves. I'm of the school of thought that robots should be capable of any action required for their successful navigation to a goal, but the decision of what that goal is and the route to get there can be given from a central authority.

The option behind door number 3 is that we actually scrap the current formats you use and we work together to implement a new one that we both use based on OSM. This would add in your experience about what works and your needs, and add in our use-cases to ensure it's sufficiently flexible to model any practical mobile robot annotations. A key point of this would be to support the 3 coordinates: GPS, Cartesian, and pixel. It may also be a chance for us both to reduce our technical debt with a fresher start and make a formal REP standard for this information. I'm definitely not "Mr. Formalization" but I think it's required if we're all going to play in the same sandbox.
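To illustrate how the three coordinate types could relate, here is a minimal sketch that brings pixel and GPS annotations into one Cartesian frame. The pixel conversion assumes the map-yaml convention of a `resolution` in meters per pixel and an `origin` at the lower-left pixel, ignores origin yaw, and uses pixel centers (the +0.5 is my assumption). The GPS conversion is a small-area equirectangular approximation around a datum, not a full geodetic transform; function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius


def pixel_to_cartesian(col, row, resolution, origin_xy, image_height):
    """Pixel (col, row counted from the top-left) to map-frame meters.

    Image rows grow downward while map y grows upward, hence the flip.
    Origin yaw is ignored in this sketch.
    """
    x = origin_xy[0] + (col + 0.5) * resolution
    y = origin_xy[1] + (image_height - row - 0.5) * resolution
    return x, y


def gps_to_cartesian(lat, lon, datum_lat, datum_lon):
    """Approximate east/north offsets in meters from a datum.

    Only reasonable over small areas (a building or campus).
    """
    east = (
        math.radians(lon - datum_lon)
        * EARTH_RADIUS_M
        * math.cos(math.radians(datum_lat))
    )
    north = math.radians(lat - datum_lat) * EARTH_RADIUS_M
    return east, north
```

A format could then store each annotation with an explicit frame label and let a loader apply the matching conversion, so that files annotated in pixels keep working alongside Cartesian and GPS ones.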

PS: there's a navigation2 slack group (navigation2.slack.com) that you and your colleagues are welcome to join. I've been working hard to get more people involved the last few weeks and we're seeing a nice uptick in traffic.

codebot commented 4 years ago

It sounds like we have tons of overlap in terms of high-level goals and the desire for open-source tooling to help annotate maps. I think everything else is relatively minor and can be worked around or compromised in various ways. My general response is that traffic-editor is under heavy development and everything can be changed, so I'd hope that nothing feels like a "deal breaker," but rather just a higher priority item for redesign and redevelopment :smile: I'm not particularly attached to any individual aspect of the program and certainly not its implementation :rofl: it was my first "larger-scale" C++ Qt effort in a long time, and I did lots of things wrong. I think it's fixable, though, with a few fairly major things that need to be improved via chainsaw. Or if nothing else, hopefully it's at least a working end-to-end "Rev 1" implementation that can be used for copy-paste fodder for next-generation-editor.

Currently they define Gazebo models for objects, making this instantly incompatible with any project that uses a simulator besides Gazebo.

Models are currently defined in the file simply as a "model name" (for example, OfficeChairBlack), an "instance name" (for example, chair1, if it needs to be unique), and an {x, y, z, yaw} position on the level, like this: https://github.com/osrf/rmf_demos/blob/master/rmf_demo_maps/maps/office/office.building.yaml#L50

The GUI uses the "model name" to find the corresponding thumbnail image and scale/rotate it in the GUI. This is just for convenience, because some models have a symmetric bounding box and it's convenient to see a "bird's-eye" view when placing models. But it's not necessarily tied to gazebo. From traffic-editor's perspective, it's just a PNG thumbnail. For convenience, because we do happen to use Gazebo/Ignition, we have an offline script you can run to auto-generate nicely-cropped and transparent-background PNG thumbnails from Gazebo models. That's how this directory is auto-generated: https://github.com/osrf/traffic_editor/tree/master/traffic_editor/thumbnails/images/cropped but traffic-editor has no idea where the thumbnails come from, or what happens "downstream" in the simulation-world generator script, which is a separate ROS package with no shared code (it's currently implemented in Python).
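For readers who think in code, a model entry as described here can be pictured roughly as below. The dictionary keys (`model_name`, `name`), the class, and the `<model name>.png` thumbnail convention are assumptions for illustration; check the linked YAML for the real field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPlacement:
    """Hypothetical view of one model entry: names plus a pose on the level."""
    model_name: str     # e.g. "OfficeChairBlack"
    instance_name: str  # e.g. "chair1"
    x: float
    y: float
    z: float
    yaw: float

    @classmethod
    def from_dict(cls, entry):
        """Build a placement from a parsed mapping; key names are assumed."""
        return cls(
            model_name=entry["model_name"],
            instance_name=entry["name"],
            x=float(entry["x"]),
            y=float(entry["y"]),
            z=float(entry["z"]),
            yaw=float(entry["yaw"]),
        )

    def pose(self):
        return (self.x, self.y, self.z, self.yaw)

    def thumbnail_file(self):
        """The editor only needs an image looked up by model name."""
        return f"{self.model_name}.png"


# Illustrative values only.
chair = ModelPlacement.from_dict(
    {"model_name": "OfficeChairBlack", "name": "chair1",
     "x": 1.0, "y": 2.0, "z": 0.0, "yaw": 1.57}
)
```

Nothing in such a record refers to a simulator; what the model name resolves to is decided by whichever tool consumes the file.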

The polygon-editing machinery could be extended to allow creation of arbitrary extruded-polygon obstacles rather than using only thumbnails for them during annotation. That would be fairly straightforward. In our particular use case, we haven't (yet) needed that, since our "golden" floorplan images are typically coming from architectural drawings, not robot LIDAR maps, and because we really need usable simulation models from all these maps, and simulation just looks nicer when it has 3d models created by technical artists, rather than extruded polygons. But I can see how it would be useful to sometimes just draw obstacle bounding-boxes on top of maps, rather than always using Gazebo thumbnails to mark things.

Currently, traffic-editor annotations are in pixel units, rather than meters.

Omph, yeah, not great in my view.

I'm happy to switch to metric units. We need to start this process quite soon anyway (with GPS tags for at least one location in the Cartesian plane), for unrelated reasons. Pixel units were the fastest way to get off the ground, but we're running up against issues with it already.

This is where RMF loses me a little bit. This seems to be technology available circa 30 years ago. Though I definitely understand the multirobot free space coordination is more of a fun robotics problem and probably not a good strategy for executing functional work under a contract. I do think though that doing it this way, you'll hit a wall with the number of robots you can functionally support. I don't think you'll hit it for a health care application because it's not a facility built on robots, but you'd easily hit that in a manufacturing or warehouse environment. To make this generic, I think eventually that will have to go, or be an option.

It's the current commercial reality for some key robot fleets that we need to interoperate with, so we have to at least support it. As simple as it sounds, it actually works quite well and has many benefits (more tractable to make a provably-correct implementation, easy for humans working in the same space to understand and predict the robot's behavior, etc.) However, as you mention, many robot fleets are moving beyond this paradigm now, so supporting "polygon-based navigation zones" rather than just line-segment "traffic lanes" is definitely something we also need to do. It depends a lot on the robot, but many commercial robots can "favor" following a traffic lane, but are able to deviate from it "somewhat" if the traffic lane is blocked. This is a nice compromise between throughput and maintaining some order to the traffic flow. In general total free-space motion planning would be awesome, but it gets impressively complex when many robots are involved, with many potential sources of delay due to human traffic. I could be quite wrong, but I would assume that the highest density robot traffic (i.e. "shoulder to shoulder" robot density) is achieved by one-way navigation on agreed-upon lanes, like an urban city road network. This often arises "naturally" anyway from industrial shelving patterns and manufacturing lines. Otherwise you'd have to either have very accurate time synchronization, or very very good robot-to-robot negotiation with some priority scheme, to avoid ending up with a traffic jam and gridlock. That's just my intuition anyway. I'd love to be wrong, because high-density free-space motion would be amazing to watch.

All of those things could be done through a central agent, but behavior trees are extremely well suited to making concerns like those distributed to the agents themselves. I'm of the school of thought that robots should be capable of any action required for their successful navigation to a goal, but the decision of what that goal is and the route to get there can be given from a central authority.

Yeah... intuitively I agree, but things get complex when operating in a shared space with shared resources like elevators, doors, and narrow corridors where robots have to "take turns" with other robots that are beyond sensor range. Humans have a complex social dance about "who gets priority" when you're trying to go into an elevator and unexpectedly encounter someone coming out of it, or when you almost run into someone along a long narrow corridor with a sharp corner in it, but "retracing" your motion plan to move out of the way of another suddenly-appearing robot is tricky, and can cascade into other motion plans of closely-following robots, etc. It gets complex really quickly if you try to do it fully distributed with many robots.

The option behind door number 3 is that we actually scrap the current formats you use and we work together to implement a new one that we both use based on OSM. This would add in your experience about what works and your needs, and add in our use-cases to ensure it's sufficiently flexible to model any practical mobile robot annotations. A key point of this would be to support the 3 coordinates: GPS, Cartesian, and pixel. It may also be a chance for us both to reduce our technical debt with a fresher start and make a formal REP standard for this information. I'm definitely not "Mr. Formalization" but I think it's required if we're all going to play in the same sandbox.

This approach sounds great to me. I'm looking forward to moving away from pixel-based coordinates and towards a single Cartesian frame for the whole building map, with a single (for now) annotation that locates the origin of the local Cartesian frame in the global lat/lon system. This just became a requirement for us, so we're very motivated to do it :smile: While doing that, we might as well migrate the file formats to something that best suits both of our goals, since we'll need to be writing migration scripts on our side anyway. I am entirely neutral on the container format (xml, yaml, json, etc.).
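To make the three coordinate systems mentioned above (pixel, Cartesian, GPS) concrete, here is a minimal sketch of how a single origin annotation could tie them together. Everything in it is an assumption for illustration: the `MapFrame` name and its fields are hypothetical, the Cartesian frame is assumed to have x pointing east and y pointing north with the origin at the lower-left image corner, and the GPS conversion uses a simple equirectangular (flat-earth) approximation that is only reasonable over building-sized areas.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius in meters


@dataclass(frozen=True)
class MapFrame:
    """Hypothetical container linking pixel, Cartesian and GPS coordinates."""
    resolution: float   # meters per pixel (assumed uniform)
    height_px: int      # image height; image rows grow downward
    origin_lat: float   # latitude of the Cartesian origin, in degrees
    origin_lon: float   # longitude of the Cartesian origin, in degrees

    def pixel_to_cartesian(self, col: float, row: float) -> tuple:
        """Convert an image pixel to meters, flipping the row axis so y points up."""
        x = col * self.resolution
        y = (self.height_px - row) * self.resolution
        return x, y

    def cartesian_to_gps(self, x: float, y: float) -> tuple:
        """Convert local meters to (lat, lon) with an equirectangular approximation."""
        lat = self.origin_lat + math.degrees(y / EARTH_RADIUS_M)
        lon_scale = EARTH_RADIUS_M * math.cos(math.radians(self.origin_lat))
        lon = self.origin_lon + math.degrees(x / lon_scale)
        return lat, lon


if __name__ == "__main__":
    frame = MapFrame(resolution=0.05, height_px=200, origin_lat=1.3, origin_lon=103.8)
    x, y = frame.pixel_to_cartesian(100, 100)
    print((x, y), frame.cartesian_to_gps(x, y))
```

With something like this, annotations could be stored once in meters and converted to pixels or lat/lon only at the edges (GUI display, GPS tagging), which is the direction the discussion above is heading.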

I'll join the slack channel right now. Thanks for setting it up!

SteveMacenski commented 4 years ago

It sounds like we have tons of overlap in terms of high-level goals and the desire for open-source tooling to help annotate maps. I think everything else is relatively minor

Agreed, it looks like a good opportunity to also bring two tangentially related things in sync.

The GUI uses the "model name" ... simulation just looks nicer when it has 3d models

In the real world, it's not a model, it's just a chair :smile:. Unless you live in the matrix, then proceed. I think there are some assumptions (or just nomenclature for trying to get something working quickly) here about simulation that I think we'll have to break. I don't know that many practical applications have 3D simulation environments of all of their customer sites.

In general total free-space motion planning would be awesome, but it gets impressively complex when many robots are involved, with many potential sources of delay due to human traffic.

Agreed, I think it's really dependent on a lot of factors. I actually don't think it's as complex as you think as long as there's the running assumption that this computation is being run on a cloud server and not on each agent. Distributing that logic would be intractable with > 20 agents, I suspect. That's just intuition; I have no experience in multi-robot-ing like that.

I could be quite wrong, but I would assume that the highest density robot traffic (i.e. "shoulder to shoulder" robot density) is achieved by one-way navigation on agreed-upon lanes, like an urban city road network.

For something like a Kiva I'd agree. For heterogeneous fleets with wildly varying maximum speeds and varying priority levels of individual agents, I think that would break down. This all assumes a "no humans allowed" facility. Anyway, we can definitely debate these merits in Slack in a PM thread if you like. These are interesting problems to think about.

shared resources like elevators

Ah, you sold me on elevators. I think doors would be fine with multiple agents (if robot A opens the door, robot B can just go through after if all N robots know the current actions of their neighbors). Maybe there are some resources better suited for the cloud vs local control, but by the time you go cloud, just go cloud.

@shrijitsingh99 can you get that document together and we can share it with Morgan and go from there? Maybe post a google doc on slack we can work from and comment in and then translate that into the markdown file for our design folder? I'm sure he'll have some changes or ideas, but at least that way we lob out a proposal to work from rather than starting this conversation from scratch.

I think the terminal goal of the format work is

REP XYZ for semantic labeling in maps
A design doc for navigation in our use of that REP with working examples that don't belong in the REP
Some shared repo for tools related to the format (I/O, GUI, header files for working with whatever data structure contains the information, srv / msgs files for requests, etc)

Any objections?

Just to keep this going while we're waiting on that... @shrijitsingh99 comments on my proposal for masks?

My summary of feedback from that proposal

Alexey approved at a high level
Alexey added that we could add maps between pixel values in the image with XML to other values. I think that would help with the general specification. I also think with floating 32FC3 (32 bit floating point 3 channel images) we should have enough information to not require it and the values in the images are interpreted by their end uses. That could be absolute values, relative %, things that make spatial sense for representation. I can't come up with an example where I'd want to do a bitwise mask on the image given 3 channels of information potentially and then need to remap from XML. Can you provide an example?

(I could really get used to this PM thing, way more effective than trying to sit here and do this all myself! Thanks for all of your hard work everyone and your diligence.)

shrijitsingh99 commented 4 years ago

@shrijitsingh99 can you get that document together and we can share it with Morgan and go from there?

Sure, I will get a basic draft ready and share it on Slack.

I also think with floating 32FC3 (32 bit floating point 3 channel images) we should have enough information to not require it and the values in the images are interpreted by their end uses. That could be absolute values, relative %, things that make spatial sense for representation. I can't come up with an example where I'd want to do a bitwise mask on the image given 3 channels of information potentially and then need to remap from XML.

Could you explain what you mean by this? I am not quite sure I understood. I am also slightly confused by what you mean when you say external mask images @AlexeyMerzlyakov @SteveMacenski. I am assuming you mean something like what @Sarath18 has mentioned in the post below.

So, in my view having both masks defined as external images (raster graphics) and gradients embedded in XML itself (somewhat like this) are useful.

The advantage of the embedding approach is that simple shapes are easier to define than with an external image mask, for which you have to open GIMP and create the mask. I imagine many cases will involve just simple shapes, so the embedded approach will be more convenient for the user and easier to modify: the gradient can be changed by editing the properties of the embedding, whereas changing the gradient in an external mask image requires opening GIMP again.

External mask images also will end up creating a large number of mask images scattered throughout the directory, which I feel is messy.

Coming to the good points of external image masks: you can define arbitrary shapes and arbitrary gradients. More complex tasks can be represented, and the same mask can be reused in multiple places very efficiently.

So having both approaches might be the best way to go about tackling masks.

Sarath18 commented 4 years ago

So this is how we will define a mask when using raster graphics:

<node id="213" x="5.0" y="10.0" />
<node id="214" x="5.0" y="5.0" />
<node id="215" x="10.0" y="5.0" />
<node id="216" x="10.0" y="10.0" />

<way id="352">
    <nd ref="213" />
    <nd ref="214" />
    <nd ref="215" />
    <nd ref="216" />
    <nd ref="213" />
</way>

<relation>
    <member type="way" ref="352"/>
    <tag k="type" v="zone:SpeedZone" />
    <tag k="mask:type" v="FileBased" />
    <tag k="mask:file" v="mask.png"/>
    <tag k="mask:file:channel" v="r" />
</relation>

<relation>
    <member type="way" ref="352"/>
    <tag k="type" v="zone:DirectionalZone" />
    <tag k="mask:type" v="FileBased" />
    <tag k="mask:file" v="mask.png"/>
    <tag k="mask:file:channel" v="b" />
</relation>

We define an area/zone using a way which is fundamentally a set of nodes.

Suppose we now have an irregularly-shaped zone mask rasterized into an image file (png, svg) as shown in the figure. This image contains two types of masks embedded into two channels of the image.

Red: Speed mask
Blue: Direction mask

The masks that are created will be the size of the minimum bounding box of the irregular image. It is implied that the order in which the way is defined is the order in which the corners of the image are assigned to the nodes.

This implementation allows defining irregular masks in a clean way along with the ability to skew the mask just by changing the position of the nodes.
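As a rough illustration of how a consumer might read the file-based mask definitions above, the sketch below parses them with Python's standard `xml.etree.ElementTree`. It is only a sketch under assumptions: the snippet is wrapped in a hypothetical `<osm>` root element so that it is well-formed, only the first relation is reproduced for brevity, and the `load_file_masks` helper and the dictionary layout it returns are made up for this example rather than part of any agreed format.

```python
import xml.etree.ElementTree as ET

# The snippet from the post above, wrapped in an assumed <osm> root element.
SAMPLE = """
<osm>
  <node id="213" x="5.0" y="10.0" />
  <node id="214" x="5.0" y="5.0" />
  <node id="215" x="10.0" y="5.0" />
  <node id="216" x="10.0" y="10.0" />
  <way id="352">
    <nd ref="213" /><nd ref="214" /><nd ref="215" /><nd ref="216" /><nd ref="213" />
  </way>
  <relation>
    <member type="way" ref="352"/>
    <tag k="type" v="zone:SpeedZone" />
    <tag k="mask:type" v="FileBased" />
    <tag k="mask:file" v="mask.png"/>
    <tag k="mask:file:channel" v="r" />
  </relation>
</osm>
"""


def load_file_masks(xml_text):
    """Return one dict per FileBased mask relation (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Index nodes and ways by id so relations can be resolved afterwards.
    nodes = {n.get("id"): (float(n.get("x")), float(n.get("y")))
             for n in root.iter("node")}
    ways = {w.get("id"): [nd.get("ref") for nd in w.iter("nd")]
            for w in root.iter("way")}
    masks = []
    for rel in root.iter("relation"):
        tags = {t.get("k"): t.get("v") for t in rel.iter("tag")}
        if tags.get("mask:type") != "FileBased":
            continue
        way_ref = next(m.get("ref") for m in rel.iter("member")
                       if m.get("type") == "way")
        refs = ways[way_ref]
        # A closed way repeats its first node at the end; drop the duplicate.
        if len(refs) > 1 and refs[0] == refs[-1]:
            refs = refs[:-1]
        masks.append({
            "zone": tags["type"],
            "file": tags["mask:file"],
            "channel": tags["mask:file:channel"],
            "corners": [nodes[r] for r in refs],  # in way order
        })
    return masks


if __name__ == "__main__":
    for mask in load_file_masks(SAMPLE):
        print(mask)
```

The `corners` list preserves the way order, which is what the corner-assignment convention described above relies on.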

SteveMacenski commented 4 years ago

A mask meaning a bitwise overlay of the map with some information, whether that's auto-generated by an AI / automated tool or manually annotated by a person in some image editing program. This is how we've talked about implementing the speed zones and other spatially coherent features that aren't well described by a line or outline. This is the costmap filters work. XML does not make sense as a representation in these cases.

The advantage ... for simple shapes ...

That's the problem, we're not talking about simple shapes. For simple shapes sure, but that's also opening a can of worms that I don't think we should address, or not at least in the initial implementation.

I was not going to cover the use of images that aren't bitwise masks of the map in the initial standard. That seems to add a lot of additional complexity, but I suppose it's going to be necessary if we want multiple submap sources. I'm not sure I follow the practicality of your specific example Sarath, but I understand the intent: having masks that are smaller shapes than the map size itself and locating them over the map. That would fall under the XML standard to describe. My recommendation would be to have the XML only have the ability to describe smaller masks, some metadata, and a position / orientation for them within the larger scope. I would not try to generalize the channels of the image for specific things, that's use-case specific. The client of the semantic work is responsible for interpreting the file's contents / meaning.

shrijitsingh99 commented 4 years ago

A mask meaning a bitwise overlay of the map with some information.

Understood. I assume we can have multiple masks defined, so will we assign some sort of label to each mask, like SpeedMask and KeepOutMask?

That's the problem, we're not talking about simple shapes. For simple shapes sure, but that's also opening a can of worms that I don't think we should address, or not at least in the initial implementation.

Yeah, there are some good use cases for this, such as dynamically enabling or disabling parts of a mask. But I agree it does add a layer of complexity, so it might be a good idea to defer it to future iterations. We should keep it in mind while designing the spec so that it can be easily accommodated later.

We are more or less on the same page on masks now.

Alexey added that we could add maps between pixel values in the image with XML to other values.

This is one remaining point that needs to be discussed with @AlexeyMerzlyakov

Sarath18 commented 4 years ago

I was not going to cover the use of images that aren't bitwise masks of the map in the initial standard. That seems to add a lot of additional complexity, but I suppose it's going to be necessary if we want multiple submap sources.

Yes, I agree with you. We should consider this in the initial standard itself.

My recommendation would be to have the XML only have the ability to describe smaller masks, some metadata, and a position/orientation for them within the larger scope. I would not try to generalize the channels of the image for specific things, that's use-case specific

In computer graphics terms, we can think of the nodes as the vertices of a quadrilateral and the mask as a texture for that quad. Each vertex is assigned a texture coordinate in the same order in which the nodes are referenced when creating the way.

This implementation allows defining irregular masks in a clean way along with the ability to skew the mask just by changing the position of the nodes.

Whatever the shape of the mask internally, only 4 nodes are required to define the irregular mask. These 4 nodes will, in most cases, be the same ones that were used to define ways, areas, or some other primitive type in that region, so they can be reused. This might turn out to be really helpful in the future. Defining just position and orientation will not help in skewing the mask/texture.
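The quad-as-texture idea can be sketched with plain bilinear interpolation between the four nodes. This is an illustrative assumption rather than the proposed implementation: the texture-coordinate order used here, (0,0), (0,1), (1,1), (1,0) for the four nodes in way order, is a guess at the convention, and `texture_to_map` is a hypothetical helper name.

```python
def texture_to_map(corners, u, v):
    """Map a normalized texture coordinate (u, v) in [0, 1] onto the quad.

    `corners` holds the four node positions in way order. They are assumed to
    correspond to texture coordinates (0,0), (0,1), (1,1), (1,0) respectively.
    Plain bilinear interpolation is used, so moving one node skews the mask.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    w0 = (1.0 - u) * (1.0 - v)  # weight of the (0,0) corner
    w1 = (1.0 - u) * v          # weight of the (0,1) corner
    w2 = u * v                  # weight of the (1,1) corner
    w3 = u * (1.0 - v)          # weight of the (1,0) corner
    x = w0 * x0 + w1 * x1 + w2 * x2 + w3 * x3
    y = w0 * y0 + w1 * y1 + w2 * y2 + w3 * y3
    return x, y


if __name__ == "__main__":
    # Nodes 213..216 from the earlier XML example, in way order.
    square = [(5.0, 10.0), (5.0, 5.0), (10.0, 5.0), (10.0, 10.0)]
    print(texture_to_map(square, 0.5, 0.5))   # center of the mask
    # Moving the last node skews the mask without touching the image file.
    skewed = [(5.0, 10.0), (5.0, 5.0), (10.0, 5.0), (12.0, 10.0)]
    print(texture_to_map(skewed, 1.0, 0.0))
```

Position and orientation alone correspond to the special case where the four nodes always form a rectangle of fixed size.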

Stretch Goals: One more idea that I had is to combine all these submasks and create a mask atlas (something similar to a sprite sheet). This can help define multiple masks in one single image. Only one single image consisting of all the masks is to be loaded instead of multiple submasks.

AlexeyMerzlyakov commented 4 years ago

Regarding embedding shaped pictures into XML: it is undoubtedly a good idea. I see this being useful for maps of multi-story buildings where each floor differs a bit from the others, so a basic replication feature would be in high demand to avoid routine work. I also see an application for big warehouses or offices with a cell structure, where one mask could be applied and repeated for each cell. However, do we really need to overcomplicate the design with skewing, shearing and scaling of masks? The main question here is the possible application of such features. If there is a chance for this to become a widely used, sought-after feature, we should try. But I think specifying the position and orientation of a raster mask will be enough to cover the use cases concerned with replication. As Steve said before, the costmap filters work with spatially coherent information, and that applies to masks as well. I do not think we need to make a spatially dependent feature fully vector-based.

The same goes for the sprite sheet use case: this feature is in demand in the world of raster graphics, but I am not 100% sure it applies to the world of masks. Each costmap filter mask is specific to the type of building and the concrete task it is developed for. I doubt that we can provide some type of universal mask atlas that will be used widely in robotics across different kinds of buildings, navigation stacks, goals, and areas of application.

So, in my view having both masks defined as external images (raster graphics) and gradients embedded in XML itself (somewhat like this) are useful.

Covering the same feature twice, with raster images and with vector graphics (embedded in XML), is not a good idea since it will make both the design and the implementation really hairy in the future. I propose keeping it straightforward: semantic types of objects get vector-based descriptions, and spatial types of objects get spatially dependent masks. They will all be connected in XML as we previously discussed.

Alexey added that we could add maps between pixel values in the image with XML to other values. This is one remaining point that needs to be discussed with @AlexeyMerzlyakov

This is related to cases where the speed of the robot should be restricted. There are two ways to restrict it: one is to set the speed limit as a % of the robot's maximum speed; the other is to specify an absolute value in m/s. Let's take a look at the second way. There, the speed of the robot is a floating-point value that could be encoded in RGB, with each of R, G, and B carrying 8 bits of information, combining them into a 24-bit float. This type of representation is hard for a developer to inspect visually. Therefore we have another option: to use a PGM (or any other) format where values are coded by color. For example, the following PGM file with size 4x4:

And XML, something like:

<speed_limit_mask id="245">
    <file="/tmp/my_filter.pgm" />
    <key="2" val="0.55" />
    <key="5" val="1.5" />
</speed_limit_mask>

means that the area marked with color "2" in the PGM will have a max speed limit of 0.55 m/s, and the area marked with color "5" will have a max speed limit of 1.5 m/s.
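A small sketch of how the key-to-value lookup could work on the consumer side. The 4x4 PGM contents below are made up for illustration (the original example image is not reproduced here), and `parse_ascii_pgm` and `speed_limit_at` are hypothetical helper names; only the two key/value pairs come from the XML above.

```python
# Key -> speed limit in m/s, taken from the <speed_limit_mask> example above.
SPEED_BY_KEY = {2: 0.55, 5: 1.5}

# Hypothetical 4x4 ASCII PGM (magic number "P2"); the pixel values are made up.
PGM_TEXT = """P2
# illustrative speed mask
4 4
255
0 0 2 2
0 0 2 2
5 5 0 0
5 5 0 0
"""


def parse_ascii_pgm(text):
    """Parse a plain (P2) PGM into a list of rows of integer pixel values."""
    tokens = []
    for line in text.splitlines():
        # Everything after '#' on a line is a comment in the PGM format.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("only ASCII PGM (P2) is handled in this sketch")
    width, height = int(tokens[1]), int(tokens[2])
    # tokens[3] is the maximum gray value, which the lookup does not need.
    values = [int(t) for t in tokens[4:4 + width * height]]
    if len(values) != width * height:
        raise ValueError("PGM has fewer pixels than its header declares")
    return [values[r * width:(r + 1) * width] for r in range(height)]


def speed_limit_at(grid, row, col, table, default=None):
    """Return the speed limit for a cell, or `default` if its key is unmapped."""
    return table.get(grid[row][col], default)


if __name__ == "__main__":
    grid = parse_ascii_pgm(PGM_TEXT)
    print(speed_limit_at(grid, 0, 2, SPEED_BY_KEY))
    print(speed_limit_at(grid, 2, 0, SPEED_BY_KEY))
    print(speed_limit_at(grid, 0, 0, SPEED_BY_KEY))
```

Cells whose value has no entry in the table fall back to `default`, which a consumer could interpret as "no restriction".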

SteveMacenski commented 4 years ago

However, do we really need to overcomplicate the design with skewing, shearing and scaling of masks?

My intuition is no, homogeneous transformations only.

I do not want to represent gradients or any form of spatial information in XML for the same reasons Alexey mentions. Those can be represented with masks that can be positioned onto the map.

@AlexeyMerzlyakov you can have floating point images. 0.55 m/s can just be 0.55. But that's fine to also add to the spec.

In summary:

Masks covering the full map, for something like a speed zone or keep-out, are supported
Smaller masks for portions of the map to be supported, with homogeneous transformations to position them
XML grouping for image files, some key to value mapping, relative position / orientation, and mask type.
No spec for the specific encoding of information / channels; it is up to the application to use the encoded key-values, file path, and pose to decode what it needs, since there are infinite image formats and requirements. The use case will use the list of masks available from the XML file to select the one it wants (via some metadata tag called speed_zone or something), load the file, and translate the information.
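The summary above could translate into a data structure along these lines. This is only a sketch to anchor the discussion: `MaskEntry`, its fields, and `find_masks` are hypothetical names, the transform is restricted to translation plus rotation, and mask-local x/y are assumed to run along image columns/rows without any axis flip.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MaskEntry:
    """Hypothetical record for one mask listed in the XML file."""
    mask_type: str            # metadata tag, e.g. "speed_zone"
    file_path: str            # image file; its encoding is up to the consumer
    x: float = 0.0            # position of the mask origin in the map frame (m)
    y: float = 0.0
    yaw: float = 0.0          # orientation in the map frame (rad)
    resolution: float = 0.05  # meters per pixel of the mask image
    key_values: dict = field(default_factory=dict)  # optional pixel-value mapping

    def transform(self):
        """3x3 homogeneous transform from the mask frame to the map frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return [[c, -s, self.x],
                [s, c, self.y],
                [0.0, 0.0, 1.0]]

    def pixel_to_map(self, col, row):
        """Place a mask pixel on the map using the homogeneous transform."""
        px, py = col * self.resolution, row * self.resolution
        t = self.transform()
        return (t[0][0] * px + t[0][1] * py + t[0][2],
                t[1][0] * px + t[1][1] * py + t[1][2])


def find_masks(entries, mask_type):
    """Select the masks a given consumer cares about by their type tag."""
    return [e for e in entries if e.mask_type == mask_type]


if __name__ == "__main__":
    entries = [
        MaskEntry("speed_zone", "speed.pgm", x=2.0, y=3.0, yaw=math.pi / 2,
                  resolution=0.5, key_values={2: 0.55, 5: 1.5}),
        MaskEntry("keepout", "keepout.pgm"),
    ]
    speed = find_masks(entries, "speed_zone")[0]
    print(speed.pixel_to_map(2, 0))
```

A full-map mask is then just the case where the pose is the identity and the resolution matches the map.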

Anything else to add / objections? @codebot thoughts on the masks part of the spec?

shrijitsingh99 commented 4 years ago

As such I propose we discuss the following, in this order

[x] File formats for points, lines, and areas: YAML, XML, OSM-XML, or other

[ ] What we want out of masks and how to embed that information: pgm/png, XML, or other

[ ] A couple potential designs for multi-floor mapping to come to the consensus on how it makes most sense to represent the multiple maps and the "gateways" between them. The goal of this will be only to determine if the multi-map work is a consumer of this semantic work, or a direct member of it needing to be fully designed in tandem.

[ ] A couple of potential designs around using the major classes of objects: point, area, mask (e.g. dock or waypoint or elevator; zoom or section; speed or keepout) to motivate the design of tools or servers required for this development.

[ ] Once we have some pretty good understandings of how we want the data formatted and how we want them to be used, we can then discuss the GUI element of this. Though I think we're all in agreement here.

@SteveMacenski could you move this to the first post, for better visibility.

We could start the discussion on the next agenda item too, while we wait for Morgan's feedback.

SteveMacenski commented 4 years ago

Done. Do you agree with that for masks? I was also looking for an "OK I agree" before moving on. After we have that design doc with your section in it, I'll add the mask section and I'll throw out my idea for the format / tags for us to discuss when we get to the nitty-gritty. It seems like after we do the first 3 bullets there, bullet 4 is to come up with those nitty-gritty details by means of tangible, literal examples supporting important use cases. We can then use those to derive the standard. (Or vice versa, it's a chicken and egg problem. I like to start with a large number of examples building on each other to be exhaustive so you can quickly iterate, then make the general rules from them.)

I also just added another bullet for tooling to discuss after we have a format, how we want to use it. Who reads it, what services do we need, what tools are required to handle that data and extract the useful bits, etc.

I think @codebot will have some thoughts on the multiple floors topic given his work on elevator traffic flow. I'd like him to start that discussion with a proposal from his experience and what we've discussed above using XML.

codebot commented 4 years ago

Greetings.

Masks: it's maybe trivial, but I'd suggest calling them "layers," since "mask" (to me at least) implies cropping something for some operation. That's relevant for keep-out zones, I guess, but to use the capability in a generic way (absolute speed limits, keep-out zones, exclusive zones near particular robot-vendor chargers, hold zones where it's OK to park temporarily waiting for traffic, parking zones where it's OK to park indefinitely and/or during emergencies, various other nav parameters, etc.) it seems "layer" might be a nice generic term for this type of thing, borrowing from the PCB CAD world (and lots of other 2d editing domains). We are using one type of "layers" concept currently in the traffic-editor prototype to overlay robot-generated maps on top of architectural floor plans for sanity-checking and floorplan verification, but it's a great idea to elevate some sort of layers/masks to a generic primitive, with generic color mapping, adjustable transparency, adjustable stacking order, etc. :+1:

Elevators: Happy to discuss! Shall we do that in a separate ticket, on Slack, or here? With lots of topics in flight simultaneously on this thread, I'm getting a bit lost.

SteveMacenski commented 4 years ago

I'm OK with layers. We've called it a number of things over time (zones, costmap filters, masks). That's just nomenclature though - any objections to the meat of the proposal? If no one else objects to layers, sounds like a plan.

Sure, want to start the proposal discussion for multi-level maps in the slack channel? I agree we're covering a lot of bases here, that's why I've tried to keep it organized with just 1 topic at a time. Still, this thread has gotten long. If we discuss on slack, we'll need to come back here and summarize the outcome so it's documented for future developers since Slack isn't a real record.

codebot commented 4 years ago

We could have separate threads on separate issue tickets for each "sub-topic" to try to keep it focused. Or we could have noisy chatter on Slack and come back to summarize here, either way is fine with me. I'm lurking on the navigation2 slack now. Thanks for setting that up!

shrijitsingh99 commented 4 years ago

Do you agree with that for masks?

Yup, I should have conveyed it using a reply 😅

Masks: it's maybe trivial, but I'd suggest calling them "layers," since "mask" (to me at least) implies cropping something for some operation.

I had some ideas about layers that @Sarath18 and I were discussing. I agree with you that masks are a subset of layers, i.e. a mask is a separate layer.

That's just nomenclature though - any objections to the meat of the proposal?

I think it's more than nomenclature. It could possibly be used to divide semantic information into a layer-like system, not only for easier management of large amounts of semantic information but also for adding more data. As @codebot mentioned, there could be a layer to represent the architectural floor plans. Maybe also a layer of intensity values from a 3D LiDAR.

If it is something we are discussing now, I can draft a more well-defined proposal/message of what I had in mind for layers; otherwise we can defer this for later. I was planning on proposing this at a later stage in the discussion, since in my view a mask is a subset of a layer and converting it to a layer is trivial.

We could have separate threads on separate issue tickets for each "sub-topic" to try to keep it focused. Or we could have noisy chatter on Slack and come back to summarize here

I think tickets are the way to go. Although progress through tickets is slower than chatting on Slack, it makes you think twice and do detailed, well-thought-out write-ups of what you have in mind. On Slack, it gets diluted into noisy messages as you mentioned, from which it might be harder to actually draw conclusions.

For archival purposes, tickets are also better. You guys take the call on this.