ros2 / rosbag2

Apache License 2.0
285 stars 252 forks source link

Store message definition in bag #782

Closed jhurliman closed 1 year ago

jhurliman commented 3 years ago

Description

Neither the metadata.yaml file or .db3 file(s) contain the ROS2 message definitions used to generate the bag. This makes it impossible to deserialize rosbag2 files without having the correct version of all the original .msg files on the machine that is decoding the bag. Tools like Foxglove Studio are not able to decode rosbag2 files, and if message definitions are ever modified you have to store a full snapshot of message definitions alongside each bag file. If the wrong message definition is used during deserialization, there doesn't appear to be any mechanism to catch this leading to either decoding failures or potentially invalid decoding.

This is a regression from the rosbag1 format, where a .bag file is self describing and can be decoded even if the collection of .msg files used to generate it have been lost, or if a .bag file is shared with someone else.

emersonknapp commented 3 years ago

Thanks for filing this ticket - it's something that has come up as part of other discussions but was not explicitly tracked as its own feature request. I've re-worded the title to name the positive feature to build (rather than the negative lack of feature)

danthony06 commented 3 years ago

@jhurliman I have not personally used this package, and there's definitely some major assumptions made in the codebase, but I think this is doing a lot of what you're envisioning: https://gitlab.com/ternaris/rosbags

danthony06 commented 3 years ago

Ah, looking at how they do that, it looks like they're manually embedding a bunch of information about the message format, so it can't handle arbitrary messages. I do like the idea of having all of the information to fully extra the messages from a bag embedded in the actual bag file.

chaoflow commented 3 years ago

@danthony06 Writing as one of the authors: rosbags so far does not store message definitions. It supports reading rosbag2 with custom message types by registering the idl/msg files, see https://ternaris.gitlab.io/rosbags/topics/typesys.html

There are two cases:

  1. conversion from rosbag1 to rosbag2; message definitions from rosbag1 are used to convert and we want to store these along with topic latching info for later deserialization and to convert back to rosbag1.
  2. recorded rosbag2; either the recorder or a post-processing tool needs to add the message definitions, depending on when these are available.

@emersonknapp et al: What are your thoughts on where and how to store message definitions and topic latching info?

pkess commented 3 years ago

Hi all, i would like to get this feature as well. But as a first step it might be very usefull if the rosbag would store at least some kind of a hash of the message definition for each topic. With this you could check if you have the correct message definition available. Ich think rosbag1 stored both the full message definiton and some kind of hash for the message type.

danthony06 commented 3 years ago

@chaoflow thank you for the correction, sorry I missed the details on how you're getting the format.

@pkess You're right that rosbag1 is storing the MD5 sum, but it's done on a connection basis, not message type basis, according to the documentation: http://wiki.ros.org/Bags/Format/2.0. I'm guessing this is to catch a corner case in ROS 1 where it might be possible to have nodes using the same message type, but are different versions with different internal fields. For example, Node A and B are compiled with one message definition and use that to communicate, while Nodes C and D have a message with the same name, but has been changed and built at a later time, and they can use that to communicate. So the bag file would have multiple entries for the same message name, but the implementation is different, so it has to track the MD5 of the connection.

emersonknapp commented 3 years ago

What are your thoughts on where and how to store message definitions and topic latching info?

I see the obvious place as being in the TopicMetadata - this thing already exists and we should be able to safely expand it with some serialized representation of the message definition.

For latching info - this is handled using the Durability QoS using TRANSIENT_LOCAL durability with a history. This behavior provides a superset of the ROS 1 "latching" functionality. http://docs.ros.org/en/galactic/Concepts/About-Quality-of-Service-Settings.html#qos-policies for some info

durko commented 3 years ago

@emersonknapp Saving message definitions per topic could lead to some content duplication for topics sharing message types. Nested types (e.g. std_msgs/msg/Header) are often not part of the definition itself and need to be also stored somewhere. While yaml supports anchors and references, it would be easier to work with a flat storage for message definition metadata, e.g. with a type_definitions array as a sibling to topics_with_message_count.

Suggested content:

rosbag2_bagfile_information:
  type_definitions:
  - format: 'idl'
    definition: '{content of Bool.idl file}'
  - format: 'msg'
    definition: 'MSG: std_msgs/Int32\n{content of Int32.msg file or message_definition from ROS1 bag}'

The idl case is straightforward, as only the message definition content is necessary. The msg case requires one additional piece of information, namely the message type name. In the example above the message type name is prepended to the message definition, similar to concatenated definitions in message_definition fields of ROS1 bag connection headers. Alternatively, message type names could also be saved as a separate key. As both formats can define multiple types in a single message definition, a type name key would probably have to be a list of strings.

jhurliman commented 3 years ago

My understanding is that DDS (and .idl files as an extension of that) is one possible transport system in ROS2 which has a pluggable transport layer. If that’s correct, making idl storage a first class concept in rosbag2 would be a leaky abstraction.

ROS1 bags handle this issue gracefully. The one thing they add beyond the raw msgdef is a checksum, so you don’t have to do a byte for byte long string comparison to see if a message definition has changed between two bags. ROS1 bags also store msgdefs per connection instead of per topic, which would fix the replay bug when two publishers publish to the same topic with different QoS profiles.

chaoflow commented 3 years ago

My understanding is that DDS (and .idl files as an extension of that) is one possible transport system in ROS2 which has a pluggable transport layer. If that’s correct, making idl storage a first class concept in rosbag2 would be a leaky abstraction.

@jhurliman The proposal above is not bound to idl but already contains msg as other format.

ROS1 bags handle this issue gracefully. The one thing they add beyond the raw msgdef is a checksum, so you don’t have to do a byte for byte long string comparison to see if a message definition has changed between two bags.

When considering a checksum for message definitions for rosbag2 we should take normalization into account instead of simply hashing the string definition.

ROS1 bags also store msgdefs per connection instead of per topic, which would fix the replay bug when two publishers publish to the same topic with different QoS profiles.

QoS profile is a concept orthogonal to messages definitions, same as latching in rosbag1, i.e. a topic has a message type with a specific message definition and a topic can have a QoS profile.

ROS1 stores message definitions per connection and there can be multiple connections per topic with theoretically, regarding the storage format, different message types and definitions. I have not seen mixed-type-topics in practice, am not sure whether it is possible to generate them through recording, in contrast to manually writing bags, and don't see a use case for them so far.

danthony06 commented 3 years ago

I agree that the mixed-type-topics is an unusual case, and unlikely to happen in practice. About the only time I can think of when I have encountered this is when there are multiple robots that are running different versions of software. It's generally not a great idea and usually indicates something wrong with the larger system, but it's certainly possible.

jtbandes commented 3 years ago

Lack of message definitions also affects the JS/WebSocket bridges rosbridge_suite and ros2-web-bridge. I just made a PR for the former that re-constitutes a plausible .msg definition from the generated Python classes (although the definition it generates is missing constants and default values): https://github.com/RobotWebTools/rosbridge_suite/pull/574

I hope this can be a temporary workaround, since it would be nicer to address this by including raw datatypes at the rosidl code generation level, so they can be used by both rosbag2 and dynamic tools like the rosbridge_server.

chaoflow commented 3 years ago

In #819 we're discussing for db3 files to be self-contained, i.e. the message definitions need to be stored in each db3 file (in case of splitting) and are duplicated into metadata.yaml the same as happens already for TopicMetadata.

@emersonknapp you suggested to add message definitions to TopicMetadata, which would lead to duplication but we have the same pattern already for QoS profiles. Is it ok to duplicate message definitions as well? Do you see a way of storing nested types there as well?

Apart from where to store, for me it is not clear yet from where to get:

emersonknapp commented 3 years ago

Most of these questions don't have answers yet, because there isn't a core mechanism for retrieving message definitions.

I've created https://github.com/ros2/ros2/issues/1159 to contain the discussion for this core feature. Hopefully, once we have that feature available, the rosbag2-side implementation will be relatively straightforward.

Karsten1987 commented 3 years ago

Is it ok to duplicate message definitions as well?

Maybe I am thinking too naively here, but I can't imagine that the storage size of the message definition is in any case comparable to the amount of data (the actual incoming messages) we store in the actual rosbags. I don't really see a way of not duplicating the message definitions across the bag files, because as said, the bag files should presumably be self contained and independent. That way we can leverage tools like merging various bag files etc.

emersonknapp commented 3 years ago

Note: discussion ros2 core functionality in ros2/ros2#1159

ros-discourse commented 3 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/announcing-live-ros-2-connections-in-foxglove-studio/22278/1

wjwwood commented 2 years ago

Just reading this now, as part of some other work I'm doing, and I just wanted to point out that currently rosbag (AFAIK) requires the typesupport be installed locally for a message when recording or playing back, so the message definitions could be stored in those packages and accessed by rosbag locally, therefore not strictly requiring https://github.com/ros2/ros2/issues/1159 in order to address this feature. Obviously we'd like to have https://github.com/ros2/ros2/issues/1159 so that we can do away with this restriction that message packages need to be available locally for rosbag to record, but that's technically a separate issue.

Sorry if someone already pointed this out, because I was only able to skim the previous comments.

emersonknapp commented 2 years ago

so the message definitions could be stored in those packages and accessed by rosbag locally

This is the approach that @jhurliman is trying to take now, but isn't sure how to implement - see https://github.com/ros2/ros2/issues/1159#issuecomment-1064674121

jhurliman commented 2 years ago

This is implemented now. See https://github.com/ros-tooling/rosbag2_storage_mcap/blob/main/rosbag2_storage_mcap/src/message_definition_cache.cpp

emersonknapp commented 2 years ago

Oh I see - nice! I might think best possible scenario is that this functionality is implemented in the ros2 core, and the contents passed to the storage implementation to store. Ideally the storage implementation doesn't have any dependency on the ROS2 system, and also the behavior could be shared between implementations. I do get that this was the fastest way to get it working in the mcap storage, though :).

ros-discourse commented 2 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/make-your-ros-2-files-self-contained-smaller-and-more-performant/25590/1

ros-discourse commented 2 years ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-bag-data-in-plotjuggler/27072/1

amacneil commented 2 years ago

We took the ideas from this thread, https://github.com/ros2/rosbag2/issues/819, https://github.com/ros2/rosbag2/issues/855, and several others and incorporated them into the design of MCAP, a highly efficient, self-contained recording format for ROS 2 and more.

A ROS 2 MCAP storage plugin is available for Foxy, Galactic, Humble and Rolling, and using it is as simple as ros2 bag record -s mcap.

Since this feature is now already supported by the MCAP storage plugin, I think we should either close it, or change the title to specifically request adding message definitions to files created by the sqlite plugin.

jhurliman commented 2 years ago

I’ll close the issue since MCAP is stable and a preferred alternative now.

chaoflow commented 2 years ago

@jhurliman I see how mcap solves #819, however I agree with @emersonknapp https://github.com/ros2/rosbag2/issues/782#issuecomment-1081232184 that storage of message definitions should not be specific to one format and especially should be realized for the default format. To that end I suggest to reopen this issue.

MichaelOrlov commented 1 year ago

@chaoflow FYI. Now we store message definitions in SQLite3 db3 files as well. I added new table message_definitions and incremented schema_version to 4 in schema table. Please refer to the #1293 for details. Looking forward for relevant updates in rosbags.

chaoflow commented 1 year ago

@chaoflow FYI. Now we store message definitions in SQLite3 db3 files as well. I added new table message_definitions and incremented schema_version to 4 in schema table. Please refer to the #1293 for details. Looking forward for relevant updates in rosbags.

@MichaelOrlov Thank you very much! Is there a suggested migration path for previously recorded rosbag2?

MichaelOrlov commented 1 year ago

@chaoflow The migration path is simple. If db schema version < 4 there are no message_definition table.
Please see how we are determining db schema version here https://github.com/ros2/rosbag2/blob/18216763ac13d942ee582c4b658962e49179ba3d/rosbag2_storage_sqlite3/src/rosbag2_storage_sqlite3/sqlite_storage.cpp#L797-L814 Or you can directly check

if (database_->table_exists("message_definition")) {
  // TBD
}
chaoflow commented 1 year ago

@MichaelOrlov Thanks and sorry for being unclear. I meant whether there is a migration path to get message definitions into bags that were previously recorded ? Something like: ros2 bag migrate oldbag newbag assuming the correct message definitions for the previously recorded bag are available.

MichaelOrlov commented 1 year ago

@chaoflow Oh, in t his case ros2 bag convert could be used https://github.com/ros2/rosbag2#converting-bags This example should work like "ros2 bag migrate oldbag newbag"

$ ros2 bag convert -i bag1 -o out.yaml

# out.yaml
output_bags:
- uri: /output/bag1  # required
  storage_id: ""  # will use the default storage plugin, if unspecified
  all: true
chaoflow commented 1 year ago

@MichaelOrlov Thank you very much!