ros2 / rosbag2

Apache License 2.0
272 stars 244 forks source link

inform storage plugin about message-level compression #1144

Closed james-rms closed 1 year ago

james-rms commented 1 year ago

Description

Right now the message-level compression mode being used is not passed to the storage plugin. This means that without the metadata.yaml file, we have no way of knowing if a db3 or mcap's messages are CDR encoded, or CDR encoded and compressed. This information should be stored inside the bag somehow. Specific plugins can choose their own way of storing this and dealing with that information, but for now, this is an issue tracking the storage plugin API's ability to pass that information down at all.

Completion Criteria

Implementation Notes / Suggestions

Testing Notes / Suggestions

All features in this project need tests. Please give some input on cases that will need to be tested - and how the testing might be implemented.

MichaelOrlov commented 1 year ago

As far as I recall we have discussed per message compression and per message encryption during original mcap design discussions. And as far as I remember we agreed to have string field which will describe serialization format, encryption and compression and we agreed that each layer will be spitted by some predefined delimiter and as far as I remember suggestions was to use colon : or double colon :: as delimiter. For instance if we have CDR serialized message which was encrypted with AES128 and compressed with zstd. The appropriate message format or encoding field would have a value CDR::AES128::ZSTD . And the most appropriate field for that is TopicMetada::serialization_format https://github.com/ros2/rosbag2/blob/3df2fe30ece1b3490da6f8af8b7c24ed11c7bba9/rosbag2_storage/include/rosbag2_storage/topic_metadata.hpp#L27. Adding new special field for indicating compression and encryption will cause a big headache for backward compatibility and doesn't look reasonable IMO.

MichaelOrlov commented 1 year ago

For completeness we have already passing TopicMetada data structure as parameter for the BaseWriteInterface::create_topic(const TopicMetadata & topic) API.

james-rms commented 1 year ago

I think compression type and message serialisation format are separate concepts that any consumer will need to handle separately. How they are encoded in the TopicMetadata is an implementation detail.

Therefore, the choice comes down to one of:

  1. putting serialization format and compression type together in the same field with a :: delimiter
  2. adding a new field to TopicMetadata for the compression type.

I'm in favor of (2), assuming it's technically feasible to do so without breaking ABI compatibility.

The only reason not to choose (2) would be to avoid the pain of dealing with struct compatibility concerns across ABI boundaries. Before choosing I'd like to understand concretely what these concerns are and what we'd have to do to address them, if we chose to go down the (2) route.

If we chose (1), I would want to document a clear definition of the scheme of the serialization_format field that is compatible with existing consumers (since compatibility is the whole point), including a plan for future extensibility with encryption information. This should include answers for cases where a message is encrypted but not compressed and vice-versa.

amacneil commented 1 year ago

I would like to make a case that compression options are entirely a storage plugin concern, not a global setting.

Different storage plugins will support different compression options. For example:

I think file and message level compression are both strictly worse than chunk compression, and having message encryption as a global setting means that you can currently do silly things like enable both message compression (at the rosbag2 level) and chunk compression (at the mcap storage plugin level).

In addition, storage plugins cannot simply treat the messages as binary blobs. MCAP is specifically designed to be self-contained and allow decoding by non-ROS third party tools, so it is not acceptable for the upstream rosbag package to transform the data without the plugin being informed.

Concrete proposal: move the current compression options (per file, per message) into the sqlite plugin, and let each plugin handle compression individually.

MichaelOrlov commented 1 year ago

@amacneil Your statement about that chunk compression is always better would be true if not considering real world data and in particular camera images and LIDAR data.

Please consider a case when for instance we have stream of data from multiple cameras and we want to do per message compression with JPEG and do it HW accelerated on FPGA. And we want to be a valid case if we are not able to compress in realtime all messages we will write some of them uncompressed.

This use case doesn't align very well in your paradigm that all compression should be moved to the plugin level.

amacneil commented 1 year ago

JPEG compression is not a responsibility of the recording process. That would require modifying the content of the messages, making them incompatible with their own schema.

The logic you describe (compressing a subset of messages with JPEG) belongs at the application layer using the existing ROS concepts (nodes that subscribe and transform messages).

james-rms commented 1 year ago

Copying discussion to issue - below is from @MichaelOrlov .

I've dig deeper in rosbag2 "derbies" and came to the conclusion that we can't provide BagMetadata in the open() calls, alongside rosbag2_storage::StorageOptions& as you would like it. Not even part of it related to the compression. The reasoning for that is the multiple abstraction layers and the fact that BagMetadata object initializes after calling open(..) method of the storage plugin. And uses some data from storage object for initialization. For details: - here is the place https://github.com/ros2/rosbag2/blob/fcafa279d31c7b61a244503f597b8d0e36dc94a5/rosbag2_storage/src/rosbag2_storage/impl/storage_factory_impl.hpp#L130 where from we are calling storage open() method. And here is the place where we are initializing BagMetadata after creating and opening the storage https://github.com/ros2/rosbag2/blob/fcafa279d31c7b61a244503f597b8d0e36dc94a5/rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp#L110-L144 init_metadata() defined here https://github.com/ros2/rosbag2/blob/rolling/rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp#L65-L79 The things even worsen since init_metadata() overloaded in SequentialCompressionWriter https://github.com/ros2/rosbag2/blob/rolling/rosbag2_compression/src/rosbag2_compression/sequential_compression_writer.cpp#L111-L118and calling from base SequentialWriter class. I am considering to create separate method update_metadata(BagMetadata &) for storage writer interface to call it after init_metadata() .

james-rms commented 1 year ago

Let me also register the standing goals agreed on at the working group here on this issue:

james-rms commented 1 year ago

The reasoning for that is the multiple abstraction layers and the fact that BagMetadata object initializes after calling open(..) method of the storage plugin. And uses some data from storage object for initialization.

The proposal of dealing with this by adding an update_metadata method seems hacky to me.

I think we should change tactic and not try to pass all of BagMetadata to the storage plugin.

At a minimum the storage plugin needs only these data:

My alternate proposal is:

  1. come up with a new struct RecorderMetadata which is defined as "metadata about the recorder's state that would be required by a bag reader when decoding the bag contents". This would include:
    • Compression mode
    • Compression format Any future pluggable component of rosbag2 (such as encryption, for example) would likely need to add something to this structure.
  2. Rev the version of BagMetadata to 7, and replace the compression mode and format fields with a RecorderMetadata instance.
  3. Add an open() overload with a recorder_metadata argument to SequentialCompressionWriter, SequentialWriter, and to the storage plugin API.
MichaelOrlov commented 1 year ago

@james-rms I've tried to spend some more time trying to consider your proposal with transmitting compression_mode in some form as you mentioned to be as part of newly created RecorderMetadata or even better naming as RecorderInfo or WriterOptions data structure to the storage::open(..) method. I am sorry, but I can't come to the any descent solution with current rosbag2 design.

The problem is that SequentialCompressionWriter inherited from SequentialWriter https://github.com/ros2/rosbag2/blob/0c7c352222a08470211232453dab9749ab405201/rosbag2_compression/src/rosbag2_compression/sequential_compression_writer.cpp#L40-L45 and there are no descent way to pass such parameters aka WriterOptions with rosbag2_compression::CompressionOptions to the base SequentialWriter class to be able to pass this parameter further on to the https://github.com/ros2/rosbag2/blob/fcafa279d31c7b61a244503f597b8d0e36dc94a5/rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp#L110 To be more clear. Of course we can pass some WriterOptions with rosbag2_compression::CompressionOptions to the SequentialWriter constructor but it will be incorrect from design point of view if we will pass it from SequentialCompressionWriter. Because SequentialWriter shall not be created with WriterOptions::CompressionOptions::compression_mode == message.

If you can come up with less "hucky" solution with concrete implementation in code which align with current rosbag2 design - I am open to consider it. Unfortunately, currently I don't see a better design solution rather then use newly created update_metadata API from #1149. Btw we will call update_metadata right after storage open(..) and guarantee that it will be before any write(message) calls.

james-rms commented 1 year ago

Actions from sync:

ros-discourse commented 1 year ago

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/psa-default-ros-2-bag-storage-format-is-changing-to-mcap-in-iron/28489/1