We occasionally have an issue where a rosbag2 Recorder node segfaults and dies when stop() is called. This ends up truncating the .mcap file being written, resulting in it being corrupted because it lacks trailing magic bytes.
Expected Behavior
No segfault, the bag is closed and the .mcap has its trailing bytes written.
Actual Behavior
The process dies with code -11. Logs:
[INFO] [1730717198.622841633] [main_bag_rosbag_recorder]: Listening for topics... [rosbag_run_recorder_node-9]
[INFO] [1730717198.622848700] [main_bag_rosbag_recorder]: Event publisher thread: Starting [rosbag_run_recorder_node-9]
[INFO] [1730717198.823090723] [main_bag_rosbag_recorder]: Subscribed to topic '/visualization/machine_interface/target_terrain' [rosbag_run_recorder_node-9]
....
[INFO] [1730717199.253250365] [main_bag_rosbag_recorder]: Overriding subscription profile for /rosout [rosbag_run_recorder_node-9]
[INFO] [1730717199.258032766] [main_bag_rosbag_recorder]: Subscribed to topic '/rosout' [rosbag_run_recorder_node-9]
[INFO] [1730717199.268296194] [main_bag_rosbag_recorder]: Recording... [rosbag_run_recorder_node-9]
...
Error: [1730717288.011213136] [main_bag_rosbag_recorder]: discovery_future_.wait_for(100) return status: timeout [rosbag_run_recorder_node-9]
Error: [rosbag_run_recorder_node-9]: process has died [pid 111, exit code -11, cmd '/home/dev_ws/install/rosbag_run_recorder/lib/rosbag_run_recorder/rosbag_run_recorder_node --ros-args --log-level INFO --ros-args'].
Logging from other nodes and many "subscribed to topic" messages have been excluded.
To Reproduce
We have not been able to consistently reproduce, and this error occurs only once out of every 50 or so calls to stop(). Contributing factors seem to be heavy CPU load and larger bags, and it does seem to be somewhat device-dependent. Unfortunately I cannot share the full bag, but its mcap summary is
rosbag2_storage::StorageOptions storage_options = {
bagPath.string(), "mcap", 0, 0, 0,
"zstd_fast",
"", false};
rosbag2_transport::RecordOptions record_options;
record_options.topics = /* our own list of ~100 topics */;
record_options.topic_qos_profile_overrides = {{"/rosout", rclcpp::RosoutQoS()}};
record_options.rmw_serialization_format = std::string(rmw_get_serialization_format());
auto writer = rosbag2_transport::ReaderWriterFactory::make_writer(record_options);
return std::make_shared<rosbag2_transport::Recorder>(std::move(writer), std::make_shared<KeyboardHandler>(false), storage_options, record_options,"main_bag_rosbag_recorder");
and execute it with a MultiThreadedExecutor constructed using the default arguments and 4 threads (the executor is also running 2 other Recorder nodes but they record far less data and we haven't had any issues with them).
System (please complete the following information)
This problem had also occurred when running via ros2 bag record, but there it manifested as the recorder apparently shutting down in response to SIGINT, but for some reason not closing the bag properly.
Description
We occasionally have an issue where a rosbag2
Recorder
node segfaults and dies whenstop()
is called. This ends up truncating the.mcap
file being written, resulting in it being corrupted because it lacks trailing magic bytes.Expected Behavior
No segfault, the bag is closed and the
.mcap
has its trailing bytes written.Actual Behavior
The process dies with code
-11
. Logs:Logging from other nodes and many "subscribed to topic" messages have been excluded.
To Reproduce
We have not been able to consistently reproduce, and this error occurs only once out of every 50 or so calls to
stop()
. Contributing factors seem to be heavy CPU load and larger bags, and it does seem to be somewhat device-dependent. Unfortunately I cannot share the full bag, but itsmcap
summary isand it was recording on 59 topics.
We construct the node roughly as follows:
and execute it with a
MultiThreadedExecutor
constructed using the default arguments and 4 threads (the executor is also running 2 otherRecorder
nodes but they record far less data and we haven't had any issues with them).System (please complete the following information)
0.15.12
(as part of Humble release 2024-07-29)Additional context
This problem had also occurred when running via
ros2 bag record
, but there it manifested as the recorder apparently shutting down in response to SIGINT, but for some reason not closing the bag properly.