Open rrrand opened 4 years ago
Yeah, there are some known issues about recording a lot of data with rosbag2. I'm going to move this ticket over to the rosbag2 repository, where I believe some work is being done to address this.
The very same error happens when running a C++ publisher with a very high frequency (e.g. in the performance tests). So I think this should stay in the rmw_fastrtps repo, since the problem doesn't exist with other RMW implementations and has nothing to do with rosbag.
Well, I actually think there are 2 issues here.
One issue is that recording a bag is a synchronous process; it collects a bunch of data, then blocks while writing it to disk. In the meantime, data that is being published is probably filling up the internal Fast-DDS queues, since the receiver isn't receiving any of them.
The second issue is what you describe: rmw_fastrtps probably shouldn't crash when that situation happens.
The reported error message described in this ticket is:
[component_container-2] terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
[component_container-2] what(): failed to publish message: cannot publish data, at /tmp/binarydeb/ros-foxy-rmw-fastrtps-shared-cpp-1.2.0/src/rmw_publish.cpp:53, at /tmp/binarydeb/ros-foxy-rcl-1.1.7/src/rcl/publisher.c:291
Which is clearly happening on the publisher side in rmw_fastrtps and is not related to rosbag.
Anyway, please either move this ticket back or create a new ticket in rmw_fastrtps.
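Until the underlying behaviour changes, a publisher can at least avoid the `terminate called` abort by catching the exception around the publish call. The following is only a sketch: `publish_with_retry` is a hypothetical helper, not part of rclcpp, and it relies on `rclcpp::exceptions::RCLError` deriving from `std::runtime_error`. The retry count and backoff are arbitrary illustration values.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical helper (not an rclcpp API): calls `publish_call` and, if it
// throws std::runtime_error (the base class of rclcpp::exceptions::RCLError),
// waits `backoff` and tries again, up to `max_attempts` attempts in total.
// Returns true as soon as one attempt succeeds, false if all attempts threw.
template <typename PublishCall>
bool publish_with_retry(PublishCall && publish_call, int max_attempts,
                        std::chrono::milliseconds backoff)
{
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    try {
      publish_call();
      return true;
    } catch (const std::runtime_error &) {
      // Give the reader side time to drain before the next attempt.
      if (attempt < max_attempts) {
        std::this_thread::sleep_for(backoff);
      }
    }
  }
  return false;
}
```

In a real node the callable would wrap the actual call, for example `[&] { pub->publish(msg); }`. Whether to drop the message or keep retrying once the helper returns false is an application decision.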
could be related to https://github.com/ros2/rmw_fastrtps/issues/338.
We think this should be transferred back to rmw_fastrtps based on the discussion here
@richiprosima Do you have any insight why rmw_publish() fails to publish the data?
@richiprosima @MiguelCompany Any thoughts on what might be going on here?
This is almost always a timeout when using keep_all QoS. We could change the returned error to RMW_RET_TIMEOUT, but the rcl would also throw an exception in that case.
Another option is to change the default value of max_blocking_time to infinity, as other rmw implementations are doing, but that would mean the call to publish would block forever until there is room for the sample to be added to the history.
Yet another option is to set the default value of max_samples to infinity, so there will always be room for new samples, but memory consumption would grow indefinitely until there is not enough memory, in which case we would get to the same situation (publish will return false, and RMW_RET_ERROR would be returned, which will throw the unhandled exception).
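The trade-off between the three options can be seen in a toy model of a bounded history. This is a stdlib-only sketch, not Fast DDS code: `BoundedHistory`, `try_push` and `acknowledge_oldest` are made-up names, `capacity` stands in for max_samples, and the timeout argument stands in for max_blocking_time.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Toy stand-in for a keep_all writer history (illustration only).
class BoundedHistory
{
public:
  explicit BoundedHistory(std::size_t capacity) : capacity_(capacity)
  {
    assert(capacity > 0);
  }

  // Waits up to `max_blocking_time` for a free slot. Returns false on
  // timeout, which models the "cannot publish data" failure.
  bool try_push(int sample, std::chrono::milliseconds max_blocking_time)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    const bool has_room = not_full_.wait_for(
      lock, max_blocking_time, [this] { return samples_.size() < capacity_; });
    if (!has_room) {
      return false;
    }
    samples_.push_back(sample);
    return true;
  }

  // Models the reader acknowledging the oldest sample, which frees a slot.
  // Returns false if the history was already empty.
  bool acknowledge_oldest()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (samples_.empty()) {
      return false;
    }
    samples_.pop_front();
    not_full_.notify_one();
    return true;
  }

  std::size_t size() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return samples_.size();
  }

private:
  const std::size_t capacity_;
  mutable std::mutex mutex_;
  std::condition_variable not_full_;
  std::deque<int> samples_;
};
```

In this model, a finite timeout with a stalled reader makes `try_push` return false once the history is full. An unbounded timeout would turn that failure into an indefinite block, and an unbounded capacity would trade it for unbounded memory growth.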
Bug report
Required Info:
Steps to reproduce issue
My goal was to test the reliability of bag recording while components exchange messages. I modified the components from https://github.com/DensoADAS/high_freq_pub_example/tree/master: I set a reliable, keep_all quality of service and created a new launch file that starts the components and the bag recording.
Launch file:
chatter_qos.yaml file for bag:
Modified listener.cpp:
Modified talker.cpp:
Expected behavior
No exceptions.
Actual behavior
Exception was thrown:
Additional information
I would like to share my statistics. I measured how long the talker and listener took to exchange 2_000_000 int32 messages with reliable keep_all QoS. All messages were received by the listener and saved by the bag. I got the following durations:
So recording messages to a bag is a long-running operation.