ros2 / rosbag2

Apache License 2.0
278 stars 248 forks source link

Support `--repeat-latched` in ROS 2 (transient local) #1159

Open amacneil opened 1 year ago

amacneil commented 1 year ago

Description

ROS 1 supported a --repeat-latched feature (https://github.com/ros/ros_comm/pull/1850) to repeat latched topics at the start of each bag split.

As far as I can tell, rosbag2 does not have this feature. Is there any reason we should not add it?

MichaelOrlov commented 1 year ago

@amacneil Yes we have the reason to not do this for the sake to have deterministic replay. This is duplicate of the https://github.com/ros2/rosbag2/pull/1051 please see my detailed explanation in https://github.com/ros2/rosbag2/pull/1051#pullrequestreview-1059614100

amacneil commented 1 year ago

I'll admit I am not super familiar with how tf_static and transient local works in ROS 2 (I am more familiar with latched topics in ROS 1). https://github.com/ros2/ros2/issues/464 seems to have some details.

We have users who would like to open one file from a split bag and have the transforms work correctly. Your comment in that issue seems to be based on the fact that you don't have a way in rosbag2 to "play" a split bag file, but we do support opening a single file in Foxglove. I think it would be good to get more feedback from users on whether they need this feature.

MichaelOrlov commented 1 year ago

If users want something it doesn't means that we should jump in and implement what they want without considering other cases and drawbacks.

My main point not in the case that rosbag2 doesn't support playing a split bag rather that we need to have determinism in replay. Basically we shall not inject or copy messages which we didn't receive on the wires in any cases.

We have a lot another cases when transient local durability uses for instance camera images or LIDAR streams - and what? We should copy all messages from camera topic from previous file to the next file when doing split?

If someone want to repeat tf_static messages published once at the beginning before playing back bag after split, one can use rosbag2 play with duration option and topic filtering for playing back just messages from tf_static then right after playback normally bag which was recorded after split. Another option is to use ros2 bag convert to create new bag where will be messages from tf_static from first bag and the rest will be content from bag after split.

jhurliman commented 1 year ago

We have a lot another cases when transient local durability uses for instance camera images or LIDAR streams - and what? We should copy all messages from camera topic from previous file to the next file when doing split?

My understanding of transient local durability is there is a user-configured history size, so it would only copy all messages from the previous file for a topic if the previous file was very short or the history size was unusually large. The default history size is 10, and 5 for the sensor QoS profile.

MichaelOrlov commented 1 year ago

@jhurliman Though the way how it's organized in DDS and ROS2 is that publisher has cache with N number of messages when transient local durability is used and when new subscriber joining to publisher - subscriber will be getting the last N messages from publisher cache rather than just one latest message.

The case when transient local durability uses for tf_static topic and there is a publisher only publishes a few messages at the beginning is the one particular case.

It could be another cases when transient local uses for camera images topics or LIDAR and there are continuous rgb data stream. Newly created subscriber will get N last images from cache and then all other following sequences from ongoing running stream since publisher will continuously publish new messages.

@jhurliman @amacneil After some consideration I tend to agree that we can implement this feature but not as default behavior. Only as an option with parameter given via CLI.

Some technical details and consideration how to implement this feature.

Thoughts?

ga58lar commented 1 year ago

@amacneil @MichaelOrlov Hi everyone, are there any updates on this feature?
This feature is exactly what my colleagues and I are currently looking for to record tf_static messages on each split and thus make it possible to play a single bag of a split with all information.

MichaelOrlov commented 1 year ago

@ga58lar Unlikely we will implement this feature since it contradicts the entire DDS communication approach and unavoidably will introduce a significant impact on the performance of the split operation and will cause messages drops during recording.

I would suggest a workaround. If you need to playback only messages from the tf_static topic at the beginning of some bag file or at some random time, you can use ros2 bag convert command https://github.com/ros2/rosbag2#converting-bags to extract messages from tf_static topic to a separate bag file and use separate ros2 bag play tf_static player instance to playback those messages when you would need them. You can use launcher yaml file to define the order of the playback calls.

emersonknapp commented 1 year ago

The way I understand this, there are these different cases being considered here:

  1. During playback (ros2 bag play) from the beginning - we want latched topics to persist between bag splits so that late joiners receive the messages, even post split.
  2. When starting play from a later timestamp that happens to be in a later split file - we still want those publishers to have the messages they theoretically would have
  3. When analyzing bags as data objects (and perhaps separating splits from each other), we want to see messages in the bag representing the current latched/transient_local message buffers

I believe that Case 1 actually already works correctly. When you start playback from the beginning, publishers are created with the appropriate QoS -- so if the original topic in the recorded system had all durability=TRANSIENT_LOCAL then the rosbag2 playback publisher will do the same. Then it will have the same latched behavior to new joiners, even across splits because playback publishers are not shut down/recreated across splits.

Case 2 on the other hand would still exhibit the same behavior but only for messages originally published after the timestamp of the start of playback. To give the publishers the internal information, we could do a check for latched publishers and do a "burst publish" of their topic from the beginning of the bag up to the starting point, which would act that way.

Case 3 would prove more difficult, though I don't see it as an engineering impossibility. We might create a separate entry type, a "internal latch-buffer" topic at the beginning of new splits, that is not played back during normal operation, but used only used for the "burst publish latch priming" if starting playback from that file.

Unfortunately the History Depth (latching count) of the publisher is not always broadcast such that rosbag2 would know what it is. DDS considers History Depth (as well as Lifespan but that's less relevant here) to be an internal QoS configuration, other participants don't need to know about it, so it's not guaranteed in every implementation to be conveyed in discovery.

Like @MichaelOrlov did bring up, though, the potential extra features I mentioned are not without performance cost

tonynajjar commented 6 months ago

Hey, I just encountered this issue when trying to visualize split bag files in Foxglove. @amacneil or @MichaelOrlov I understand that this is WIP but did you find a workaround I could use at the moment, either on the foxglove or rosbag2 side? The only option for me is to publish /tf_static more frequently which beats its purpose, or transfer my static TFs to /tf and publish at high rate which is a waste of resources. Thanks

Maybe @ga58lar you have an idea as well?

P.S: @MichaelOrlov your proposed workaround would not work in my use case unfortunately

ga58lar commented 6 months ago

@tonynajjar Unfortunately, I have not found a really good solution that does not defeat the criticism raised. I am now publishing the static transforms on every bag-split event of the recording. I think the ease of use for splitted bags is worth the overhead. In my case, all static transforms are located within one URDF file and I am using the robot_state_publisher. Therefore, I just added one subscription to events/write_split and execute publishFixedTransforms(); in the callback.

https://github.com/ros/robot_state_publisher/blob/ros2/src/robot_state_publisher.cpp#L151

  // subscribe to bag split events
  bag_split_sub_ = this->create_subscription<rosbag2_interfaces::msg::WriteSplitEvent>(
    "events/write_split",
    rclcpp::SensorDataQoS(),
    std::bind(&RobotStatePublisher::callbackBagSplit, this, std::placeholders::_1),
    subscriber_options);

https://github.com/ros/robot_state_publisher/blob/ros2/src/robot_state_publisher.cpp#L364

void RobotStatePublisher::callbackBagSplit([[maybe_unused]] rosbag2_interfaces::msg::WriteSplitEvent::SharedPtr message)
{
  // re-publish tf_static from urdf file on every bag split
  publishFixedTransforms();
  return;
}

It would be better though to subscribe to /tf_static too and re-publish the topic instead of loading the static transforms on each split

tonynajjar commented 6 months ago

Thanks for sharing @ga58lar, it does look like an okay workaround when all your static transforms come from the robot_state_publisher; unfortunately it's not the case for me, they come from at least 3 different sources. I find it quite dirty to enable the regular republishing of /tf_static from all the sources :/

tonynajjar commented 6 months ago

For anyone wanting to implement this in their own fork in the future I implemented it on humble: https://github.com/angsa-robotics/rosbag2/pull/1 (inspired from https://github.com/ros2/rosbag2/pull/1051).

I don't like maintaining a long-term fork but I ran out of options. @MichaelOrlov I understand why you don't want this to be the default behavior but I think having the option to enable it is not so bad

Aposhian commented 4 months ago

As @tonynajjar mentioned, this feature is very useful for doing analysis of single split files, either with static analysis (e.g. Foxglove) or even with playback. By embedding transient local topics in each split, then playing back each split appears as if the recorder was just a late subscriber at the time of the split start. So while yes, this feature is not exactly correct, it is incredibly useful, and I think it makes sense to have as an opt-in feature.

MichaelOrlov commented 4 months ago

@Aposhian @tonynajjar OK, it looks like it draws a picture for the feature design proposal.

  1. Will need to add a new --repeat-latched CLI option to the recorder.
  2. --repeat-latched CLI option will accept a queue depth, the number of messages to memorize on each topic. The default value is 0. i.e. the feature disabled.
  3. The uint32_t repeat_latched shall be added to the rosbag2_transport::RecordOptions data structure. The default value equal to 0 will mean that the --repeat-latched CLI option is not selected.
  4. Repeat-latched feature will memorize in RAM only the first N messages from all topics with QoS durability settings = RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL.
  5. Repeat-latched feature will save previously memorized messages at the beginning of each new bag file after the split.
  6. Repeat-latched feature will modify sent and received timestamps for memorized messages to be current before dumping them into the bag file. i.e., it shall be the same timestamps as from the latest received message.
  7. The same rosbag2_transport::RecordOptions.repeat_latched parameter shall be supported for the ros2 bag rewrite CLI.
Aposhian commented 2 months ago

@MichaelOrlov do you have any issues with @tonynajjar's implementation? https://github.com/angsa-robotics/rosbag2/pull/1/files

tonynajjar commented 2 months ago

@Aposhian I suppose the objection is the same as mentioned here since it's the same implementation

MichaelOrlov commented 2 months ago

@tonynajjar @Aposhian We just need to rewrite it to adhere to the plan proposal in the https://github.com/ros2/rosbag2/issues/1159#issuecomment-2183281437