ros2 / rviz

ROS 3D Robot Visualizer
BSD 3-Clause Clear License
293 stars 211 forks source link

Major performance issue when using a tf2_ros::MessageFilter without a timeout #747

Open doisyg opened 3 years ago

doisyg commented 3 years ago

We found out that there is a major performance issue when using a tf2_ros::MessageFilter which arise when it has to deal with a TF extrapolation error. One symptom is a high CPU usage and another one is the queue filling up and multiple message in the console:

Message Filter dropping message: frame 'laser' at time ... for reason 'discarding message because the queue is full'

There is probably an issue with the MessageFilter itself, reported here: https://github.com/ros2/geometry2/issues/366 However the issue disappears when passing a timeout on the tf2_ros::MessageFilter constructor. This is how the issue was solved for nav2 obstacle_layer and STVL, see here: https://github.com/ros-planning/navigation2/issues/2333

I believe I see this issue also in rviz, which could be explained by the usage of a tf2_ros::MessageFilter here: https://github.com/ros2/rviz/blob/617cbc182e42a225bdf6076b1f80401177ba5f1d/rviz_common/include/rviz_common/message_filter_display.hpp#L126-L131

It would help using a timeout there too. And independently from that tf2_ros::MessageFilter performance issue, it would make senses to have a timeout there. I am willing to do a PR but what would be a good value for this timeout ?

gbalke commented 2 years ago

If I understand correctly, MessageFilter is essentially getting backed up due to spam of invalid messages? You are interested in clearing the messages that aren't posted after a given timeout?

doisyg commented 2 years ago

Precisely

gbalke commented 2 years ago

I think it's hard to define a timeout for this sort of thing given the various rates of publishing. In my mind, you'd want a pretty short timeout for a high publish rate message and a pretty low timeout for low publishing rates. I'm not intimately familiar with how this works so it'll probably take me some time to read up to the point I am comfortable giving a recommendation.

I would think the best thing here would be a error queue with max depth but that becomes complicated if this filter is processing many different messages that are all causing errors (don't want to silence an error just because it isn't spamming).

doisyg commented 2 years ago

Well a long timeout to start with would be better than nothing I believe (i.e. infinite)

gbalke commented 2 years ago

Where would this value be updated? A value of infinity preserves current behavior, correct?

jacobperron commented 2 years ago

IIUC, the correct thing to do would be to resolve https://github.com/ros2/geometry2/issues/366, however I'm open to adding a timeout to the message filter here as a workaround. Maybe the timeout could be linked to the buffer cache time being proposed in https://github.com/ros2/rviz/pull/780?