I have a bag file which I replay from the beginning a few times in succession, and one of my nodes processing IMU messages stopped being able to process messages for no obvious reason. In the end, it was processing e.g. at 0.1 Hz while the messages are coming at 100 Hz. Restarting the node fixes the issue temporarily.
Analysis
I accidentaly enabled topic statistics by setting /enable_statistics ROS param to true. With topic statistics enabled, this bug manifested.
stats.last_publish is updated with the last time when a message was published to /statistics. If then a bag file starts playing again, the check referenced above will for long not be true (actually never again if you're playing the same bag).
There is a list in stats.arrival_time_list that is cleared only when a stats message is published:
So this list is growing without bounds when the bag is replayed. Each received message adds a record to this list. With IMU messages @ 100 Hz, this quickly comes to the order of 1e5 and even higher orders.
Finally, this growing list is being copied twice in each call of the callback() function:
These copies are what is slowing the subscriber down to unacceptable slowness after a few bag replays. Whenever I pause the node in GDB, PollManager is caught copying this list:
How I found it
I have a bag file which I replay from the beginning a few times in succession, and one of my nodes processing IMU messages stopped being able to process messages for no obvious reason. In the end, it was processing e.g. at 0.1 Hz while the messages are coming at 100 Hz. Restarting the node fixes the issue temporarily.
Analysis
I accidentaly enabled topic statistics by setting
/enable_statistics
ROS param to true. With topic statistics enabled, this bug manifested.https://github.com/ros/ros_comm/blob/dd78ac8af128bb8eb992d6431bb9f994658ea6ab/clients/roscpp/src/libros/statistics.cpp#L118
stats.last_publish
is updated with the last time when a message was published to/statistics
. If then a bag file starts playing again, the check referenced above will for long not be true (actually never again if you're playing the same bag).There is a list in
stats.arrival_time_list
that is cleared only when a stats message is published:https://github.com/ros/ros_comm/blob/dd78ac8af128bb8eb992d6431bb9f994658ea6ab/clients/roscpp/src/libros/statistics.cpp#L252
So this list is growing without bounds when the bag is replayed. Each received message adds a record to this list. With IMU messages @ 100 Hz, this quickly comes to the order of 1e5 and even higher orders.
Finally, this growing list is being copied twice in each call of the
callback()
function:https://github.com/ros/ros_comm/blob/dd78ac8af128bb8eb992d6431bb9f994658ea6ab/clients/roscpp/src/libros/statistics.cpp#L84
and
https://github.com/ros/ros_comm/blob/dd78ac8af128bb8eb992d6431bb9f994658ea6ab/clients/roscpp/src/libros/statistics.cpp#L258
These copies are what is slowing the subscriber down to unacceptable slowness after a few bag replays. Whenever I pause the node in GDB,
PollManager
is caught copying this list:I'll sketch a PR fixing this by detecting ROS time jumping back and resetting the timestamp check.