ros / roslisp_common

Clone of the old roslisp_common SVN repo from code.ros.org.

Running out of dynamic space when receiving too many tf2_msgs/TFMessage #11

Open. fairlight1337 opened this issue 10 years ago

fairlight1337 commented 10 years ago

I encountered a problematic behaviour when working with nodes on the same ROS master that publish tf2_msgs/TFMessage messages while roslisp (cl-tf) is listening on the /tf topic.

Loading cl-tf, connecting to the ROS master via (roslisp-utilities:startup-ros-node), and then letting a node publish tf2 messages at a high rate causes SLIME to drop into the ldb> debugger prompt, freezing everything else in Emacs.
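Roughly, the setup on my side looks like this (only a sketch; creating the listener with make-instance on cl-tf:transform-listener is an assumption about the exact cl-tf entry point):

```lisp
;; Sketch of the setup that triggers the problem. cl-tf is assumed to be
;; loaded already (e.g. via its ASDF system); the listener constructor is
;; an assumption about the exact cl-tf API.
(roslisp-utilities:startup-ros-node)   ; connect to the ROS master

;; From here on the Lisp image just receives whatever arrives on /tf,
;; which in our case is tf2_msgs/TFMessage at a high rate.
(defparameter *tf-listener*
  (make-instance 'cl-tf:transform-listener))
```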

ldb's message is that the dynamic space is exhausted. Initially, its size was set to 1024MB (the default value), but increasing it to 2048MB or 4096MB didn't help either. An important note here is that I didn't do anything else besides initializing the tf listener via cl-tf.
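For reference, this is how I set and check the heap size (SBCL specific; the value shown is just what 4096MB comes out to in bytes):

```lisp
;; SBCL is started with a larger heap, e.g.
;;   sbcl --dynamic-space-size 4096
;; The current limit (in bytes) can then be checked from inside the image:
(sb-ext:dynamic-space-size)   ; => 4294967296 for the 4096MB case
```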

I cannot explain this symptom, but when the /tf topic is of type tf/tfMessage, this does not happen.

Our setup here includes a C++ perception node that publishes tf frames via a broadcaster. Since the default implementation apparently changed from tf (groovy) to tf2 (hydro), initializing a tf broadcaster in C++ now causes the /tf topic on the ROS master to be advertised as tf2_msgs/TFMessage. At least that's what rostopic info /tf tells me. Again, there is no problem when /tf is of type tf/tfMessage.

Does anybody have a clue about this? Even after restarting everything, the problem still persists. I'm using roslisp_common from source.

moesenle commented 10 years ago

How high is your publish rate? More like 30Hz or more like 300Hz? TF was never designed to handle really high rates. If you reach something like 60Hz to 100Hz, the memory consumption of nodes increases a lot and overall system performance degrades significantly. At least that's what we saw on Rosie with old TF. You should try not to publish at a rate higher than 30Hz. Unless your transforms change quickly, higher rates shouldn't be necessary anyway, since TF's interpolation should give you an accurate enough pose. But that's just a side note.

I cannot find any location in the code that mentions tf2 at all, so I guess the problem might be related to mismatching message types. If the data types do not match, my guess would be that deserialization gets confused and starts to allocate much more memory than it should. But I have no explanation for why exactly this happens, since topic types should be checked when a connection is established. So maybe there is also a bug in roslisp.

Unfortunately, I don't know enough about how tf2 works and what the data type on /tf actually should be. Did you try changing the data type in transform-listener.lisp to tf2_msgs/TFMessage? Maybe that's already enough.
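Roughly something like this, sketched from memory (the callback name is a placeholder; the only point is the message type passed to roslisp:subscribe):

```lisp
;; Sketch only: subscribe to /tf with the tf2 message type instead of the
;; old tf one. handle-transform-message is a placeholder for whatever
;; callback transform-listener.lisp actually uses.
(roslisp:subscribe "/tf"
                   'tf2_msgs-msg:TFMessage   ; previously 'tf-msg:tfMessage
                   #'handle-transform-message)
```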

tkruse commented 10 years ago

AFAIK tf2 is just a refactoring of tf, with one additional topic that is reserved for non-changing transforms (so that those do not have to be sent over and over again). I don't expect the messages to be different.

moesenle commented 10 years ago

You are right. The two message types do not differ, even the md5 sums are identical.

My next guess would be that the tf publish rate is just too high... The system always buffers all messages of the last 10 seconds.
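As a rough back-of-the-envelope sketch (all numbers here are assumptions for illustration, not measurements):

```lisp
;; Rough estimate of how much raw transform data a 10-second buffer holds.
;; Rate, transforms per message and bytes per transform are all assumed.
(let ((rate 300)               ; publish rate in Hz
      (transforms-per-msg 20)  ; stamped transforms per TFMessage
      (bytes-per-transform 200)
      (buffer-seconds 10))
  (* rate transforms-per-msg bytes-per-transform buffer-seconds))
;; => 12000000, i.e. on the order of 12MB of raw data per 10-second window,
;; before any per-object allocation overhead in the Lisp image.
```

That alone wouldn't explain exhausting a multi-GB heap, so if the rate is not extreme, something is probably allocating far more per message than the payload itself.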