ros-visualization / rviz

ROS 3D Robot Visualizer
BSD 3-Clause "New" or "Revised" License
841 stars 463 forks source link

rqt_rviz crashes due to rqt threading change #1042

Open mikepurvis opened 8 years ago

mikepurvis commented 8 years ago

Previous discussion: https://github.com/ros-visualization/rqt/issues/96

My understanding is that in the case of rqt_rviz, we now have two threads both calling default ROS callbacks:

However, it's not clear to me what CBs are getting called by VisualizationManager::onUpdate, because almost everything seems to funnel through one of these two NHs which already have explicit callback queues set:

  update_nh_.setCallbackQueue( context_->getUpdateQueue() );
  threaded_nh_.setCallbackQueue( context_->getThreadedQueue() );

The onUpdate call happens on the QT thread, and is set up here:

  update_timer_ = new QTimer;
  connect( update_timer_, SIGNAL( timeout() ), this, SLOT( onUpdate() ));

So I think the source of the problem is there's something registered on the default callback queue which is expecting to be able to do QT state mutation, and now can't (under rqt_rviz) because that callback sometimes occurs on the thread which was added in https://github.com/ros-visualization/rqt/pull/95.

Is there some way to audit the callbacks which are ending up on the different queues? IMO the fix here is going to be:

I believe this would turn the async spinner supplied by rqt into a no-op, and restore order in rviz land.

@wjwwood @abencz @sromberg

wjwwood commented 8 years ago

Is there some way to audit the callbacks which are ending up on the different queues?

Not that I'm aware of.

Switch the onUpdate spinOnce call to be an explicit queue invocation.

Sounds like a good idea.

Find out which callbacks are getting called on that spinner, and switch the relevant NodeHandle instances to the explicit queue.

What is "that spinner" in this context? If you mean the spinner in onUpdate, why would you need to switch things away from that?

mikepurvis commented 8 years ago

I want to switch anything presently on that spinner to go to the explicit queue for it so that nothing is being run by the catch-all async spinner.

wjwwood commented 8 years ago

I'd search for uses of NodeHandle within rviz, and make sure none of them use the singleton which doesn't have a custom callback queue setup. I don't think any form of runtime auditing would be complete enough.

mikepurvis commented 8 years ago

I had been starting here: https://github.com/ros-visualization/rviz/search?q=NodeHandle

A bunch are tests of course, and it seems that several of the tools (point_tool, goal_tool) have their own NodeHandle, but that should be harmless, since it's only being used to advertise topics, and therefore has no callbacks.

wjwwood commented 8 years ago

Be careful, the GitHub search does not return all matches, only the first few per file. I'd highly recommend using a local search of the code with grep or your editor of choice.