strands-project / strands_perception_people

long-term detection, tracking and recognition of people
96 stars 70 forks source link

Upper body detector crashes quite frequently #156

Open cdondrup opened 9 years ago

cdondrup commented 9 years ago

76 seems to have found its way to the real robot:

*** Error in `/localhome/strands/cdondrup/devel/lib/upper_body_detector/upper_body_detector': double free or corruption (out): 0x000000000292d590│··································································
 ***                                                                                                                                             │··································································
[upper_body_detector-2] process has died [pid 1563, exit code -6, cmd /localhome/strands/cdondrup/devel/lib/upper_body_detector/upper_body_detect│··································································
or __name:=upper_body_detector __log:=/localhome/strands/.ros/log/b9143b60-d922-11e4-81c0-00032d225887/upper_body_detector-2.log].               │··································································
log file: /localhome/strands/.ros/log/b9143b60-d922-11e4-81c0-00032d225887/upper_body_detector-2*.log

I will try to investigate but for now it will be "fixed" via #152 which I will PR by the end of the day.

lucasb-eyer commented 9 years ago

Hard to do much without either a reproducible case (bagfile?) or a coredump or even a backtrace.

lucasb-eyer commented 9 years ago

I managed to get a backtrace by attaching gdb and waiting for the crash. It's actually not crashing but aborting due to glibc's sanity checks for double-free or corruption upon free:

Program received signal SIGABRT, Aborted.
0x00007f4aaab27cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56

The backtrace is the following:

gdb> bt
#0  0x00007f4aaab27cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4aaab2b0d8 in __GI_abort () at abort.c:89
#2  0x00007f4aaab64394 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f4aaac72b28 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f4aaab7066e in malloc_printerr (ptr=<optimized out>, str=0x7f4aaac72c58 "double free or corruption (out)", action=1) at malloc.c:4996
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5  0x000000000043aee6 in deallocate (this=<optimized out>, __p=<optimized out>) at /usr/include/c++/4.8/ext/new_allocator.h:110
#6  _M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at /usr/include/c++/4.8/bits/stl_vector.h:174
#7  ~_Vector_base (this=0xe6aac0, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/stl_vector.h:160
#8  ~vector (this=0xe6aac0, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/stl_vector.h:416
#9  ~Image_ (this=0xe6aa90, __in_chrg=<optimized out>) at /opt/ros/indigo/include/sensor_msgs/Image.h:23
#10 destroy (this=0xe6aa88) at /usr/include/boost/smart_ptr/make_shared_object.hpp:57
#11 operator() (this=0xe6aa88) at /usr/include/boost/smart_ptr/make_shared_object.hpp:87
#12 boost::detail::sp_counted_impl_pd<sensor_msgs::Image_<std::allocator<void> >*, boost::detail::sp_ms_deleter<sensor_msgs::Image_<std::allocator<void> > > >::dispose (
    this=0xe6aa70) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:153
#13 0x000000000043c95e in boost::detail::sp_counted_base::release (this=0xe6aa70) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#14 0x0000000000451a7d in destroy (this=0x7fff64d04860, __p=<optimized out>) at /usr/include/c++/4.8/ext/new_allocator.h:133
#15 _M_pop_front_aux (this=0x7fff64d04860) at /usr/include/c++/4.8/bits/deque.tcc:522
#16 std::deque<ros::MessageEvent<sensor_msgs::Image_<std::allocator<void> > const>, std::allocator<ros::MessageEvent<sensor_msgs::Image_<std::allocator<void> > const> > >::pop_front (this=0x7fff64d04860) at /usr/include/c++/4.8/bits/stl_deque.h:1430
#17 0x000000000045f27b in recoverAndDelete<1> (this=0x7fff64d04800) at /opt/ros/indigo/include/message_filters/sync_policies/approximate_time.h:488
#18 message_filters::sync_policies::ApproximateTime<sensor_msgs::Image_<std::allocator<void> >, sensor_msgs::Image_<std::allocator<void> >, rwth_perception_people_msgs::GroundPlane_<std::allocator<void> >, sensor_msgs::CameraInfo_<std::allocator<void> >, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType>::publishCandidate (this=this@entry=0x7fff64d04800) at /opt/ros/indigo/include/message_filters/sync_policies/approximate_time.h:510
#19 0x0000000000463428 in message_filters::sync_policies::ApproximateTime<sensor_msgs::Image_<std::allocator<void> >, sensor_msgs::Image_<std::allocator<void> >, rwth_perception_people_msgs::GroundPlane_<std::allocator<void> >, sensor_msgs::CameraInfo_<std::allocator<void> >, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType>::process (this=this@entry=0x7fff64d04800)
    at /opt/ros/indigo/include/message_filters/sync_policies/approximate_time.h:776
#20 0x00000000004651dc in message_filters::sync_policies::ApproximateTime<sensor_msgs::Image_<std::allocator<void> >, sensor_msgs::Image_<std::allocator<void> >, rwth_perception_people_msgs::GroundPlane_<std::allocator<void> >, sensor_msgs::CameraInfo_<std::allocator<void> >, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType, message_filters::NullType>::add<0> (this=0x7fff64d04800, evt=...)
    at /opt/ros/indigo/include/message_filters/sync_policies/approximate_time.h:213
#21 0x0000000000450ded in operator() (a0=..., this=0xde9eb8) at /usr/include/boost/function/function_template.hpp:767
#22 message_filters::CallbackHelper1T<ros::MessageEvent<sensor_msgs::Image_<std::allocator<void> > const> const&, sensor_msgs::Image_<std::allocator<void> > >::call (
    this=0xde9eb0, event=..., nonconst_force_copy=<optimized out>) at /opt/ros/indigo/include/message_filters/signal1.h:76
#23 0x000000000044a31f in call (event=..., this=<optimized out>) at /opt/ros/indigo/include/message_filters/signal1.h:119
#24 signalMessage (msg=..., this=<optimized out>) at /opt/ros/indigo/include/message_filters/simple_filter.h:127
#25 image_transport::SubscriberFilter::cb (this=<optimized out>, m=...) at /opt/ros/indigo/include/image_transport/subscriber_filter.h:155
#26 0x00007f4a979cb8c9 in image_transport::RawSubscriber::internalCallback(boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&, boost::function<void (boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&)> const&) () from /opt/ros/indigo/lib//libimage_transport_plugins.so
#27 0x00007f4aad399e6c in boost::detail::function::void_function_obj_invoker1<boost::function<void (boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&)>, void, boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> >::invoke(boost::detail::function::function_buffer&, boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const>) () from /opt/ros/indigo/lib/libimage_transport.so
#28 0x00007f4a979d0311 in ros::SubscriptionCallbackHelperT<boost::shared_ptr<sensor_msgs::Image_<std::allocator<void> > const> const&, void>::call(ros::SubscriptionCallbackHelperCallParams&) () from /opt/ros/indigo/lib//libimage_transport_plugins.so
#29 0x00007f4aacf069c5 in ros::SubscriptionQueue::call() () from /opt/ros/indigo/lib/libroscpp.so
---Type <return> to continue, or q <return> to quit---
#30 0x00007f4aacec4777 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/indigo/lib/libroscpp.so
#31 0x00007f4aacec5583 in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#32 0x00007f4aacf094e5 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/indigo/lib/libroscpp.so
#33 0x00007f4aacef1aeb in ros::spin() () from /opt/ros/indigo/lib/libroscpp.so
#34 0x00000000004338f9 in main (argc=1, argv=<optimized out>)
    at /home/spencer/spencer_ws/src/spencer_soft/perception/rwth_people_tracker/rwth_upper_body_detector/src/main.cpp:427

The crash originates from sensor_msgs::Image's destructor freeing the _data vector. I checked, it's the RGB image:

gdb> frame 9
gdb> p encoding
$11 = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
    _M_p = 0xdbce98 "rgb8"}}

If no-one subscribes to the UBD result image, we do not even touch that pointer, ever. I'm quite certain nobody was subscribing it during this crash, which suggests a bug in either ROS or boost's shared_ptr. The latter sounds unlikely, the former not. Am I missing something?

I will fix the visualization code, though, because the current one will fail spectacularly if either the ROS image or the QImage are not contiguous.