spencer-project / spencer_people_tracking

Multi-modal ROS-based people detection and tracking framework for mobile robots developed within the context of the EU FP7 project SPENCER.
http://www.spencer.eu/

Upper_body_detection segfault, memory corruption #10

Closed srinimd2005 closed 4 years ago

srinimd2005 commented 8 years ago

Hi, I managed to run the SPENCER tracking after some further modifications. However, while tracking is running I sometimes get the error below. Can someone please help me resolve it? It happens only with upper_body_detection, not with HOG-based tracking.

*** Error in `/home/ezc2x-ros/cat_ws/devel/lib/rwth_upper_body_detector/upper_body_detector': double free or corruption (!prev): 0x0000000002815770 ***
[spencer/perception_internal/people_detection/rgbd_front_top/upper_body_detector-32] process has died [pid 18968, exit code -6, cmd /home/ezc2x-ros/cat_ws/devel/lib/rwth_upper_body_detector/upper_body_detector __name:=upper_body_detector __log:=/home/ezc2x-ros/.ros/log/e03e4c96-347d-11e6-a1de-0022156bbb6b/spencer-perception_internal-people_detection-rgbd_front_top-upper_body_detector-32.log].
log file: /home/ezc2x-ros/.ros/log/e03e4c96-347d-11e6-a1de-0022156bbb6b/spencer-perception_internal-people_detection-rgbd_front_top-upper_body_detector-32*.log
srinimd2005 commented 8 years ago

When I run the node in gdb, I see crashes like the following. Can someone give me a clue on how to resolve the bug?

The program being debugged has been started already.
Start it from the beginning? (y or n) 
Starting program: /home/ezc2x-ros/cat_ws/devel/lib/rwth_upper_body_detector/upper_body_detector __name:=upper_body_detector __log:=/home/ezc2x-ros/.ros/log/388125ca-36e7-11e6-9336-0022156bbb6b/spencer-perception_internal-people_detection-rgbd_front_top-upper_body_detector-32.log
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdc230700 (LWP 18986)]
[New Thread 0x7fffdafd8700 (LWP 18988)]
[New Thread 0x7fffda7d7700 (LWP 18989)]
[New Thread 0x7fffd9fd6700 (LWP 18990)]
[New Thread 0x7fffd97d5700 (LWP 18995)]

Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff505b760 <main_arena>, bytes=136488) at malloc.c:3629
3629    malloc.c: No such file or directory.
The program being debugged has been started already.
Start it from the beginning? (y or n) 
Starting program: /home/ezc2x-ros/cat_ws/devel/lib/rwth_upper_body_detector/upper_body_detector __name:=upper_body_detector __log:=/home/ezc2x-ros/.ros/log/388125ca-36e7-11e6-9336-0022156bbb6b/spencer-perception_internal-people_detection-rgbd_front_top-upper_body_detector-32.log
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdc230700 (LWP 19131)]
[New Thread 0x7fffdafd8700 (LWP 19133)]
[New Thread 0x7fffda7d7700 (LWP 19134)]
[New Thread 0x7fffd9fd6700 (LWP 19135)]
[New Thread 0x7fffd97d5700 (LWP 19140)]

Program received signal SIGABRT, Aborted.
0x00007ffff4cd3c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
The program being debugged has been started already.
Start it from the beginning? (y or n) 
Starting program: /home/ezc2x-ros/cat_ws/devel/lib/rwth_upper_body_detector/upper_body_detector __name:=upper_body_detector __log:=/home/ezc2x-ros/.ros/log/388125ca-36e7-11e6-9336-0022156bbb6b/spencer-perception_internal-people_detection-rgbd_front_top-upper_body_detector-32.log
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdc230700 (LWP 19264)]
[New Thread 0x7fffdafd8700 (LWP 19266)]
[New Thread 0x7fffd27d7700 (LWP 19267)]
[New Thread 0x7fffda7d7700 (LWP 19268)]
[New Thread 0x7fffd9fd6700 (LWP 19273)]

Program received signal SIGSEGV, Segmentation fault.
0x000000000047892e in Detector::ExtractPointsInROIs(Vector<ROI>&, Matrix<int> const&, Matrix<int> const&, PointCloud const&, Matrix<int> const&) ()
No symbol "all" in current context.

Thread 6 (Thread 0x7fffd9fd6700 (LWP 19273)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x00007ffff707fd25 in bool boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> const&) () from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#2  0x00007ffff707dcad in ros::CallbackQueue::callAvailable(ros::WallDuration)
    () from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#3  0x00007ffff70ad9e4 in ros::internalCallbackQueueThreadFunc() ()
   from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#4  0x00007ffff2b8aa4a in ?? ()
   from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
No symbol table info available.
#5  0x00007ffff653d184 in start_thread (arg=0x7fffd9fd6700)
    at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7fffd9fd6700
Quit
The program being debugged has been started already.
Start it from the beginning? (y or n) 
Starting program: /home/ezc2x-ros/cat_ws/devel/lib/rwth_upper_body_detector/upper_body_detector __name:=upper_body_detector __log:=/home/ezc2x-ros/.ros/log/388125ca-36e7-11e6-9336-0022156bbb6b/spencer-perception_internal-people_detection-rgbd_front_top-upper_body_detector-32.log
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdc230700 (LWP 19464)]
[New Thread 0x7fffdafd8700 (LWP 19466)]
[New Thread 0x7fffda7d7700 (LWP 19467)]
[New Thread 0x7fffd9fd6700 (LWP 19468)]
[New Thread 0x7fffd97d5700 (LWP 19473)]

Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff505b760 <main_arena>, bytes=136488) at malloc.c:3629
3629    malloc.c: No such file or directory.

Thread 6 (Thread 0x7fffd97d5700 (LWP 19473)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x00007ffff707fd25 in bool boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> const&) () from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#2  0x00007ffff707dcad in ros::CallbackQueue::callAvailable(ros::WallDuration)
    () from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#3  0x00007ffff70ad9e4 in ros::internalCallbackQueueThreadFunc() ()
   from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#4  0x00007ffff2b8aa4a in ?? ()
   from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
No symbol table info available.
#5  0x00007ffff653d184 in start_thread (arg=0x7fffd97d5700)
    at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7fffd97d5700
Quit
The program being debugged has been started already.
Start it from the beginning? (y or n) 
Starting program: /home/ezc2x-ros/cat_ws/devel/lib/rwth_upper_body_detector/upper_body_detector __name:=upper_body_detector __log:=/home/ezc2x-ros/.ros/log/388125ca-36e7-11e6-9336-0022156bbb6b/spencer-perception_internal-people_detection-rgbd_front_top-upper_body_detector-32.log
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdc230700 (LWP 19597)]
[New Thread 0x7fffdafd8700 (LWP 19599)]
[New Thread 0x7fffda7d7700 (LWP 19600)]
[New Thread 0x7fffd9fd6700 (LWP 19601)]
[New Thread 0x7fffd97d5700 (LWP 19606)]

Program received signal SIGABRT, Aborted.
0x00007ffff4cd3c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Thread 6 (Thread 0x7fffd97d5700 (LWP 19606)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x00007ffff707fd25 in bool boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000l> const&) () from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#2  0x00007ffff707dcad in ros::CallbackQueue::callAvailable(ros::WallDuration)
    () from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#3  0x00007ffff70ad9e4 in ros::internalCallbackQueueThreadFunc() ()
   from /opt/ros/indigo/lib/libroscpp.so
No symbol table info available.
#4  0x00007ffff2b8aa4a in ?? ()
   from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
No symbol table info available.
#5  0x00007ffff653d184 in start_thread (arg=0x7fffd97d5700)
    at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7fffd97d5700
Quit
tlind commented 8 years ago

This is a known issue that we also experienced a few times, but we currently cannot pinpoint the exact root cause (@lucasb-eyer). Setting respawn='true' in your launch file is a temporary workaround.

lucasb-eyer commented 8 years ago

I have edited your issue to fix the formatting. Please use code formatting so that it is readable at all!

Yes, we've had this issue for years. I've tried to figure it out multiple times in the past, to no avail. So yes, the best recommendation we can give is to set respawn='true' as @tlind suggested.

I've managed to boil the problem down to a double-free of an ImagePtr, which supposedly should never happen since it's a smart pointer. I was never able to track down the first instance of it being freed, nor was I able to determine under which conditions it crashes. Also, all the safeguarding I have tried failed. I gave up on this error.

srinimd2005 commented 8 years ago

Thank you for your help. In my case, the problem seems to start once I move my Kinect (Xbox 360). I have now switched from the "ground plane fixed" node to the "ground plane really fixed" node, and it seems to be a little more stable. However, I still cannot move my Kinect while the RGB-D detector is running.

lucasb-eyer commented 8 years ago

Oh, that sounds very different though. The bug that's haunting me is completely unrelated to movement, so maybe this is not what I thought it was.

You should compile in debug mode, and also comment out this line, where -O3 was hardcoded (don't ask why :smile:) in order to get useful stack traces. Then, run it in gdb again, and see if it always crashes at the same code line. When you hit a crash in gdb, please also type bt and show us the output, as well as both info locals and info args. Maybe we can see what goes wrong, but I don't have too much hope as I'm not all too familiar with the code and the person who wrote it has since moved on.

rentt commented 8 years ago

Pull request #12 fixed some bugs. The upper_body_detector has been running stably for 4 hours in my testing. I am using a Kinect One.

tlind commented 4 years ago

I assume that this bug has been resolved through the merge of PR #59, therefore I'll close this issue.
