pushkalkatara / darknet_ros

Robot Operating System (ROS) package for YOLOv3 based on darknet, with optimized tracking using a Kalman filter and optical flow.
MIT License
54 stars 28 forks

crash after several seconds (malloc(): memory corruption) #3

Open BRNKR opened 5 years ago

BRNKR commented 5 years ago

Hey,

I am running this node on my Jetson TX2. After a few seconds I get this crash:

Error in `/home/nvidia/catkin_ws/devel/lib/darknet_ros/darknet_ros': malloc(): memory corruption: 0x00000001007389c0

and sometimes this:

Error in `/home/nvidia/catkin_ws/devel/lib/darknet_ros/darknet_ros': free(): invalid next size (normal): 0x00000000006f7000

Any idea?

pushkalkatara commented 5 years ago

Hi, could you please set up the AlexeyAB repo, run the same model without the ROS code, and check whether the error still persists?

pushkalkatara commented 5 years ago

Also, are you running with Optical Flow enabled? Is the OpenCV3 package for ROS installed?

pushkalkatara commented 5 years ago

These errors typically indicate heap corruption: writing past the end of an allocated buffer, or writing to and then reading from memory that was never allocated. See also AlexeyAB issue #1628.

BRNKR commented 5 years ago

@pushkalkatara thanks for your help. I will test it ASAP.

DeepDuke commented 5 years ago

Hi! Besides the "malloc" and "free" errors, I also get:

[darknet_ros-1] process has died [pid 13109], exit code -11

I am quite confused by this problem. I am running this package on a TX2 with an Intel RealSense D415 camera. By the way, could you please tell me how to run with Optical Flow enabled?

pushkalkatara commented 5 years ago

Hi @DeepDuke, exit code -11 means the process was killed by a segmentation fault (signal 11, SIGSEGV), i.e. an invalid memory access.
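For reference, a negative exit code like this can be decoded with a few lines of Python. This is only a minimal sketch: it assumes roslaunch reports the code the same way Python's `subprocess` module does (negative value = killed by that signal number), and `describe_exit_code` is a made-up helper name, not part of ROS.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Turn a subprocess-style exit code into a readable description."""
    if code < 0:
        # Assumption: a negative code means "killed by signal number -code",
        # which is the convention of Python's subprocess module.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-code} ({name})"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(-11))
```

On Linux, signal 11 is SIGSEGV, which matches the segmentation fault described above.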

Optical flow should work if you have the opencv-contrib ROS package installed.

DeepDuke commented 5 years ago

Hi @pushkalkatara, the node died after running for about 3 minutes. I think I have put the weights and config files in the correct directory.

pushkalkatara commented 5 years ago

@DeepDuke if the node ran for 3 minutes with detections, then the weights and config directory seem fine. While the node is running, use System Monitor to watch the memory and swap history, and please post here whether memory usage grows over time or stays constant. Also try adding swap memory and re-running the node.
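If System Monitor is not convenient on the TX2, the same numbers can be sampled from `/proc`. This is a minimal sketch, assuming Linux; the helper names (`read_kib`, `sample`, `monitor`) are made up for illustration, and the PID of the node has to be supplied by hand (for example from `pgrep darknet_ros`).

```python
import os
import time


def read_kib(path: str, fields: tuple) -> dict:
    """Return the requested 'Name:  value kB' fields of a /proc file as ints (KiB)."""
    values = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name in fields:
                values[name] = int(rest.split()[0])
    return values


def sample(pid: int) -> dict:
    """One snapshot of process memory (VmRSS, VmSwap) and system memory, in KiB."""
    snapshot = read_kib(f"/proc/{pid}/status", ("VmRSS", "VmSwap"))
    snapshot.update(
        read_kib("/proc/meminfo", ("MemAvailable", "SwapTotal", "SwapFree"))
    )
    return snapshot


def monitor(pid: int, interval_s: float = 1.0, samples: int = 5) -> list:
    """Collect up to `samples` snapshots; stop early if the process disappears."""
    history = []
    for _ in range(samples):
        try:
            history.append(sample(pid))
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited (or never existed)
        time.sleep(interval_s)
    return history


if __name__ == "__main__":
    # Illustration only: this monitors the script itself.
    # Replace os.getpid() with the PID of the darknet_ros node.
    for row in monitor(os.getpid(), interval_s=0.2, samples=3):
        print(row)
```

A steadily growing `VmRSS` would point to a leak. A flat value followed by a sudden crash points more towards an invalid memory access than towards running out of memory.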

DeepDuke commented 5 years ago

@pushkalkatara I run the node on an NVIDIA Jetson TX2. When I ran yolov3, memory usage increased from 3.49G/7.67G to 4.50G/7.67G and stayed at 4.50G/7.67G until the node died. Swap was always 0k/0k. I also tried yolov3-tiny: memory rose from 3.49G/7.67G to 3.72G/7.67G, and when the node died it returned to 3.49G/7.67G. Swap again stayed unchanged at 0k/0k.

DeepDuke commented 5 years ago

@pushkalkatara BTW, yolov3-tiny only ran for less than a minute.

pushkalkatara commented 5 years ago

@DeepDuke how long did memory stay at 4.50G/7.67G before the node died?

DeepDuke commented 5 years ago

@pushkalkatara About 2-3 minutes for yolov3, less than a minute for yolov3-tiny.

NicolasBernard456 commented 5 years ago

Hello, not sure if this is related, but I have a similar issue. As soon as YOLO detects an object and tracking starts, it crashes. GDB gave the following information:

`Thread 1 "darknet_ros" received signal SIGSEGV, Segmentation fault.
darknet_ros::YoloROSTracker::trackThread (this=this@entry=0x7fffffffc130) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/YoloROSTracker.cpp:217
217 it->frames_counter = temp;
(gdb) bt
#0 darknet_ros::YoloROSTracker::trackThread (this=this@entry=0x7fffffffc130) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/YoloROSTracker.cpp:217
#1 0x00007ffff7822378 in darknet_ros::YoloROSTracker::YoloROSTracker (this=0x7fffffffc130, nh=...) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/YoloROSTracker.cpp:24
#2 0x00000000004058b7 in main (argc=1, argv=0x7fffffffc728) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/yolo_object_detector_node.cpp:15`
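For anyone who wants to capture a backtrace like the one above without an interactive session, gdb can be driven in batch mode. This is a minimal sketch, assuming gdb is installed, the node was built with debug symbols, and a roscore is already running. The binary path is a placeholder, and `gdb_backtrace_command` and `run_under_gdb` are made-up helper names.

```python
import shlex
import subprocess


def gdb_backtrace_command(binary: str, *node_args: str) -> list:
    """Build a gdb command line that runs the binary and prints a backtrace when it stops."""
    return [
        "gdb", "-batch",   # exit after the commands below have finished
        "-ex", "run",      # start the program; gdb stops on SIGSEGV
        "-ex", "bt",       # print the call stack at the point of the crash
        "--args", binary, *node_args,
    ]


def run_under_gdb(binary: str, *node_args: str) -> int:
    """Run the node under gdb and return gdb's exit status."""
    return subprocess.run(gdb_backtrace_command(binary, *node_args)).returncode


if __name__ == "__main__":
    # Placeholder path: point this at your own catkin workspace.
    cmd = gdb_backtrace_command("/path/to/catkin_ws/devel/lib/darknet_ros/darknet_ros")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can also be pasted into a terminal directly. If the program exits without crashing, `bt` simply reports that there is no stack.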

pushkalkatara commented 5 years ago

Hi @NicolasBernard456 . Here are some steps for debugging -

Shame-fight commented 5 years ago

@pushkalkatara Thanks for the work you did. The code doesn't work on my computer or on the TX2. When I run "roslaunch darknet_ros darknet_ros.launch", the result is as follows. There is no detection window, only the output in the terminal.

[ INFO] [1571276131.857998518]: Started darknet thread
[ INFO] [1571276131.858011827]: Reference Count > 0
[ INFO] [1571276131.858184366]: Inside tracking
[darknet_ros-1] process has died [pid 10361, exit code -11, cmd /home/jgx/catkin_ws/devel/lib/darknet_ros/darknet_ros __name:=darknet_ros __log:=/home/jgx/.ros/log/05036024-f07e-11e9-8718-983f9f190824/darknet_ros-1.log].
log file: /home/jgx/.ros/log/05036024-f07e-11e9-8718-983f9f190824/darknet_ros-1*.log
all processes on machine have died, roslaunch will exit
shutting down processing monitor...
... shutting down processing monitor complete
done

I found that the program is killed immediately if the camera moves. I use the usb_cam package. How can I solve this problem? Thanks.

Shame-fight commented 5 years ago

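Hello, not sure if this is related, but I ran into a similar issue. As soon as YOLO detects an object and tracking starts, it crashes. GDB gave the following information: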

Thread 1 "darknet_ros" received signal SIGSEGV, Segmentation fault.
darknet_ros::YoloROSTracker::trackThread (this=this@entry=0x7fffffffc130) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/YoloROSTracker.cpp:217
217 it->frames_counter = temp;
(gdb) bt
#0 darknet_ros::YoloROSTracker::trackThread (this=this@entry=0x7fffffffc130) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/YoloROSTracker.cpp:217
#1 0x00007ffff7822378 in darknet_ros::YoloROSTracker::YoloROSTracker (this=0x7fffffffc130, nh=...) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/YoloROSTracker.cpp:24
#2 0x00000000004058b7 in main (argc=1, argv=0x7fffffffc728) at /home/nico/catkin_ws/src/darknet_ros/darknet_ros/src/yolo_object_detector_node.cpp:15

I have the same problem as you. Did you solve it? As soon as YOLO detects an object, it crashes: [darknet_ros-1] process has died [pid 4547, exit code -11...