sowson / darknet

Darknet on OpenCL: Convolutional Neural Networks on OpenCL on Intel & NVidia & AMD & Mali GPUs for macOS & GNU/Linux & Windows & FreeBSD
http://pjreddie.com/darknet/

Encounter a 'segmentation fault' while running detection #58

Closed: KurtKoo closed this issue 2 years ago

KurtKoo commented 3 years ago

My device is a TMDSEVM572X, which contains 2 DSPs. I made some modifications (disabled the OpenCV module and changed 'CL_DEVICE_TYPE_GPU' to 'CL_DEVICE_TYPE_ALL') and compiled darknet successfully by following your guide for the BeagleBone, with only some compiler warnings.

When I run detection using yolov3-tiny-prn.weights, it prints some information about the DSP, then hangs for several seconds and finally prints 'Segmentation fault'. The output is below:

root@am57xx-evm:~/darknet-master/build# ./darknet detect yolov3-tiny-prn.cfg yolov3-tiny-prn.weights ../data/dog.jpg
Device IDs: 1
Device ID: 0
Device name: TI Multicore C66 DSP
Device vendor: Texas Instruments, Inc.
Device opencl availability: OpenCL 1.2
Device opencl used:
Device double precision: YES
Device max group size: 1073741824
Device address bits: 32
names: Using default 'data/names.list'
Segmentation fault (core dumped)

I suspect it could be due to a lack of runtime memory. Is there a possible solution? Any help would be useful! Thanks a lot!

Compilation warnings:

[ 2%] Building C object CMakeFiles/libdarknet_s.dir/examples/attention.c.o
/home/root/darknet-master/examples/attention.c: In function 'train_attention':
/home/root/darknet-master/examples/attention.c:158:21: warning: argument 1 range [2147483648, 4294967295] exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
int *inds = calloc(resized.y.rows, sizeof(int));
^~~~~~~~~~~
In file included from /home/root/darknet-master/include/darknet.h:4,
from /home/root/darknet-master/examples/attention.c:1:
/usr/include/stdlib.h:541:14: note: in a call to allocation function 'calloc' declared here
extern void *calloc (size_t __nmemb, size_t __size)

[ 25%] Building C object CMakeFiles/libdarknet_s.dir/src/utils.c.o
/home/root/darknet-master/src/utils.c: In function 'rand_size_t':
/home/root/darknet-master/src/utils.c:786:36: warning: left shift count >= width of type [-Wshift-count-overflow]
return ((size_t)(rand()&0xff) << 56) |
^~
/home/root/darknet-master/src/utils.c:787:32: warning: left shift count >= width of type [-Wshift-count-overflow]
((size_t)(rand()&0xff) << 48) |
^~
/home/root/darknet-master/src/utils.c:788:32: warning: left shift count >= width of type [-Wshift-count-overflow]
((size_t)(rand()&0xff) << 40) |
^~
/home/root/darknet-master/src/utils.c:789:32: warning: left shift count >= width of type [-Wshift-count-overflow]
((size_t)(rand()&0xff) << 32) |
^~

[ 58%] Building C object CMakeFiles/libdarknet.dir/src/utils.c.o (same as [25%])

[ 68%] Building C object CMakeFiles/bindarknet.dir/examples/attention.c.o (same as [2%])

[ 91%] Building C object CMakeFiles/bindarknet.dir/src/utils.c.o (same as [25%])
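Neither warning is necessarily the cause of the crash, but both point at real portability issues on this target. The reported maximum object size of 2147483647 implies a 32-bit size_t. The calloc warning appears because resized.y.rows is an int: a negative value converted to size_t lands in exactly the reported range [2147483648, 4294967295]. The shift warnings appear because shifting a 32-bit size_t by 32 or more bits is undefined behavior. Below is a minimal sketch of hedged fixes; alloc_indices and rand_size_t_portable are hypothetical helpers, not darknet's actual code, and whether the range check fully silences the calloc warning depends on the GCC version.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `rows` zeroed ints, refusing non-positive
 * counts. Without the check, a negative int would be converted to a huge
 * size_t, which is what -Walloc-size-larger-than= warns about. */
static int *alloc_indices(int rows)
{
    if (rows <= 0) return NULL;
    return calloc((size_t)rows, sizeof(int));
}

/* Hypothetical portable variant of rand_size_t: builds the value one random
 * byte at a time, so each shift is only 8 bits and never reaches the width
 * of size_t, whether size_t is 32 or 64 bits wide. */
static size_t rand_size_t_portable(void)
{
    size_t r = 0;
    for (size_t i = 0; i < sizeof(size_t); ++i) {
        r = (r << 8) | (size_t)(rand() & 0xff);
    }
    return r;
}
```

On a 64-bit host this fills all 8 bytes; on the 32-bit ARM build it fills 4, with no shift-count warnings in either case.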

sowson commented 3 years ago

@KurtKoo I am sorry it took so long, but I do not know how to help... I only have a BeagleBoard X15, and I suppose that is a different case... Thanks for trying this solution! 👍

KurtKoo commented 3 years ago

It's alright. Maybe it was because I disabled the OpenCV module, or because there is not enough memory for OpenCL.

I'll try to solve these problems.

Thanks!

dna2github commented 2 years ago

In your case:

names: Using default 'data/names.list'
Segmentation fault (core dumped)

it did not even print your network layer info, which means your binary probably crashed while loading the network layers.

In general, you can add printf calls to track where the error occurs.

For example:

printf("p1\n");
action1();  /* first suspect step (placeholder) */
printf("p2\n");
action2();  /* second suspect step (placeholder) */
printf("p3\n");

Then you can tell whether it crashed in action1, in action2, or somewhere else.
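The checkpoint technique above can be made a little more robust. In this sketch, a hypothetical TRACE macro writes to stderr instead of stdout: stdout is fully buffered when it is redirected to a file or pipe, so printf checkpoints can be lost when the process segfaults, whereas stderr is not fully buffered by default. It also records __FILE__ and __LINE__ so each checkpoint is self-locating. action1, action2, and run_with_checkpoints are placeholders for the real suspect steps.

```c
#include <stdio.h>

/* Hypothetical checkpoint macro: reports file, line, and a tag on stderr,
 * which is not fully buffered, so the last line printed before a crash
 * still reaches the terminal or log. */
#define TRACE(tag) \
    do { fprintf(stderr, "[trace] %s:%d %s\n", __FILE__, __LINE__, (tag)); } while (0)

/* Placeholder steps standing in for the code under suspicion. */
static int action1(void) { return 1; }
static int action2(int x) { return x + 1; }

static int run_with_checkpoints(void)
{
    TRACE("p1");
    int a = action1();
    TRACE("p2");
    int b = action2(a);
    TRACE("p3");
    return b;
}
```

If the program dies between two TRACE lines, the last "[trace]" line on stderr names the file and line just before the faulting step.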

I hit two segmentation faults on macOS. The first: if no names list is specified, the get_labels function writes a terminating '\0' directly into the default value, the string literal "data/names.list". The string has length 15, so filename[strcspn(filename, "\n\r")] = 0 becomes filename[15] = 0, which overwrites the existing '\0' terminator with another '\0'. The write stays in bounds, but modifying a string literal is undefined behavior, and macOS keeps literals in read-only memory, so the process crashes with a segmentation fault.
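A minimal sketch of avoiding that crash, assuming the trimming code looks like the strcspn statement quoted above: copy the name (or the default) into writable memory before trimming it. trim_newline and writable_names_path are hypothetical helpers for illustration, not the fix actually used in darknet.

```c
#include <stdlib.h>
#include <string.h>

/* Cut the string at the first '\n' or '\r'. The argument must point to
 * writable memory: calling this on a string literal, e.g.
 *     char *filename = "data/names.list"; trim_newline(filename);
 * is undefined behavior even though the write is in bounds. */
static void trim_newline(char *s)
{
    s[strcspn(s, "\n\r")] = '\0';
}

/* Hypothetical helper: return a heap copy of `name` (or of the default
 * path when `name` is NULL) that is safe to trim. Caller frees it. */
static char *writable_names_path(const char *name)
{
    const char *src = name ? name : "data/names.list";
    size_t len = strlen(src);
    char *copy = malloc(len + 1);
    if (!copy) return NULL;
    memcpy(copy, src, len + 1);  /* includes the '\0' terminator */
    trim_newline(copy);
    return copy;
}
```

A local array such as char buf[] = "data/names.list"; would also work, since the array is a writable copy of the literal.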

The other is that the official darknet example is missing one argument (darknet xxx <data.cfg> model.cfg weights.bin ...). If you do not provide data.cfg, every following argument shifts by one position, so weights.bin is parsed as model.cfg and the program crashes with a segmentation fault.
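One way to fail early instead of crashing on shifted arguments is a pre-flight check. This sketch assumes the argument order quoted above (darknet <cmd> <data.cfg> <model.cfg> <weights> ...) and a naming convention in which the model file ends in .cfg and the weights file does not; check_detector_args and ends_with are hypothetical helpers, not part of darknet.

```c
#include <stdio.h>
#include <string.h>

/* Return nonzero if `s` ends with `suffix`. */
static int ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Hypothetical check for: darknet <cmd> <data.cfg> <model.cfg> <weights> [image]
 * argv[3] should be the model .cfg and argv[4] the weights file.
 * Returns 0 if the arguments look plausible, -1 otherwise. */
static int check_detector_args(int argc, char **argv)
{
    if (argc < 5) {
        fprintf(stderr, "usage: darknet <cmd> <data.cfg> <model.cfg> <weights> [image]\n");
        return -1;
    }
    /* If data.cfg was omitted, the weights file lands in the model slot. */
    if (!ends_with(argv[3], ".cfg") || ends_with(argv[4], ".cfg")) {
        fprintf(stderr, "arguments look shifted: model='%s' weights='%s'\n",
                argv[3], argv[4]);
        return -1;
    }
    return 0;
}
```

With the data file missing, yolov3-tiny-prn.weights ends up in argv[3], fails the .cfg test, and the program can print a usage message instead of segfaulting.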

KurtKoo commented 2 years ago

@dna2github Thanks for your reply. I'm working on another task now, so I gave it up. But thank you very much!