Open mosheliv opened 6 years ago
Just adding some more information: I compiled with debug symbols and ran it under gdb, and got the following:
Thread 1 "darknet" received signal SIGSEGV, Segmentation fault.
0x0000000000488312 in get_yolo_box (x=0x4691f490, biases=0xa13bb0, n=8,
index=-1376440676, i=12999987, j=12999987, lw=13, lh=13, w=416, h=416,
stride=169) at ./src/yolo_layer.c:86
86 b.x = (i + x[index + 0*stride]) / lw;
(gdb) where
#0 0x0000000000488312 in get_yolo_box (x=0x4691f490, biases=0xa13bb0, n=8,
index=-1376440676, i=12999987, j=12999987, lw=13, lh=13, w=416, h=416,
stride=169) at ./src/yolo_layer.c:86
#1 0x00000000004884d0 in delta_yolo_box (truth=..., x=0x4691f490,
biases=0xa13bb0, n=8, index=-1376440676, i=12999987, j=12999987, lw=13,
lh=13, w=416, h=416, delta=0x4646f1e0, scale=-9.9999803e+11, stride=169)
at ./src/yolo_layer.c:95
#2 0x00000000004893ff in forward_yolo_layer (l=..., net=...)
at ./src/yolo_layer.c:219
#3 0x000000000048a3c0 in forward_yolo_layer_gpu (l=..., net=...)
at ./src/yolo_layer.c:365
#4 0x0000000000463479 in forward_network_gpu (netp=0x9ca250)
at ./src/network.c:778
#5 0x00000000004607fd in forward_network (netp=0x9ca250)
at ./src/network.c:192
#6 0x0000000000460ee8 in train_network_datum (net=0x9ca250)
at ./src/network.c:293
#7 0x00000000004610c2 in train_network (net=0x9ca250, d=...)
at ./src/network.c:324
#8 0x000000000041eef2 in train_detector (datacfg=0x7fffffffe797 "oid.data",
cfgfile=0x7fffffffe7a0 "cfg/yolov3-oid.cfg",
weightfile=0x7fffffffe7b3 "darknet53.conv.74", gpus=0x7fffffffe324,
ngpus=1, clear=0) at ./examples/detector.c:118
#9 0x0000000000422a5d in run_detector (argc=6, argv=0x7fffffffe518)
at ./examples/detector.c:842
#10 0x0000000000426e66 in main (argc=6, argv=0x7fffffffe518)
at ./examples/darknet.c:434
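A quick sanity check on the numbers in the trace, as a sketch only. It assumes that i is the ground-truth box's normalized center x multiplied by lw, and that scale is computed as 2 - truth.w*truth.h. Neither relationship is visible in the trace itself, so both are assumptions about yolo_layer.c. The variable names below are just for illustration.

```python
# Values copied from frame #1 of the backtrace above.
lw = 13
i = 12999987
scale = -9.9999803e+11

# Assumption: i = int(truth.x * lw), so truth.x is roughly i / lw.
implied_truth_x = i / lw
print(implied_truth_x)  # 999999.0

# Assumption: scale = 2 - truth.w * truth.h, so w * h is roughly 2 - scale.
implied_area = 2 - scale
implied_side = implied_area ** 0.5  # side length if w and h are equal
print(implied_side)  # roughly 999999

# A valid normalized coordinate lies in [0, 1], so a grid cell index
# on a 13x13 grid should be in 0..12, not about 13 million.
print(0 <= i < lw)  # False
```

Under those assumptions, the box that reached the layer had x, w and h near 999999 instead of values in [0, 1], which would put i and j far outside the 13x13 grid and could explain the wild index. Where that value comes from is not shown in the trace.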
Just check your label files; maybe some line has 0 values.
Can you elaborate? The label files have IDs given by the position of the label in the file; the first one is 0, if I am not mistaken. So do you mean an empty file in the ground truth? I have made sure this won't happen in the generation process. From a casual look at the code, it seems that because of the large number of classes I have gone over the max int somewhere... However, this code is not easy to read or follow, so I might be wrong.
I guess some of your labels/xxx.txt files have x=0 or y=0, like:
0 0 0 0.059 0.008
The expected one should be:
0 0.136 0.043 0.059 0.008
So maybe you can modify the Python file darknet/scripts/voc_label.py,
def convert(size, box):
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
to generate your label files without 0 for x and y.
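For reference, here is a minimal sketch of the corner-to-center conversion that the snippet above is part of. It assumes pixel corner coordinates in the order (xmin, xmax, ymin, ymax), matching the box indexing in convert. The function name corners_to_yolo is hypothetical, and the sketch deliberately omits the - 1 offset from voc_label.py, so it is not a drop-in copy of that script.

```python
def corners_to_yolo(xmin, xmax, ymin, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized (x, y, w, h).

    x and y are the box center and w and h its size, each divided by
    the image width or height so the results fall in [0, 1].
    """
    x = (xmin + xmax) / 2.0 / img_w
    y = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x, y, w, h


# Example: a 100x100 pixel box in a 400x400 image.
x, y, w, h = corners_to_yolo(100, 200, 50, 150, 400, 400)
print(x, y, w, h)  # 0.375 0.25 0.25 0.25
```

If the source annotations are already normalized corners, as they may be for Open Images, passing img_w=1 and img_h=1 leaves only the center and size computation.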
Oh, I see! It uses the center x and y plus w and h. As far as I remember I did convert everything, but I'll recheck. Thank you!
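Rechecking a large label set by hand is tedious, so a small scanner can help. The sketch below assumes each label line has the form "class x y w h" with normalized values, as in the examples above. The function names and the exact bounds are illustrative choices, not something darknet itself provides.

```python
import glob
import os


def check_label_lines(lines, num_classes):
    """Return a list of (line_number, reason) for lines that look wrong."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            problems.append((line_no, "expected 5 fields"))
            continue
        try:
            class_id = int(fields[0])
            x, y, w, h = (float(v) for v in fields[1:])
        except ValueError:
            problems.append((line_no, "non-numeric field"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((line_no, "class id out of range"))
        elif not (0.0 < x < 1.0 and 0.0 < y < 1.0):
            problems.append((line_no, "center x or y not strictly inside (0, 1)"))
        elif not (0.0 < w <= 1.0 and 0.0 < h <= 1.0):
            problems.append((line_no, "width or height not in (0, 1]"))
    return problems


def check_label_dir(label_dir, num_classes):
    """Print every suspicious line found in the .txt files of label_dir."""
    for path in sorted(glob.glob(os.path.join(label_dir, "*.txt"))):
        with open(path) as f:
            for line_no, reason in check_label_lines(f, num_classes):
                print(f"{path}:{line_no}: {reason}")
```

For example, check_label_dir("labels", 601) would report the "0 0 0 0.059 0.008" line quoted earlier, since its center x and y are 0. The directory name "labels" is a placeholder.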
Hi,
I am trying to train YOLOv3 on a subset of Google Open Images. It has 601 classes. After a little while (sometimes two lines, sometimes 120, sometimes 60) it core dumps.
Attached please find the cfg and data. The annotations were of course converted, but it is hard to know whether any were wrong, as I have no idea where it core dumped. A random check suggests the conversion was good.
Does anyone have any idea what could cause it?
Regards, Moshe
config.zip