I have trained a network based on yolov3-tiny.cfg, except that it takes PNG images with alpha values (so 4 channels).
It trains just fine, and I can kill and resume the training from the various .backup files.
When I go to test it, via
./darknet detector test data/my.data cfg/my.cfg backup/my.backup -thresh 0.05 /path/to/image/with/alpha.png
I get a Segmentation Fault
...
16 conv 256 3 x 3 / 1 26 x 26 x 768 -> 26 x 26 x 256 2.392 BFLOPs
17 conv 57 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 57 0.020 BFLOPs
18 yolo
Loading weights from backup/my.backup...Done!
Segmentation fault (core dumped)
The images I am using to test are ones the network was trained on; I have tried several, all with the same result.
I tested darknet's imtest on the same image that fails with ./darknet detector test ...
./darknet imtest /path/to/image/with/alpha.png
And it runs fine:
Not compiled with OpenCV, saving to Original.png instead
Not compiled with OpenCV, saving to Gray.png instead
Not compiled with OpenCV, saving to C1.png instead
Not compiled with OpenCV, saving to C2.png instead
Not compiled with OpenCV, saving to C3.png instead
Not compiled with OpenCV, saving to C4.png instead
I'm not good at debugging C, but I ran gdb and called bt when it caught the segfault:
#0 0x00007fffeebc69b0 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007fffeec74abe in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fffeed48b17 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fffeec76365 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007fffeeb8cd34 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007fffeeb8f130 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007fffeecdbfd5 in cuMemcpyHtoD_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007ffff7897a1e in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#8 0x00007ffff7873c51 in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#9 0x00007ffff789db28 in cudaMemcpy () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#10 0x00000000004293c1 in cuda_push_array (x_gpu=0x10296000000, x=0x8e3d770, n=692224) at ./src/cuda.c:156
#11 0x0000000000461c0e in forward_network_gpu (netp=0x8b36e0) at ./src/network.c:766
#12 0x000000000045f3d0 in forward_network (netp=0x8b36e0) at ./src/network.c:192
#13 0x000000000046082c in network_predict (net=0x8b36e0, input=0x8e3d770) at ./src/network.c:504
#14 0x0000000000421956 in test_detector (datacfg=0x7fffffffe129 "data/my.data",
cfgfile=0x7fffffffe13d "cfg/my.cfg", weightfile=0x7fffffffe151 "backup/my.backup",
filename=0x7fffffffe178 "/path/to/image/with/alpha.png", thresh=0.0500000007,
hier_thresh=0.5, outfile=0x0, fullscreen=0) at ./examples/detector.c:597
#15 0x0000000000421fa2 in run_detector (argc=9, argv=0x7fffffffdd08) at ./examples/detector.c:841
#16 0x0000000000426169 in main (argc=9, argv=0x7fffffffdd08) at ./examples/darknet.c:434
That was an awful lot of ?? frames from libcuda, so I ran it under gdb again with -nogpu and got this error:
Thread 1 "darknet" received signal SIGSEGV, Segmentation fault.
0x0000000000452c80 in im2col_get_pixel (im=0x8dd3d20, height=416, width=416, channels=4, row=271, col=88, channel=3, pad=3)
at ./src/im2col.c:11
11 return im[col + width*(row + height*channel)];
(gdb) bt
#0 0x0000000000452c80 in im2col_get_pixel (im=0x8dd3d20, height=416, width=416, channels=4, row=271, col=88, channel=3,
pad=3) at ./src/im2col.c:11
#1 0x0000000000452da4 in im2col_cpu (data_im=0x8dd3d20, channels=4, height=416, width=416, ksize=7, stride=1, pad=3,
data_col=0x7fff52ad9010) at ./src/im2col.c:34
#2 0x000000000042ae60 in forward_convolutional_layer (l=..., net=...) at ./src/convolutional_layer.c:471
#3 0x000000000045f4c3 in forward_network (netp=0x87c1b0) at ./src/network.c:204
#4 0x000000000046082c in network_predict (net=0x87c1b0, input=0x8dd3d20) at ./src/network.c:504
#5 0x0000000000421956 in test_detector (datacfg=0x7fffffffe122 "data/my.data",
cfgfile=0x7fffffffe136 "cfg/my.cfg", weightfile=0x7fffffffe14a "backup/my.backup",
filename=0x7fffffffe171 "/path/to/image/with/alpha.png", thresh=0.0500000007,
hier_thresh=0.5, outfile=0x0, fullscreen=0) at ./examples/detector.c:597
#6 0x0000000000421fa2 in run_detector (argc=10, argv=0x7fffffffdcf8) at ./examples/detector.c:841
#7 0x0000000000426169 in main (argc=10, argv=0x7fffffffdcf8) at ./examples/darknet.c:434
I have tried this with several images that the network was trained on, and they all fail at the same point (with the same values for row, col, channel, pad, etc.). The line

return im[col + width*(row + height*channel)];

seems to be the issue, but from what I can calculate the index should be well within the bounds of the image. I find it extremely odd, since training works both with and without -nogpu, and the file ought to be small enough to fit in memory (676 KB, or 1200 KB for the full-size 640x480 original) on my 4GB card.
If darknet is not set up to handle the alpha channel, is there some way of loading several image files and concatenating their channels together?
Thanks!
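In case it helps, here is a minimal sketch of what I imagine that concatenation could look like: stacking a 3-channel image and a separate single-channel image into one 4-channel buffer, assuming the planar (one full channel plane after another) float layout implied by the im2col indexing above. concat_channels is a hypothetical helper, not an existing darknet function:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of darknet: stack a 3-channel image and a
   separate single-channel image (same w, h) into one 4-channel buffer.
   Assumes planar layout: channel 0's full plane first, then channel 1's,
   and so on, matching im[col + width*(row + height*channel)]. */
float *concat_channels(const float *rgb, const float *alpha, int w, int h) {
    size_t plane = (size_t)w * h;
    float *out = malloc(plane * 4 * sizeof(float));
    if (!out) return NULL;
    memcpy(out,             rgb,   plane * 3 * sizeof(float)); /* channels 0-2 */
    memcpy(out + plane * 3, alpha, plane * sizeof(float));     /* channel 3 */
    return out;
}
```

The resulting buffer could then, in principle, be handed to network_predict in place of the image data the test path loads itself.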