sowson / darknet

Darknet on OpenCL: Convolutional Neural Networks on OpenCL on Intel & NVidia & AMD & Mali GPUs for macOS & GNU/Linux & Windows & FreeBSD
http://pjreddie.com/darknet/

Yolov4-tiny not showing detections #41

Closed: Grench6 closed this issue 3 years ago

Grench6 commented 3 years ago

The image window opens and the picture is displayed, but I cannot see any detections. I use the following command:

user@user-pc:~/darknet$ ./darknet detector test cfg/coco.data cfg/yolov4-tiny.cfg weights/yolov4-tiny.weights data/dog.jpg 
Device IDs: 1
Device ID: 0
Device name: Ellesmere
Device vendor: Advanced Micro Devices, Inc.
Device opencl availability: OpenCL 1.2 AMD-APP (3180.7)
Device opencl used: 3180.7
Device double precision: YES
Device max group size: 256
Device address bits: 64
layer     filters    size              input                output
    0 conv     32  3 x 3 / 2   416 x 416 x   3   ->   208 x 208 x  32  0.075 BFLOPs
    1 conv     64  3 x 3 / 2   208 x 208 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    2 conv     64  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x  64  0.797 BFLOPs
    3 route  2
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
    4 conv     32  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x  32  0.399 BFLOPs
    5 conv     32  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  32  0.199 BFLOPs
    6 route  5 4
    7 conv     64  1 x 1 / 1   104 x 104 x  64   ->   104 x 104 x  64  0.089 BFLOPs
    8 route  2 7
    9 max          2 x 2 / 2   104 x 104 x 128   ->    52 x  52 x 128
   10 conv    128  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 128  0.797 BFLOPs
   11 route  10
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
   12 conv     64  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x  64  0.399 BFLOPs
   13 conv     64  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x  64  0.199 BFLOPs
   14 route  13 12
   15 conv    128  1 x 1 / 1    52 x  52 x 128   ->    52 x  52 x 128  0.089 BFLOPs
   16 route  10 15
   17 max          2 x 2 / 2    52 x  52 x 256   ->    26 x  26 x 256
   18 conv    256  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 256  0.797 BFLOPs
   19 route  18
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
   20 conv    128  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 128  0.399 BFLOPs
   21 conv    128  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 128  0.199 BFLOPs
   22 route  21 20
   23 conv    256  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 256  0.089 BFLOPs
   24 route  18 23
   25 max          2 x 2 / 2    26 x  26 x 512   ->    13 x  13 x 512
   26 conv    512  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x 512  0.797 BFLOPs
   27 conv    256  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 256  0.044 BFLOPs
   28 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   29 conv    255  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 255  0.044 BFLOPs
   30 yolo4
[yolo4] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
   31 route  27
   32 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   33 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   34 route  33 23
   35 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   36 conv    255  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 255  0.088 BFLOPs
   37 yolo4
[yolo4] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Loading weights from weights/yolov4-tiny.weights...Done!
data/dog.jpg: Predicted in 0.393254 seconds.
user@user-pc:~/darknet$

Yolo3, yolo3-tiny and yolo4 are working as expected. Is this because yolo4-tiny is not supported?

sowson commented 3 years ago

I re-ported the route layer from the YOLO4 repo one more time (your output shows unused fields for it), but it still does not detect objects. I will commit it soon. Maybe the threshold is too high?

Grench6 commented 3 years ago

Lowering the threshold has no effect.
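For context, a confidence threshold only filters boxes the network has already produced. If nothing appears even at a very low threshold, the network output itself is likely the problem rather than the cutoff. A minimal Python sketch of that filtering step, with made-up scores; `filter_detections` is an illustrative helper, not Darknet code:

```python
def filter_detections(detections, thresh):
    """Keep only (label, score) pairs whose score is above the threshold."""
    return [(label, score) for label, score in detections if score > thresh]


# Hypothetical scores, chosen only to illustrate the effect of the cutoff.
detections = [("dog", 0.12), ("bicycle", 0.31), ("truck", 0.04)]

for thresh in (0.5, 0.1, 0.0):
    kept = filter_detections(detections, thresh)
    print(f"thresh={thresh}: {kept}")

# A network that emits no usable scores yields nothing at any threshold.
print(filter_detections([], 0.0))
```

Upstream Darknet exposes this cutoff through a `-thresh` option on `detector test`; whether this fork keeps the same option is an assumption here.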

sowson commented 3 years ago

Maybe you should try to train this model on your own? Thx!

Grench6 commented 3 years ago

Ok, I will try that. I will update results as soon as I have them.

Grench6 commented 3 years ago

I still can't train yolo4-tiny. Before posting the issue I was able to train yolo3 and yolo3-tiny, but now I cannot train any of those either. Here is the output:

user@user-pc:~/darknet2$ ./darknet detector train data/obj.data yolo-obj.cfg yolov3-tiny.conv.11
Device IDs: 1
Device ID: 0
Device name: Ellesmere
Device vendor: Advanced Micro Devices, Inc.
Device opencl availability: OpenCL 1.2 AMD-APP (3180.7)
Device opencl used: 3180.7
Device double precision: YES
Device max group size: 256
Device address bits: 64
yolo-obj
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv     21  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x  21  0.004 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv     21  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x  21  0.007 BFLOPs
   23 yolo
Loading weights from yolov3-tiny.conv.11...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Saving weights to backup/yolo-obj.start.conv.weights
Resizing
384
Segmentation fault (core dumped)
user@user-pc:~/darknet2$

I really have no idea what is wrong. I used the exact same files and even recreated them from scratch, but it still does not work. I have run out of ideas; training yolo3-tiny was working a few days ago.
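One detail in the log above: the `Resizing` and `384` lines right before the crash are consistent with multi-scale training. As an assumption (based on upstream pjreddie Darknet, not verified against this fork), when a `[yolo]` section sets `random=1` the trainer periodically resizes the network to a random multiple of 32 between 320 and 608, independent of the size of the source images. A small Python sketch of that set of sizes; `multiscale_dims` is a hypothetical helper:

```python
def multiscale_dims(low=10, count=10, step=32):
    """Network sizes reachable via (rand() % count + low) * step.

    The defaults mirror the assumed upstream formula (rand() % 10 + 10) * 32.
    """
    return [(low + k) * step for k in range(count)]


dims = multiscale_dims()
print(dims)
print(384 in dims)  # the size reported right before the segmentation fault
```

If that assumption holds, the crash happened during or just after a network resize, which narrows down where to look.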

Grench6 commented 3 years ago

I followed all the instructions of AlexeyAB to train, multiple times, in different ways.

learning_rate=0.001
burn_in=1000
max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1



No matter what I change, the result is the same.
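One thing that can be ruled out from the config above is the usual custom-training mistake: the convolutional layer before each `[yolo]` section needs `filters = (classes + 5) * <number of masks>`, which for `classes=2` and three masks is 21, matching the posted values. A minimal Python sketch of that check, assuming one `key=value` per line as in a normal Darknet cfg; `parse_sections` and `check_yolo_filters` are hypothetical helpers, and `SAMPLE` is a shortened stand-in for the real file:

```python
def parse_sections(cfg_text):
    """Split Darknet-style cfg text into (section_name, {key: value}) pairs."""
    sections = []
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_yolo_filters(cfg_text):
    """Return (section_index, expected, actual) for every [yolo] head."""
    sections = parse_sections(cfg_text)
    report = []
    for i, (name, opts) in enumerate(sections):
        if name != "yolo":
            continue
        n_masks = len(opts["mask"].split(","))
        expected = (int(opts["classes"]) + 5) * n_masks
        actual = int(sections[i - 1][1]["filters"])  # layer feeding the head
        report.append((i, expected, actual))
    return report


SAMPLE = """
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 3,4,5
classes=2
num=6
"""

for index, expected, actual in check_yolo_filters(SAMPLE):
    status = "ok" if expected == actual else "MISMATCH"
    print(f"[yolo] at section {index}: expected {expected}, found {actual} -> {status}")
```

Since both heads pass this check, the filter count is unlikely to be the cause of the failure.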

Grench6 commented 3 years ago

Should I use a specific branch or version? Is the master branch safe to clone? Do the images used for training need to be a specific size (pixels x pixels)? Is there a limit? Do I need a different procedure to train with this repo? Those are other questions I have too.

sowson commented 3 years ago

@Grench6 the code is fine, and so is the compilation. Your GPU needs a rest: turn off your PC, unplug the power cord, give it about 1-2 hours, and everything will be fine again :D. I often have a similar issue after many runs where OpenCL is initialized without being deinitialized. I checked on my computer, and all the training runs you mentioned work just fine. On your end, there is garbage in VRAM that has to be cleaned up. Hope that helps.

sowson commented 3 years ago

@Grench6 btw, gdb is your friend. If you build with the -g flag or DEBUG=1, you can start gdb, enter your training command, and see where it crashes. If the crash is in opencl.c, my last comment is most probably relevant :).
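A minimal sketch of how such a gdb run could be scripted, assuming a debug build of `./darknet` and reusing the training command from earlier in this thread. `gdb_command` is a hypothetical helper; the gdb options (`-batch`, `-ex`, `--args`) are standard ones. It only builds and prints the command line, it does not launch anything:

```python
import shlex


def gdb_command(program_args):
    """Build a gdb invocation that runs the program and prints a backtrace.

    -batch  : exit after the listed commands have been processed
    -ex run : start the program
    -ex bt  : print the backtrace once the program has stopped (e.g. SIGSEGV)
    --args  : everything after this is the program and its arguments
    """
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", *program_args]


train = ["./darknet", "detector", "train",
         "data/obj.data", "yolo-obj.cfg", "yolov3-tiny.conv.11"]

print(shlex.join(gdb_command(train)))
```

The top frames of the backtrace then show whether the fault is in `opencl.c` or elsewhere.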

sowson commented 3 years ago

@Grench6 there was an error in freeing OpenCL resources in the Route layer. I have just fixed it and committed the fix. Thx!

Grench6 commented 3 years ago

Sorry for late reply.

Detection still shows nothing:

[image: Screenshot from 2020-11-24 17-42-08]

As for training, at least I no longer get the segmentation fault, but now something else is wrong: training does not work at all. I get the following output: out.pdf. The avg is NaN, and it does not change no matter how many iterations I let it run.
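To find the iteration where the average first turns into NaN, the training log can be scanned automatically. A minimal Python sketch, assuming log lines of the upstream Darknet shape `<iteration>: <loss>, <avg> avg, <rate> rate, ...`; the exact format in this fork and the sample values below are assumptions, and `first_nan_iteration` is a hypothetical helper:

```python
import math
import re

# Assumed line shape: "<iter>: <loss>, <avg> avg, <rate> rate, ..."
LINE_RE = re.compile(r"^\s*(\d+):\s*(\S+),\s*(\S+)\s+avg")


def first_nan_iteration(lines):
    """Return the first iteration whose average loss is NaN, or None."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a progress line (layer dump, region stats, ...)
        try:
            avg = float(match.group(3))  # float() accepts "nan" and "-nan"
        except ValueError:
            continue
        if math.isnan(avg):
            return int(match.group(1))
    return None


# Made-up sample lines, for illustration only.
sample = [
    "Loading weights from yolov4-tiny.conv.29...Done!",
    "1: 617.508545, 617.508545 avg, 0.000000 rate, 2.353 seconds, 64 images",
    "2: -nan, -nan avg, 0.000000 rate, 2.100 seconds, 128 images",
]

print(first_nan_iteration(sample))
```

A NaN from the very first iteration would point at the model or engine rather than at the data set.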

sowson commented 3 years ago

@Grench6 can you pls try to remove yolov4-tiny.conv.29 from train command. Thx!

Grench6 commented 3 years ago

Still the same NaN result: out.pdf. Here is the config file, in case it is useful: yolov4-tiny-custom.txt. I assume the data set and everything else are in good condition, because yolov3-tiny trains successfully with them.

sowson commented 3 years ago

I will look into it soon; for now, I am training other models. The answer is probably in the model: I have to compare it with yolo4 and look for any additional layer or activation function that I may not have in the engine. Sorry for the inconvenience.

Grench6 commented 3 years ago

Ok, no problem man. I will wait for any update.

aiXia121 commented 3 years ago

Could some kind person share data/names.list? Thanks, I'm a newbie.

/darknet detector test cfg/yolov3.cfg weights/yolov3.weights data/dog.jpg ./data/coco.names
Device IDs: 2
Device ID: 0
Device name: Intel(R) HD Graphics 630
Device vendor: Intel Inc.
Device opencl availability: OpenCL 1.2
Device opencl used: 1.2(Apr 13 2021 00:47:18)
Device double precision: NO
Device max group size: 256
Device address bits: 64
names: Using default 'data/names.list'
Couldn't open file: data/names.list

Grench6 commented 3 years ago

@aiXia121 That has nothing to do with this issue, but what you are looking for is in this link:

https://github.com/pjreddie/darknet/blob/master/data/coco.names

Download that file, place it where it belongs and rename it. Next time open a new issue.
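The "place it and rename it" step can be sketched in Python as follows, assuming the names file has already been downloaded and that `data/names.list` (the default path shown in the log above) is where Darknet looks. `ensure_names_list` is a hypothetical helper, and the demo runs in a throwaway directory with a stand-in file:

```python
import shutil
import tempfile
from pathlib import Path


def ensure_names_list(source, target):
    """Copy a class-names file to the path Darknet falls back to.

    Returns True if a copy was made, False if the target already existed.
    """
    source_path, target_path = Path(source), Path(target)
    if target_path.exists():
        return False
    if not source_path.is_file():
        raise FileNotFoundError(f"names file not found: {source_path}")
    target_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source_path, target_path)
    return True


# Demo with a stand-in names file containing two labels.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "coco.names"
    src.write_text("person\nbicycle\n")
    dst = Path(tmp) / "data" / "names.list"
    print(ensure_names_list(src, dst))  # first call copies the file
    print(ensure_names_list(src, dst))  # second call finds it in place
```

In a real checkout the call would be along the lines of `ensure_names_list("coco.names", "data/names.list")`, run from the darknet directory.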

sowson commented 3 years ago

@Grench6 you may check now :-).

[image: predictions]

Grench6 commented 3 years ago

Thank you! Right now I don't have my graphics card, but I will test it as soon as I have it. 👍🏾