Grench6 closed this issue 3 years ago.
I re-ported the route layer from the YOLOv4 repo one more time (your output indicated unused variables), but it is still not detecting objects. I will commit it soon. Maybe the threshold is too high?
Lowering the threshold has no effect.
Maybe you should try to train this model on your own? Thx!
Ok, I will try that. I will update results as soon as I have them.
I still can't train yolo4-tiny. Before posting this issue I was able to train yolo3 and yolo3-tiny, but now I can't train either of them. Here is the output:
```
user@user-pc:~/darknet2$ ./darknet detector train data/obj.data yolo-obj.cfg yolov3-tiny.conv.11
Device IDs: 1
Device ID: 0
Device name: Ellesmere
Device vendor: Advanced Micro Devices, Inc.
Device opencl availability: OpenCL 1.2 AMD-APP (3180.7)
Device opencl used: 3180.7
Device double precision: YES
Device max group size: 256
Device address bits: 64
yolo-obj
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 256 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 256 0.089 BFLOPs
14 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
15 conv 21 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 21 0.004 BFLOPs
16 yolo
17 route 13
18 conv 128 1 x 1 / 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BFLOPs
19 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
20 route 19 8
21 conv 256 3 x 3 / 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BFLOPs
22 conv 21 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 21 0.007 BFLOPs
23 yolo
Loading weights from yolov3-tiny.conv.11...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Saving weights to backup/yolo-obj.start.conv.weights
Resizing
384
Segmentation fault (core dumped)
user@user-pc:~/darknet2$
```
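The shape and BFLOPs columns in the log above can be recomputed by hand. Doing so confirms that the network was built as the cfg intends before the OpenCL code is blamed. Below is a minimal Python sketch. It assumes darknet's usual conventions as I understand its parser:

- `pad=1` in a `[convolutional]` layer means a padding of `size // 2`.
- A `[maxpool]` without an explicit `padding` uses `size - 1`.
- BFLOPs are `2 * filters * size^2 * in_channels * out_h * out_w / 1e9`.
- A multi-input `[route]` concatenates its inputs along the channel axis.

The helper names are hypothetical and not part of darknet.

```python
def conv_out(size_in: int, kernel: int, stride: int, pad: bool) -> int:
    """Output width/height of a [convolutional] layer; pad=1 means padding = kernel // 2."""
    padding = kernel // 2 if pad else 0
    return (size_in + 2 * padding - kernel) // stride + 1


def maxpool_out(size_in: int, kernel: int, stride: int) -> int:
    """Output width/height of a [maxpool] layer using the default padding of kernel - 1."""
    padding = kernel - 1
    return (size_in + padding - kernel) // stride + 1


def conv_bflops(filters: int, kernel: int, in_channels: int, out_h: int, out_w: int) -> float:
    """Billions of floating-point operations, as printed in the log's last column."""
    return 2.0 * filters * kernel * kernel * in_channels * out_h * out_w / 1e9


def route_channels(*input_channels: int) -> int:
    """A [route] over several layers concatenates them along the channel axis."""
    return sum(input_channels)


if __name__ == "__main__":
    # Layer 0: a 3x3/1 conv with 16 filters keeps 416x416 (about 0.150 BFLOPs).
    print(conv_out(416, 3, 1, True), round(conv_bflops(16, 3, 3, 416, 416), 3))
    # Layer 1 (2x2/2) halves 416 to 208; layer 11 (2x2/1) keeps 13x13.
    print(maxpool_out(416, 2, 2), maxpool_out(13, 2, 1))
    # Layer 20: route 19 (26x26x128) + 8 (26x26x256) -> 384 channels,
    # which is the input depth reported for layer 21 (about 1.196 BFLOPs).
    channels = route_channels(128, 256)
    print(channels, round(conv_bflops(256, 3, channels, 26, 26), 3))
```

For the rows checked here, the numbers agree with the log. The graph itself therefore looks well-formed, which points the segfault toward runtime resource handling rather than the cfg.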
I really have no idea what is wrong. I used the exact same files, and I even recreated them from scratch, but it is still not working. I've run out of ideas; training yolo3-tiny was working a few days ago.
I followed AlexeyAB's training instructions multiple times, in different ways. Here is my cfg:
```
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001 burn_in=1000 max_batches = 6000 policy=steps steps=4800,5400 scales=.1,.1
[convolutional] batch_normalize=1 filters=16 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=1
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
###########
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=21 activation=linear
[yolo] mask = 3,4,5 anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 classes=2 num=6 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1
[route] layers = -4
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = -1, 8
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=21 activation=linear
[yolo] mask = 0,1,2 anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 classes=2 num=6 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1
```
No matter what I change, the result is the same.
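When a darknet cfg is edited by hand, one rule worth re-checking is the head size. The `[convolutional]` layer directly before each `[yolo]` layer needs `filters = (classes + 5) * len(mask)`. With `classes=2` and three anchors per mask, that gives `(2 + 5) * 3 = 21`, which matches the cfg above.

Below is a minimal, hypothetical Python checker for that rule; it is not part of darknet. It parses `key=value` lines section by section and reports mismatches. It assumes one key per line, as in a normal darknet cfg file.

```python
from typing import Dict, List, Tuple

Section = Tuple[str, Dict[str, str]]


def parse_cfg(text: str) -> List[Section]:
    """Split a darknet-style cfg into (section_name, options) pairs, in file order."""
    sections: List[Section] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_yolo_filters(sections: List[Section]) -> List[str]:
    """Report [yolo] layers whose preceding conv lacks (classes + 5) * len(mask) filters."""
    problems = []
    for i, (name, opts) in enumerate(sections):
        if name != "yolo":
            continue
        expected = (int(opts["classes"]) + 5) * len(opts["mask"].split(","))
        prev_name, prev_opts = sections[i - 1] if i > 0 else ("", {})
        actual = int(prev_opts.get("filters", -1)) if prev_name == "convolutional" else -1
        if actual != expected:
            problems.append(f"section {i}: filters={actual}, expected {expected}")
    return problems


# Excerpt of the last detection head from the cfg above, rewritten one key per line.
CFG = """
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 0,1,2
classes=2
num=6
"""

if __name__ == "__main__":
    print(check_yolo_filters(parse_cfg(CFG)))  # [] means every head is consistent
```

Running the checker on the full cfg instead of the excerpt checks both heads at once.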
I have a few other questions as well. Should I use a specific branch or version? Is the master branch safe to clone? Do the images used for training need to be a specific size (in pixels)? Is there a size limit? Do I need a different procedure to train with this repo?
@Grench6 the code is fine, and so is the compilation. Your GPU needs a rest: turn off your PC, unplug the power cord, give it about 1-2 hours, and everything will be fine again :D. I often hit a similar issue after many attempts in which OpenCL was initialized without being deinitialized. I checked, and on my computer all the mentioned trainings work just fine. On your end, there is garbage in VRAM that has to be cleaned up. Hope that helps.
@Grench6 btw, gdb is your friend. If you build with the -g flag (or DEBUG=1), you can run your training command under gdb and see where it fails. If the crash is in opencl.c, my last comment is most likely relevant :).
@Grench6 there was a bug in how the Route layer freed its OpenCL resources. I have just fixed and committed it. Thx!
Sorry for the late reply.
Detection still doesn't show anything.
As for training: at least I no longer get the segmentation fault, but now something else is wrong. Training is not working at all. I get the following output: out.pdf. The avg is NaN, and it doesn't change no matter how many iterations I let it run.
@Grench6 can you please try removing yolov4-tiny.conv.29 from the train command? Thx!
Still the same NaN: out.pdf. Here is the config file in case it is useful: yolov4-tiny-custom.txt. I assume the dataset and everything else are in good condition, because yolov3-tiny trains successfully with it.
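Validating the label files is a cheap way to back up the assumption that the dataset is fine. Malformed labels are one possible cause of a NaN loss: a class id outside the configured range, coordinates outside [0, 1], or a zero-sized box (whose log-scaled width or height is undefined).

Below is a minimal sketch. It assumes the standard darknet label format: one `<class_id> <x_center> <y_center> <width> <height>` line per object, with coordinates normalized to [0, 1]. The function names and directory layout are hypothetical.

```python
import math
from pathlib import Path
from typing import List


def validate_label_line(line: str, num_classes: int) -> List[str]:
    """Return the problems found in one YOLO-format label line (empty list if OK)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    for name, value in (("x", x), ("y", y), ("w", w), ("h", h)):
        if math.isnan(value) or not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} not in [0, 1]")
    if w == 0.0 or h == 0.0:
        problems.append("zero-sized box")
    return problems


def validate_label_dir(label_dir: str, num_classes: int) -> List[str]:
    """Check every *.txt label file in a directory and report 'file:line: problem'."""
    report = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # empty lines (e.g. images without objects) are fine
            for problem in validate_label_line(line, num_classes):
                report.append(f"{path.name}:{lineno}: {problem}")
    return report
```

For this cfg, you would call `validate_label_dir(<folder with the label .txt files>, 2)` to match `classes=2`. Any other `.txt` file in that folder, such as a class list, will also be reported. An empty report does not rule out an engine-side cause, which is where the thread goes next.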
I will look into it soon; for now, I am training other models. The answer is probably in the model. I have to compare it with yolo4 and look for any additional layer or activation function that the engine may not support yet. Sorry for the inconvenience.
Ok, no problem man. I will wait for any update.
Could someone share the data/names.list file? Thx, I'm a newbie.
```
/darknet detector test cfg/yolov3.cfg weights/yolov3.weights data/dog.jpg ./data/coco.names
Device IDs: 2
Device ID: 0
Device name: Intel(R) HD Graphics 630
Device vendor: Intel Inc.
Device opencl availability: OpenCL 1.2
Device opencl used: 1.2(Apr 13 2021 00:47:18)
Device double precision: NO
Device max group size: 256
Device address bits: 64
names: Using default 'data/names.list'
Couldn't open file: data/names.list
```
@aiXia121 That has nothing to do with this issue, but what you are looking for is in this link:
https://github.com/pjreddie/darknet/blob/master/data/coco.names.
Download that file, place it where darknet expects it, and rename it (here, to data/names.list). Next time, please open a new issue.
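A names file such as coco.names is plain text with one class name per line, in class-id order. A quick sanity check after copying it into place is therefore to count its non-empty lines, which should be 80 for COCO.

Below is a minimal Python sketch of the copy-and-check step. It assumes coco.names has already been downloaded into the darknet directory. The destination follows the error message above (`data/names.list`), and the helper names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def parse_names(text: str) -> List[str]:
    """One class name per line, in class-id order; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def install_names(src: str, dst: str = "data/names.list") -> List[str]:
    """Copy a downloaded names file to the path darknet asked for and return its classes."""
    dst_path = Path(dst)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst_path)
    return parse_names(dst_path.read_text())


if __name__ == "__main__":
    # Assumes coco.names was downloaded into the current (darknet) directory.
    names = install_names("coco.names")
    print(len(names), names[:3])  # coco.names should give 80 classes
```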
@Grench6 you may check now :-).
Thank you! Right now I don't have my graphics card, but I will test it as soon as I have it. 👍🏾
The image window shows up and the image is there, but I can't see any detections. I use the following command:
Yolo3, yolo3-tiny and yolo4 are working as expected. Is this because yolo4-tiny is not supported?