leyuan opened this issue 6 years ago
Try increasing the height and width in your cfg file; set both to 832 (height=832 and width=832 under [net]) and see if it improves.
Regards, Ahsan
On Wed, Mar 14, 2018 at 11:37 AM, Leyuan notifications@github.com wrote:
Hi all,
I am training YOLOv2 on the following images: https://user-images.githubusercontent.com/1417993/37386732-4636a94a-271f-11e8-9afb-00160856ae7e.jpeg
The part I want the bounding box to return is the black/marked area... as you can see it is pretty small. I have annotated the images.
However, after starting training, I am getting NaN in the results from line 0 onward. Since many people suggested it could be the annotations going wrong, I used the script provided here https://github.com/Guanghan/darknet/blob/master/scripts/convert.py, and the labels look like this:
0 0.421875 0.606796116505 0.114583333333 0.0582524271845
I am not sure if the numbers above ^^ are too small. The image sizes correspond to the ones specified in the cfg file.
(py27) linux:~/Desktop/ws/defection-detection/darknet$ ./darknet detector train bottle-cfg/obj.data bottle-cfg/yolo-obj.cfg conv.23
yolo-obj
layer filters size input output
0 conv 32 3 x 3 / 1 288 x 416 x 3 -> 288 x 416 x 32
1 max 2 x 2 / 2 288 x 416 x 32 -> 144 x 208 x 32
2 conv 64 3 x 3 / 1 144 x 208 x 32 -> 144 x 208 x 64
3 max 2 x 2 / 2 144 x 208 x 64 -> 72 x 104 x 64
4 conv 128 3 x 3 / 1 72 x 104 x 64 -> 72 x 104 x 128
5 conv 64 1 x 1 / 1 72 x 104 x 128 -> 72 x 104 x 64
6 conv 128 3 x 3 / 1 72 x 104 x 64 -> 72 x 104 x 128
7 max 2 x 2 / 2 72 x 104 x 128 -> 36 x 52 x 128
8 conv 256 3 x 3 / 1 36 x 52 x 128 -> 36 x 52 x 256
9 conv 128 1 x 1 / 1 36 x 52 x 256 -> 36 x 52 x 128
10 conv 256 3 x 3 / 1 36 x 52 x 128 -> 36 x 52 x 256
11 max 2 x 2 / 2 36 x 52 x 256 -> 18 x 26 x 256
12 conv 512 3 x 3 / 1 18 x 26 x 256 -> 18 x 26 x 512
13 conv 256 1 x 1 / 1 18 x 26 x 512 -> 18 x 26 x 256
14 conv 512 3 x 3 / 1 18 x 26 x 256 -> 18 x 26 x 512
15 conv 256 1 x 1 / 1 18 x 26 x 512 -> 18 x 26 x 256
16 conv 512 3 x 3 / 1 18 x 26 x 256 -> 18 x 26 x 512
17 max 2 x 2 / 2 18 x 26 x 512 -> 9 x 13 x 512
18 conv 1024 3 x 3 / 1 9 x 13 x 512 -> 9 x 13 x1024
19 conv 512 1 x 1 / 1 9 x 13 x1024 -> 9 x 13 x 512
20 conv 1024 3 x 3 / 1 9 x 13 x 512 -> 9 x 13 x1024
21 conv 512 1 x 1 / 1 9 x 13 x1024 -> 9 x 13 x 512
22 conv 1024 3 x 3 / 1 9 x 13 x 512 -> 9 x 13 x1024
23 conv 1024 3 x 3 / 1 9 x 13 x1024 -> 9 x 13 x1024
24 conv 1024 3 x 3 / 1 9 x 13 x1024 -> 9 x 13 x1024
25 route 16
26 reorg / 2 18 x 26 x 512 -> 9 x 13 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 9 x 13 x3072 -> 9 x 13 x1024
29 conv 30 1 x 1 / 1 9 x 13 x1024 -> 9 x 13 x 30
30 detection
Loading weights from conv.23...
seen 32
Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Loaded: 2.960990 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.492677, Avg Recall: -nan, count: 0
And here is my .cfg file, which is merely a copy of yolo-voc2.0.cfg:
[net]
batch=64
subdivisions=8
height=416
width=288
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[reorg]
stride=2

[route]
layers=-1,-3

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=0
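For reference, the filters=30 in the final convolutional layer follows the usual YOLOv2 rule filters = num * (coords + classes + 1); a quick check with the values from the [region] section above:

```python
# YOLOv2 region-layer rule: the conv layer before [region] needs
# num * (coords + classes + 1) filters. Values from the cfg above.
num, classes, coords = 5, 1, 4
print(num * (coords + classes + 1))  # 30, matching filters=30
```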
I appreciate your help!
Thanks @ahsan856jalal. For the past few days I have had trouble making my annotation tool work with 832x832 images; is there any tool you would recommend?
The one I am currently using is
https://github.com/puzzledqs/BBox-Label-Tool
Never mind, I got the labels working.
@ahsan856jalal
I have tried larger 832x832 images like this one
And its annotation 0 0.579927884615 0.575120192308 0.126201923077 0.123798076923
But I still got NaN. Sincere thanks for your help.
(py27) yul@linux:~/Desktop/ws/defection-detection/darknet$ ./darknet detector train bottle-cfg/obj.data bottle-cfg/yolo-obj.cfg conv.23
yolo-obj
layer filters size input output
0 conv 32 3 x 3 / 1 832 x 832 x 3 -> 832 x 832 x 32
1 max 2 x 2 / 2 832 x 832 x 32 -> 416 x 416 x 32
2 conv 64 3 x 3 / 1 416 x 416 x 32 -> 416 x 416 x 64
3 max 2 x 2 / 2 416 x 416 x 64 -> 208 x 208 x 64
4 conv 128 3 x 3 / 1 208 x 208 x 64 -> 208 x 208 x 128
5 conv 64 1 x 1 / 1 208 x 208 x 128 -> 208 x 208 x 64
6 conv 128 3 x 3 / 1 208 x 208 x 64 -> 208 x 208 x 128
7 max 2 x 2 / 2 208 x 208 x 128 -> 104 x 104 x 128
8 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256
9 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128
10 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256
11 max 2 x 2 / 2 104 x 104 x 256 -> 52 x 52 x 256
12 conv 512 3 x 3 / 1 52 x 52 x 256 -> 52 x 52 x 512
13 conv 256 1 x 1 / 1 52 x 52 x 512 -> 52 x 52 x 256
14 conv 512 3 x 3 / 1 52 x 52 x 256 -> 52 x 52 x 512
15 conv 256 1 x 1 / 1 52 x 52 x 512 -> 52 x 52 x 256
16 conv 512 3 x 3 / 1 52 x 52 x 256 -> 52 x 52 x 512
17 max 2 x 2 / 2 52 x 52 x 512 -> 26 x 26 x 512
18 conv 1024 3 x 3 / 1 26 x 26 x 512 -> 26 x 26 x1024
19 conv 512 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 512
20 conv 1024 3 x 3 / 1 26 x 26 x 512 -> 26 x 26 x1024
21 conv 512 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 512
22 conv 1024 3 x 3 / 1 26 x 26 x 512 -> 26 x 26 x1024
23 conv 1024 3 x 3 / 1 26 x 26 x1024 -> 26 x 26 x1024
24 conv 1024 3 x 3 / 1 26 x 26 x1024 -> 26 x 26 x1024
25 route 16
26 reorg / 2 52 x 52 x 512 -> 26 x 26 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 26 x 26 x3072 -> 26 x 26 x1024
29 conv 30 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 30
30 detection
Loading weights from conv.23...
seen 32
Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Loaded: 0.272449 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413722, Avg Recall: -nan, count: 0
1: 36.093288, 36.093288 avg, 0.000100 rate, 0.412547 seconds, 1 images
Loaded: 0.078550 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.256573, Avg Recall: -nan, count: 0
2: 12.477231, 33.731682 avg, 0.000100 rate, 0.355188 seconds, 2 images
Loaded: 0.092865 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.102081, Avg Recall: -nan, count: 0
3: 0.823272, 30.440842 avg, 0.000100 rate, 0.318786 seconds, 3 images
Loaded: 0.111760 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.038852, Avg Recall: -nan, count: 0
4: 0.192174, 27.415976 avg, 0.000100 rate, 0.281865 seconds, 4 images
![b14](https://user-images.githubusercontent.com/1417993/37553093-78b4e88c-2986-11e8-9e0a-9606b99c7c1d.jpeg)
For pjreddie's darknet, I can give you one suggestion.
In image.c, try to change line 258
int width = im.h * .006;
to
int width = fmax(1, im.h * .006);
When the object's height is very small, this value becomes very small as well, so try my suggestion or another combination and see if anything improves.
OR
clone AlexeyAB's darknet and use small_object=1 in the [region] layer of your cfg file. This parameter is for the case where the object's size is about 1% by 1% of the image's resolution, which seems to be the situation in your case.
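If you want to verify whether your boxes actually fall in that regime, here is a quick check on the normalized label values (a minimal sketch; the 0.01-per-side threshold is just the 1%-by-1% rule of thumb above):

```python
# Flag boxes whose normalized width or height is at or below ~1% of the
# image side, i.e. the regime small_object=1 is meant to address.
def is_small_object(w, h, thresh=0.01):
    return w <= thresh or h <= thresh

# The first annotation posted above is roughly 11% x 6% of the image:
print(is_small_object(0.114583333333, 0.0582524271845))  # False
```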
hi @ahsan856jalal
It turned out I had in fact forked AlexeyAB's darknet, and I added small_object=1 in the [region] section, but the output is the same. I am wondering whether the annotations are wrong or the cfg file I picked is not suitable for this case; may I ask for your opinion on what direction to try? Thank you.
Also, for batch and subdivisions, can I set them both to 1 because I have very limited data? I kind of wanted to get this working first before starting to augment the data. Thanks again!
Set batch=64 and subdivisions to 8, 16, or 32 depending on what your GPU can handle (darknet processes batch/subdivisions images per forward pass, so larger subdivisions use less memory). A batch of 1 in training is not good, as the network does not generalize while learning its parameters. For testing, you can set batch and subdivisions to 1. Moreover, if you can share one or two images and their text files, I can check whether the problem is in the annotation or not.
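In the meantime, a quick way to sanity-check the label files yourself is a script along these lines (a minimal sketch; the labels/*.txt glob is a placeholder for wherever the .txt files actually live, and each line must be "class cx cy w h" with the four box values normalized to (0, 1]):

```python
import glob

# Validate YOLO label files: each line is "class cx cy w h" with the
# four box values normalized to (0, 1]. The glob pattern below is a
# placeholder for the actual label directory.
for path in glob.glob("labels/*.txt"):
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    if not rows:
        # An empty file means no ground truth, which yields -nan
        # averages and count: 0 in the training log.
        print(path, "is empty")
        continue
    for row in rows:
        if len(row) != 5:
            print(path, "has a malformed line:", row)
            continue
        cx, cy, w, h = map(float, row[1:])
        if not all(0 < v <= 1 for v in (cx, cy, w, h)):
            print(path, "has out-of-range values:", row)
```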
Thank you so much @ahsan856jalal
Image 1
0 0.415865384615 0.605769230769 0.115384615385 0.115384615385
Image 2
0 0.554086538462 0.506610576923 0.112980769231 0.114182692308
For image 2 I am not sure which part to label, actually; for this patch I annotated a square that covers all the black marks.
@leyuan It would probably be easier to train YOLO to detect individual black marks by marking every black mark with its own square. With several sizes of black marks, it can be harder for YOLO to generalise to all types (grouped marks, scattered marks, etc.); by marking every black mark individually, it can generalise the mark itself.
A way to do post-processing, if you just want one big box, is to take the top-left box's top-left coordinates and the bottom-right box's bottom-right coordinates; then you have a big box defining the entire region with marks.
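That merge step could look like this (a minimal sketch; boxes are assumed to already be (x1, y1, x2, y2) pixel tuples after converting YOLO's normalized output):

```python
# Collapse per-mark detections into one enclosing box by taking the
# extreme corner coordinates across all detected boxes.
def enclosing_box(boxes):
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    return (x1, y1, x2, y2)

print(enclosing_box([(10, 20, 50, 60), (40, 10, 90, 55)]))  # (10, 10, 90, 60)
```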
How many images of marks do you have? In general it is a good idea to have around 2000 images per class in your dataset. If you can't achieve that, you might be able to do some data augmentation to get a bigger dataset.
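A simple place to start with augmentation is horizontal flipping, which keeps the YOLO labels valid with a one-line change (a sketch using Pillow; all paths are placeholders):

```python
from PIL import Image

# Horizontally flip an image and rewrite its YOLO label file.
# Mirroring only changes the normalized x-center: cx -> 1 - cx.
def flip_horizontal(img_path, label_path, out_img, out_label):
    Image.open(img_path).transpose(Image.FLIP_LEFT_RIGHT).save(out_img)
    flipped = []
    with open(label_path) as f:
        for line in f:
            cls, cx, cy, w, h = line.split()
            flipped.append(f"{cls} {1.0 - float(cx):.6f} {cy} {w} {h}")
    with open(out_label, "w") as f:
        f.write("\n".join(flipped) + "\n")
```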
Hey @TheMikeyR, thank you for your suggestion! Is there any annotation tool you would recommend?
I've used my own for some time (I'm annotating videos, so there is correlation between frames for me). I've heard many people like https://github.com/AlexeyAB/Yolo_mark, though I've never tried it.