pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

How small can the bounding box be? NaN issue... #531

Open leyuan opened 6 years ago

leyuan commented 6 years ago

Hi all,

I am training YOLOv2 on the following image: ![b42](https://user-images.githubusercontent.com/1417993/37386732-4636a94a-271f-11e8-9afb-00160856ae7e.jpeg)

The part I want the bounding box to cover is the black/marked area; as you can see, it is pretty small. I have annotated the images accordingly.

However, after starting training, I am getting NaN in the results from line 0 onward. Since many people have suggested that this can be caused by bad annotations, I used the conversion script provided here (https://github.com/Guanghan/darknet/blob/master/scripts/convert.py), and the labels look like this:

0 0.421875 0.606796116505 0.114583333333 0.0582524271845

I am not sure whether the numbers above are too small. The image size corresponds to the one specified in the cfg file.
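
For reference, this is roughly how such a label line is computed from pixel coordinates (class, x_center, y_center, width, height, each normalized by the image dimensions). The pixel numbers below are hypothetical, chosen only because a 44x60 px box centered at (162, 625) in a 384x1030 px image reproduces the values above:

# Sketch: a YOLO label line is "<class> <x_center> <y_center> <width> <height>",
# with every coordinate normalized by the image width/height.
def to_yolo_label(cls, xmin, ymin, xmax, ymax, img_w, img_h):
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / float(img_w)
    h = (ymax - ymin) / float(img_h)
    return "%d %f %f %f %f" % (cls, x_center, y_center, w, h)

# Hypothetical 44x60 px box on a hypothetical 384x1030 px image:
print(to_yolo_label(0, 140, 595, 184, 655, 384, 1030))
# -> 0 0.421875 0.606796 0.114583 0.058252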

(py27) linux:~/Desktop/ws/defection-detection/darknet$ ./darknet detector train bottle-cfg/obj.data bottle-cfg/yolo-obj.cfg conv.23 
yolo-obj
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   288 x 416 x   3   ->   288 x 416 x  32
    1 max          2 x 2 / 2   288 x 416 x  32   ->   144 x 208 x  32
    2 conv     64  3 x 3 / 1   144 x 208 x  32   ->   144 x 208 x  64
    3 max          2 x 2 / 2   144 x 208 x  64   ->    72 x 104 x  64
    4 conv    128  3 x 3 / 1    72 x 104 x  64   ->    72 x 104 x 128
    5 conv     64  1 x 1 / 1    72 x 104 x 128   ->    72 x 104 x  64
    6 conv    128  3 x 3 / 1    72 x 104 x  64   ->    72 x 104 x 128
    7 max          2 x 2 / 2    72 x 104 x 128   ->    36 x  52 x 128
    8 conv    256  3 x 3 / 1    36 x  52 x 128   ->    36 x  52 x 256
    9 conv    128  1 x 1 / 1    36 x  52 x 256   ->    36 x  52 x 128
   10 conv    256  3 x 3 / 1    36 x  52 x 128   ->    36 x  52 x 256
   11 max          2 x 2 / 2    36 x  52 x 256   ->    18 x  26 x 256
   12 conv    512  3 x 3 / 1    18 x  26 x 256   ->    18 x  26 x 512
   13 conv    256  1 x 1 / 1    18 x  26 x 512   ->    18 x  26 x 256
   14 conv    512  3 x 3 / 1    18 x  26 x 256   ->    18 x  26 x 512
   15 conv    256  1 x 1 / 1    18 x  26 x 512   ->    18 x  26 x 256
   16 conv    512  3 x 3 / 1    18 x  26 x 256   ->    18 x  26 x 512
   17 max          2 x 2 / 2    18 x  26 x 512   ->     9 x  13 x 512
   18 conv   1024  3 x 3 / 1     9 x  13 x 512   ->     9 x  13 x1024
   19 conv    512  1 x 1 / 1     9 x  13 x1024   ->     9 x  13 x 512
   20 conv   1024  3 x 3 / 1     9 x  13 x 512   ->     9 x  13 x1024
   21 conv    512  1 x 1 / 1     9 x  13 x1024   ->     9 x  13 x 512
   22 conv   1024  3 x 3 / 1     9 x  13 x 512   ->     9 x  13 x1024
   23 conv   1024  3 x 3 / 1     9 x  13 x1024   ->     9 x  13 x1024
   24 conv   1024  3 x 3 / 1     9 x  13 x1024   ->     9 x  13 x1024
   25 route  16
   26 reorg              / 2    18 x  26 x 512   ->     9 x  13 x2048
   27 route  26 24
   28 conv   1024  3 x 3 / 1     9 x  13 x3072   ->     9 x  13 x1024
   29 conv     30  1 x 1 / 1     9 x  13 x1024   ->     9 x  13 x  30
   30 detection
Loading weights from conv.23...
 seen 32 
Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Loaded: 2.960990 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.492677, Avg Recall: -nan,  count: 0

And here is my .cfg file, which is merely a copy of yolo-voc2.0.cfg:

[net]
batch=64
subdivisions=8
height=416
width=288
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[reorg]
stride=2

[route]
layers=-1,-3

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=0

I appreciate your help!

ahsan856jalal commented 6 years ago

Try increasing the height and width in your cfg file; try with both equal to 832 and see if it improves.
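
For example, in the [net] section of the cfg (the input size should stay a multiple of 32):

[net]
...
height=832
width=832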

Regards Ahsan


leyuan commented 6 years ago

Thanks @ahsan856jalal. For the past few days I have had trouble getting my annotation tool to work with 832-pixel images. Is there any tool you would recommend?

The one I am currently using is

https://github.com/puzzledqs/BBox-Label-Tool

Never mind, I got the labels working.

leyuan commented 6 years ago

@ahsan856jalal

I have tried larger 832x832 images, like this one (b2).

And its annotation: 0 0.579927884615 0.575120192308 0.126201923077 0.123798076923

But I still got NaN. Sincerely, thank you for your help.

(py27) yul@linux:~/Desktop/ws/defection-detection/darknet$ ./darknet detector train bottle-cfg/obj.data bottle-cfg/yolo-obj.cfg conv.23 
yolo-obj
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   832 x 832 x   3   ->   832 x 832 x  32
    1 max          2 x 2 / 2   832 x 832 x  32   ->   416 x 416 x  32
    2 conv     64  3 x 3 / 1   416 x 416 x  32   ->   416 x 416 x  64
    3 max          2 x 2 / 2   416 x 416 x  64   ->   208 x 208 x  64
    4 conv    128  3 x 3 / 1   208 x 208 x  64   ->   208 x 208 x 128
    5 conv     64  1 x 1 / 1   208 x 208 x 128   ->   208 x 208 x  64
    6 conv    128  3 x 3 / 1   208 x 208 x  64   ->   208 x 208 x 128
    7 max          2 x 2 / 2   208 x 208 x 128   ->   104 x 104 x 128
    8 conv    256  3 x 3 / 1   104 x 104 x 128   ->   104 x 104 x 256
    9 conv    128  1 x 1 / 1   104 x 104 x 256   ->   104 x 104 x 128
   10 conv    256  3 x 3 / 1   104 x 104 x 128   ->   104 x 104 x 256
   11 max          2 x 2 / 2   104 x 104 x 256   ->    52 x  52 x 256
   12 conv    512  3 x 3 / 1    52 x  52 x 256   ->    52 x  52 x 512
   13 conv    256  1 x 1 / 1    52 x  52 x 512   ->    52 x  52 x 256
   14 conv    512  3 x 3 / 1    52 x  52 x 256   ->    52 x  52 x 512
   15 conv    256  1 x 1 / 1    52 x  52 x 512   ->    52 x  52 x 256
   16 conv    512  3 x 3 / 1    52 x  52 x 256   ->    52 x  52 x 512
   17 max          2 x 2 / 2    52 x  52 x 512   ->    26 x  26 x 512
   18 conv   1024  3 x 3 / 1    26 x  26 x 512   ->    26 x  26 x1024
   19 conv    512  1 x 1 / 1    26 x  26 x1024   ->    26 x  26 x 512
   20 conv   1024  3 x 3 / 1    26 x  26 x 512   ->    26 x  26 x1024
   21 conv    512  1 x 1 / 1    26 x  26 x1024   ->    26 x  26 x 512
   22 conv   1024  3 x 3 / 1    26 x  26 x 512   ->    26 x  26 x1024
   23 conv   1024  3 x 3 / 1    26 x  26 x1024   ->    26 x  26 x1024
   24 conv   1024  3 x 3 / 1    26 x  26 x1024   ->    26 x  26 x1024
   25 route  16
   26 reorg              / 2    52 x  52 x 512   ->    26 x  26 x2048
   27 route  26 24
   28 conv   1024  3 x 3 / 1    26 x  26 x3072   ->    26 x  26 x1024
   29 conv     30  1 x 1 / 1    26 x  26 x1024   ->    26 x  26 x  30
   30 detection
Loading weights from conv.23...
 seen 32 
Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Loaded: 0.272449 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413722, Avg Recall: -nan,  count: 0
1: 36.093288, 36.093288 avg, 0.000100 rate, 0.412547 seconds, 1 images
Loaded: 0.078550 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.256573, Avg Recall: -nan,  count: 0
2: 12.477231, 33.731682 avg, 0.000100 rate, 0.355188 seconds, 2 images
Loaded: 0.092865 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.102081, Avg Recall: -nan,  count: 0
3: 0.823272, 30.440842 avg, 0.000100 rate, 0.318786 seconds, 3 images
Loaded: 0.111760 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.038852, Avg Recall: -nan,  count: 0
4: 0.192174, 27.415976 avg, 0.000100 rate, 0.281865 seconds, 4 images
![b14](https://user-images.githubusercontent.com/1417993/37553093-78b4e88c-2986-11e8-9e0a-9606b99c7c1d.jpeg)

ahsan856jalal commented 6 years ago

For pjreddie's darknet, I can give you one suggestion: in image.c, try changing line 258 from int width = im.h * .006; to int width = fmax(1, im.h * .006);. When the object's height is very small this parameter becomes very, very small, so try my suggestion or another combination and see if there is any improvement. Alternatively, clone AlexeyAB's darknet and add small_object=1 to the region layer of your cfg file; this parameter is meant for objects around 1% by 1% of the image's resolution, which happens to be your case.
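
For example, in the cfg used with AlexeyAB's fork (assuming that version of the region layer supports the small_object key, as discussed above):

[region]
...
small_object=1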

leyuan commented 6 years ago

hi @ahsan856jalal

It turns out I did fork AlexeyAB's darknet, and I added small_object=1 to the region layer, but the output is the same. I am wondering whether the annotations are wrong or the cfg file I picked is not suitable for this case. May I ask for your opinion on which direction to try? Thank you.

Also, for batch and subdivisions, can I set them both to 1 since I have very limited data? I kind of wanted to get this working first before starting to augment the data. Thanks again!

ahsan856jalal commented 6 years ago

Make batch=64 and set subdivisions to 8, 16, or 32 depending on how much your GPU can handle. A batch of 1 during training is not good, as it does not generalize while learning the parameters. For testing, you can set both batch and subdivisions to 1. Moreover, if you can share 1-2 images and their text files, I can check whether the problem is in the annotation or not.
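
For example, for training (16 here is just one option; pick 8 or 32 depending on GPU memory):

[net]
batch=64
subdivisions=16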

leyuan commented 6 years ago

Thank you so much @ahsan856jalal


Image 1

b1

0 0.415865384615 0.605769230769 0.115384615385 0.115384615385


Image 2

b3

0 0.554086538462 0.506610576923 0.112980769231 0.114182692308


For image 2, I am not sure which part to label, actually; for this patch I annotated a square that covers all of the black marks.

TheMikeyR commented 6 years ago

@leyuan It would probably be easier to train yolo to detect individual black marks by marking every black mark with its own square. With several sizes of black marks, it can be harder for yolo to generalise to all types (grouped marks, scattered marks, etc.); by marking every black mark, it can generalise the mark itself.

A way to do post-processing, if you just want one big box, is to take the "top-left box's top-left" coordinates and the "bottom-right box's bottom-right" coordinates; then you have a big box defining the entire region with marks.
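
A rough sketch of that merge (boxes given as (xmin, ymin, xmax, ymax) pixel tuples; the values are made up):

def merge_boxes(boxes):
    # Take the smallest top-left and the largest bottom-right corner
    # over all detected mark boxes to get one enclosing box.
    xmin = min(b[0] for b in boxes)
    ymin = min(b[1] for b in boxes)
    xmax = max(b[2] for b in boxes)
    ymax = max(b[3] for b in boxes)
    return (xmin, ymin, xmax, ymax)

# Example: three small mark detections collapsed into one region box.
print(merge_boxes([(120, 80, 160, 110), (150, 95, 190, 130), (135, 120, 170, 150)]))
# -> (120, 80, 190, 150)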

How many images of marks do you have? In general, it is a good idea to have around 2000 images per class in your dataset. If you can't achieve that, you might be able to do some data augmentation to get a bigger dataset.

leyuan commented 6 years ago

Hey, @TheMikeyR thank you for your suggestion! Is there any annotation tool you would recommend?

TheMikeyR commented 6 years ago

I've used my own for some time (I'm annotating videos, so there is correlation between frames for me). I've heard many people like https://github.com/AlexeyAB/Yolo_mark, though I've never tried it myself.