pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Training YOLOv3 with own dataset #597

Open sonalambwani opened 6 years ago

sonalambwani commented 6 years ago

Hi everyone. Has anyone had success training YOLOv3 on their own dataset? If so, could you help sort out some questions for me:

For me, I have a 5-class object detection problem. In the .cfg file, I changed the number of classes and set the number of filters to 3*(num_classes+5) = 30 in 3 different places. I can start the training, but the loss blows up right at the start and I am seeing a bunch of nans in the output messages (see attached snippet).
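For reference, a sketch of that cfg edit for 5 classes; the same change goes in each of the three [yolo] sections and in the [convolutional] section directly before each one (anchors/mask shown are the stock yolov3.cfg defaults):

[convolutional]
size=1
stride=1
pad=1
# filters = 3 * (classes + 5) = 3 * (5 + 5) = 30
filters=30
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
# changed from the default classes=80
classes=5
num=9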

Here are my questions:

  1. Did you need to change the anchor box sizes and/or the number of anchors?
  2. Did you need to create the labels differently than for YOLO v2?

Thanks!

ndg123 commented 6 years ago

No, you don't need to change your training set. You calculate your anchors as you did for YOLOv2, but multiply by 32 (and round), since YOLOv3 anchors are given in input pixels rather than grid cells. Then split the anchors among the [yolo] layers: with 9 anchors you can split them 3 ways, deciding based on size. Each anchor needs (5 + number of classes) filters in the convolutional layer before the [yolo] layer. I got OK results with the default anchors, but you could recompute. Remember that your anchor calculation should use the same scale as the network's input size.
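For illustration, a minimal sketch of that recalculation, assuming YOLO-format label files and a 416x416 network input; the label directory is a placeholder, and this uses plain Euclidean k-means, whereas darknet's calc_anchors clusters by IoU:

import glob
import numpy as np

def load_wh(label_dir):
    # Collect normalized (width, height) pairs from YOLO-format label files:
    # each line is "class x_center y_center width height", all in [0, 1].
    wh = []
    for path in glob.glob(label_dir + "/*.txt"):
        with open(path) as f:
            for line in f:
                _, _, _, w, h = map(float, line.split())
                wh.append((w, h))
    return np.array(wh)

def kmeans_anchors(wh, k=9, iters=100):
    # Plain k-means on (w, h) pairs.
    centers = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        dist = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for i in range(k):
            if (assign == i).any():
                centers[i] = wh[assign == i].mean(axis=0)
    return centers

wh = load_wh("data/labels")                      # hypothetical label directory
anchors = kmeans_anchors(wh) * 416               # scale to pixels for a 416x416 net
anchors = np.round(anchors[anchors.prod(axis=1).argsort()]).astype(int)
print(", ".join("%d,%d" % (w, h) for w, h in anchors))

Paste the printed list into the anchors= line of each [yolo] section, and give the three smallest indices to the finest-resolution layer via mask.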

AlexeyAB commented 6 years ago

@sonalambwani Just wait about 1000 iterations and the nans will disappear: https://github.com/AlexeyAB/darknet/issues/504#issuecomment-377290060

  1. You can re-calculate anchors, but it is not necessary. You can calculate anchors for YOLOv3 using this fork: https://github.com/AlexeyAB/darknet with this command (if your cfg-file has width=416 and height=416): darknet.exe detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416

You can use these anchors in your cfg-file directly (without multiplying by 32).

  2. You can use the same labels as for Yolo v2

ss199302 commented 6 years ago

@AlexeyAB hello, but after waiting about 1000 iterations, the nans still appear:

satya2550 commented 6 years ago

Hi, I am trying to train YOLO on VOC.

Below is the command I am using: ./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

But the nans keep increasing. Is this normal, or is something wrong?

Loaded: 0.000063 seconds
Region 82 Avg IOU: nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
3296: -nan, nan avg, 0.001000 rate, 0.416401 seconds, 3296 images

springkim commented 6 years ago

I have the same issue. The error appears in the second [yolo] layer. Did you solve this problem?

DSpringQ commented 6 years ago

Same here.

AlexeyAB commented 6 years ago

@ss199302 If only some values are nan then training is going well, but if all values are nan then training has gone wrong.

sonalambwani commented 6 years ago

@AlexeyAB As you suggested, I am now training on my new dataset with the default COCO anchor boxes. I am training from scratch, i.e., with no initialization from the pretrained convolutional weights, as you did in https://github.com/AlexeyAB/darknet/issues/504#issuecomment-377290060

For me, I see nans even after 2500 iterations. The loss (after starting off really high) has dropped into a reasonable range, but it fluctuates quite a bit from one mini-batch to the next (see attached loss plot).

Have you, or anyone else here, noticed similar behavior?

AlexeyAB commented 6 years ago

For me, I see nans even after 2500 iterations.

Is it all lines with nans, or just a few? What tool did you use for labeling, what batch= and subdivisions= are you using, is random=1 set, and how many GPUs are you training on?

sonalambwani commented 6 years ago

It's just a few lines with nans (see attached snippet).

Used an in-house tool for labeling.

batch=16, subdivisions = 16

Not sure about random=1. Where do I check/set that??

It's a single GPU.

sonalambwani commented 6 years ago

@AlexeyAB "How many classes and images in your dataset? And what tool did you use for labeling?"

5 classes, ~17k images in the training set.

AlexeyAB commented 6 years ago

@sonalambwani Looks like normal output of training.

ndg123 commented 6 years ago

You have batch and subdivisions both set to 16. That means one image per mini-batch, and depending on the density of objects in your images, it's possible that no object is matched in a given [yolo] layer, which leads to nan. It also depends on whether the ground truths are similar to the anchors: if they are all very small or all very large, they may never be matched in the large-object or small-object layers.

So I agree with @AlexeyAB that this looks normal. Can you reduce the subdivisions so there are more images per mini-batch?
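In darknet terms, subdivisions splits each batch: batch/subdivisions images are loaded per forward/backward pass. A sketch of the relevant cfg lines:

# one iteration processes "batch" images, in batch/subdivisions-sized mini-batches
# batch=16, subdivisions=16 -> 16/16 = 1 image per mini-batch (count: 0 is then common)
# batch=64, subdivisions=8  -> 64/8  = 8 images per mini-batch
batch=64
subdivisions=8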

UgolUgol commented 6 years ago

I have the same issue with the default batch=64, subdivisions=8. I have followed this instruction, but I really didn't understand whether I should change the anchors in yolo-obj.cfg when I have my own dataset.

sonalambwani commented 6 years ago

@ndg123 Thank you for your suggestions. I am now testing with batch=64 and subdivisions=16. Right off the bat, I see fewer nans. There are still a few, but it's looking better.

lynnw123 commented 6 years ago

Per my training on a custom dataset: if not all of them are nans, it is fine. Since there are 3 different scales, it just means that at some scale no object was matched. You could try a different input image size, or split the anchors into 2 or 4 different scales instead of 3; then the number of nans should change.

ss199302 commented 6 years ago

@AlexeyAB thanks for your reply, but I can't test anything.

ss199302 commented 6 years ago

@AlexeyAB darknet.exe detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416: how do I write this command on Ubuntu?

TheMikeyR commented 6 years ago

@ss199302 ./darknet detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416

brieh commented 6 years ago

I am trying to run calc_anchors on Linux using what @TheMikeyR posted, but it returns to the command line immediately and gives no output. Is it supposed to print the anchors to stdout? I'm new to C; where can I find the code this command runs?

Also, I'm training on my own data, and the bounding boxes in my training data are all exactly the same size, and they are all squares. Do I still need to specify more than one anchor?

AbhishekAshokDubey commented 6 years ago

Is it possible to detect signatures (or any handwritten area) in printed receipts using YOLO? Which cfg file would be best for this, and do you have any suggestions before I start?

TheMikeyR commented 6 years ago

@brieh try AlexeyAB's repo: https://github.com/AlexeyAB/darknet
Here is the code: https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L839

brieh commented 6 years ago

@TheMikeyR Thanks. I was using the pjreddie fork.

ss199302 commented 6 years ago

@AlexeyAB hello! I used the command ./darknet detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416 to get anchors, but it doesn't print anything.

ntudy commented 6 years ago

@ss199302 Same for me. Have you found a solution?

sonalambwani commented 6 years ago

@ss199302 @spenceryue97 did you create the label (*.txt) files first?

brieh commented 6 years ago

@ss199302 @spenceryue97 and you're definitely using AlexeyAB's fork?

I never got it working. I didn't want to switch to AlexeyAB's fork because we've modified our own fork of pjreddie's repo. I tried copy/pasting the clustering code from AlexeyAB's detector.c into mine and rebuilding, but it still gave no output.

ntudy commented 6 years ago

@sonalambwani Yes

ntudy commented 6 years ago

@brieh I'm using pjreddie's repo

TheMikeyR commented 6 years ago

@spenceryue97 @brieh you can just get AlexeyAB's fork, run calc_anchors there, and copy the resulting numbers into your cfg in pjreddie's repo.

ss199302 commented 6 years ago

@AlexeyAB Can you tell me why, when I run recall, my IOU comes out as nan while recall and precision are 0.5%? Thanks!

anguoyang commented 6 years ago

@UgolUgol Have you compared the results between the default anchors and the anchors calculated with the command from https://github.com/AlexeyAB/darknet?

jfries289 commented 6 years ago

@AlexeyAB I am training on my own objects, and I am weirdly getting values for all Region 106 results and -nan for everything else:

Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001404, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000535, .5R: -nan, .75R: -nan,  count: 0
Region 106 Avg IOU: 0.183575, Class: 0.167765, Obj: 0.002698, No Obj: 0.000716, .5R: 0.000000, .75R: 0.000000,  count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001364, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000517, .5R: -nan, .75R: -nan,  count: 0
Region 106 Avg IOU: 0.112761, Class: 0.219895, Obj: 0.001320, No Obj: 0.000692, .5R: 0.000000, .75R: 0.000000,  count: 4
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001196, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000538, .5R: -nan, .75R: -nan,  count: 0
Region 106 Avg IOU: 0.518243, Class: 0.616705, Obj: 0.000801, No Obj: 0.000739, .5R: 1.000000, .75R: 0.000000,  count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001336, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000534, .5R: -nan, .75R: -nan,  count: 0
Region 106 Avg IOU: 0.067241, Class: 0.113757, Obj: 0.002734, No Obj: 0.000756, .5R: 0.000000, .75R: 0.000000,  count: 7
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001470, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000540, .5R: -nan, .75R: -nan,  count: 0
Region 106 Avg IOU: 0.064037, Class: 0.159617, Obj: 0.005763, No Obj: 0.000764, .5R: 0.000000, .75R: 0.000000,  count: 5
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001454, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000550, .5R: -nan, .75R: -nan,  count: 0
Region 106 Avg IOU: 0.092829, Class: 0.161946, Obj: 0.004937, No Obj: 0.000723, .5R: 0.000000, .75R: 0.000000,  count: 8

 813: 6.122332, 6.256896 avg, 0.000437 rate, 8.432507 seconds, 26016 images

It's the consistency that's worrying me. I've got 16 classes with around 4500 images. The one particularly odd thing about my setup is that I've set the height and width for every identified object to 0.01 (e.g. 2 0.808552 0.933797 0.01 0.01), as I only care about the position, not the bounds of the object. Hopefully that's not messing things up?

AlexeyAB commented 6 years ago

@jfries289

every identified object to 0.01

With that, all of your ground truths are tiny, so they are all matched to the finest [yolo] layer (106), and you will always get nan for Regions 82 and 94; but it isn't a problem. Training goes well.


But for slightly better accuracy, even if you only need the position, it's better to set the real width and height of the objects, so Yolo knows which of the 3 [yolo] layers (the one with the larger receptive field, or one with higher resolution and less downsampling) should be used to detect each object.
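For anyone fixing such labels: darknet reads one object per line in the format class x_center y_center width height, all normalized by the image dimensions. A small conversion sketch from pixel coordinates (the numbers are hypothetical):

def to_yolo_label(cls, x1, y1, x2, y2, img_w, img_h):
    # Convert a pixel box (x1, y1, x2, y2) to darknet's label format:
    # "class x_center y_center width height", all relative to image size.
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return "%d %.6f %.6f %.6f %.6f" % (cls, xc, yc, w, h)

# hypothetical 20x20-pixel object in a 640x640 image
print(to_yolo_label(2, 500, 580, 520, 600, 640, 640))  # "2 0.796875 0.921875 0.031250 0.031250"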

jfries289 commented 6 years ago

@AlexeyAB Thanks for the reply. It's clear I need to improve my understanding of the regions, etc. However, I'm still not sure my training is going successfully:

2612: 3.408263, 1.774134 avg, 0.001000 rate, 12.983424 seconds, 83584 images

My avg seems to be oscillating between 1.2 and 1.7. At this stage, I would have expected my avg to be lower. Is this the system temporarily stuck in a local minimum, or has something possibly gone wrong?

AlexeyAB commented 6 years ago

@jfries289

My avg seems to be oscillating between 1.2 and 1.7. At this stage, I would have expected my avg to be lower. Is this the system temporarily stuck in a local minimum, or has something possibly gone wrong?

I think this is because Yolo can't select the optimal [yolo] layer (1 of 3), so the last [yolo] layer predicts objects with a big error, which increases the loss; also, the difference between the size Yolo predicts during training and the size you set is very large. Something else may be wrong as well. I think you will be able to detect objects, but with low accuracy.

I recommend you set real sizes for the objects using Yolo_mark, then recalculate the anchors, and then start training from the beginning.

In the Yolo v3, the labels with correct sizes of objects help to choose the optimal [yolo]-layer, i.e. help to train with higher accuracy.

jfries289 commented 6 years ago

@AlexeyAB

In the Yolo v3, the labels with correct sizes of objects help to choose the optimal [yolo]-layer, i.e. help to train with higher accuracy.

If full-sized labels are not an option, would it be better for me to use Yolo v2? Or would I have the same issue there?

AlexeyAB commented 6 years ago

@jfries289

What is the range of the real sizes of objects in your dataset?

jfries289 commented 6 years ago

@AlexeyAB I would guess anywhere from 0.1 to 0.8.

AlexeyAB commented 6 years ago

@jfries289

In the Yolo v3, the labels with correct sizes of objects help to choose the optimal [yolo]-layer, i.e. help to train with higher accuracy.

If full-sized labels are not an option, would it be better for me to use Yolo v2? Or would I have the same issue there?

But I have never tested training on a dataset like yours, with constant values of width and height.

pallpb commented 6 years ago

I have a dataset of 21k face images, and I have already checked the labelled data using yolo_mark. I am using yolov3 with batch=64 and subdivisions=16, and I am getting nan everywhere. Shall I wait for 1000 iterations? Here is the output:

73: -nan, -nan avg loss, 0.000000 rate, 649.007621 seconds, 4672 images
Loaded: 0.000000 seconds
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 2
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6
Region 106 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 5
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0

patilameya825 commented 6 years ago

@AlexeyAB As you mentioned earlier, "Only if nan occurs for avg loss for several dozen consecutive iterations, then training went wrong. Otherwise, the training goes well." Can you please suggest how to correct the training process if we are getting all nans? My training loss keeps increasing, and after some steps all values become -nan.

guantinglin commented 6 years ago

I encountered this phenomenon on my 3-class dataset, but after training it works well. I think it comes from the scale mismatch between the different output layers.

pushkalkatara commented 6 years ago

When no object is matched in a given layer, it prints nan; IoU is simply area of intersection / area of union. I think that looks normal.
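For reference, a minimal IoU sketch for axis-aligned boxes given as (x1, y1, x2, y2); when a layer matches no ground truth (count: 0), the per-layer average is computed over zero matches, which is exactly what prints as -nan:

def iou(a, b):
    # a, b: axis-aligned boxes as (x1, y1, x2, y2)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ~= 0.143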

MizbaMohammed commented 6 years ago

@AlexeyAB how many images do you think I should gather if I want to add a new class to the COCO dataset?

gustavovaliati commented 6 years ago

Hello @MizbaMohammed ,

Are you already aware of the default recommendation? I guess it is not specific to COCO, but general: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

it is desirable that your training dataset include images with objects at different scales, rotations, lightings, from different sides, and on different backgrounds; you should preferably have 2000 or more different images for each class, and you should train for 2000*classes iterations or more

RubyLiao commented 6 years ago

I got confused when setting up the anchor boxes: how should I arrange the sequence of the clustered anchor boxes? They are not distributed as evenly as we might wish; I might get [1, 1, 2, 2, 5, 6, 30, 32, 42] instead of [1,2,3,4,5,6,7,8,9], and I hesitated to just split them evenly across the 3 scales in yolov3. My own experiments show that the arrangement of the anchor boxes matters. The output Region 82, Region 94 and Region 106 is another confusion: what do they mean?

Region 82 Avg IOU: 0.790874, Class: 0.993619, Obj: 0.970194, No Obj: 0.002241, .5R: 1.000000, .75R: 0.666667,  count: 3
Region 94 Avg IOU: 0.665403, Class: 0.775035, Obj: 0.567849, No Obj: 0.000524, .5R: 0.800000, .75R: 0.200000,  count: 5
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000002, .5R: -nan, .75R: -nan,  count: 0

How does it know how many objects the batch has at each layer? And if each object is assigned to one layer, wouldn't the distribution of anchor boxes be a big problem? Could anyone help with this? Thanks a lot.
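For what it's worth: 82, 94 and 106 are simply the layer indices of the three [yolo] layers in yolov3.cfg. Layer 82 predicts on the coarsest grid (13x13 for a 416x416 input, so it takes the largest anchors) and layer 106 on the finest grid (52x52, the smallest anchors). During training each ground-truth box is matched to whichever of the 9 anchors fits it best, and that anchor's mask determines the layer, which is where the per-layer count comes from. The usual convention is to sort the 9 anchor pairs from small to large and assign them with the mask option; a sketch:

# anchors sorted from smallest to largest; "mask" selects which 3 each layer uses
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
# [yolo] at layer 82  (13x13 grid, large objects):  mask = 6,7,8
# [yolo] at layer 94  (26x26 grid):                 mask = 3,4,5
# [yolo] at layer 106 (52x52 grid, small objects):  mask = 0,1,2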

andyrey commented 6 years ago

I have successfully trained a 1-object detection YOLOv2 model, but I still don't understand what role the anchors in the cfg file play. I changed them but didn't see any effect.

  1. What is the meaning of the anchors?
  2. @AlexeyAB: Alexey, what do you mean by "real sizes of objects"?
  3. Does anybody have a program that chooses the single best weights file from a trained set of weights, based on annotated test images?
  4. What is the advantage of YOLOv3 over YOLOv2?

anguoyang commented 6 years ago

Hi @AlexeyAB, yolo v3-spp sounds good; is there any tutorial on how to train it? Thank you.

R1234A commented 6 years ago

When I try to calculate the anchors, I get the error "k-means++ can't be used without OpenCV, because there is used cvKMeans2 implementation". How can I resolve this?
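The error message itself suggests the quickest fix: build darknet with OpenCV enabled (OPENCV=1 in the Makefile, with OpenCV installed) so cvKMeans2 is available. Alternatively, you can cluster the anchors outside darknet; here is a small OpenCV-free sketch using scikit-learn's k-means++, a plain k-means substitute rather than AlexeyAB's exact 1-IoU clustering, with a hypothetical input file:

import numpy as np
from sklearn.cluster import KMeans

# "all_label_wh.txt" (hypothetical): one "w h" pair per line, normalized to [0, 1]
wh = np.loadtxt("all_label_wh.txt")
km = KMeans(n_clusters=9, init="k-means++", n_init=10).fit(wh)
# sort by area and scale to pixels for a 416x416 network input
anchors = np.round(km.cluster_centers_[km.cluster_centers_.prod(axis=1).argsort()] * 416).astype(int)
print(", ".join("%d,%d" % (w, h) for w, h in anchors))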