pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Yolov3 TINY - net never converges #1391

Closed. Fishelstix closed this issue 5 years ago.

Fishelstix commented 5 years ago

I'm having trouble getting YOLOv3-tiny to converge on a dataset of traffic sign images and types. I'm sure that I've formatted my tiny cfg file correctly, setting filters = 3*(classes+5) in the convolutional layers preceding the yolo layers and so on, and I'm reasonably sure that I've formatted the street sign data with ground truth boxes correctly.
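A quick way to sanity-check the `filters` arithmetic is to invert it. This sketch assumes 3 anchor masks per [yolo] layer, as in the stock yolov3-tiny.cfg; `yolo_filters` and `classes_from_filters` are hypothetical helpers, not part of darknet. Applied to the 144-channel convolutions that feed the two yolo layers in the log below (layers 15 and 22), it implies classes=43, which should equal `classes=` in both [yolo] sections of the cfg.

```python
def yolo_filters(classes: int, masks_per_layer: int = 3) -> int:
    """Each anchor predicts 4 box coordinates + 1 objectness + `classes` scores."""
    return masks_per_layer * (classes + 5)


def classes_from_filters(filters: int, masks_per_layer: int = 3) -> int:
    """Invert the formula; raise if `filters` cannot come from it."""
    if filters % masks_per_layer != 0:
        raise ValueError(f"{filters} is not divisible by {masks_per_layer}")
    classes = filters // masks_per_layer - 5
    if classes < 1:
        raise ValueError(f"{filters} filters leave no room for class scores")
    return classes


if __name__ == "__main__":
    # Layers 15 and 22 in the posted layer table output 144 channels.
    print(classes_from_filters(144))
    # PASCAL VOC has 20 classes.
    print(yolo_filters(20))
```

The same check applies to the VOC attempt mentioned later in the thread: 20 classes need 75 filters in both layers.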

However, whenever I run training, regardless of how many subdivisions or which anchors I use, the output repeatedly alternates between Region 16 and Region 23 lines and always reports an avg loss in the 600s, which seems far too high. The console output below covers only a couple of iterations, but I have left training running for hours, and the output still only alternates between Regions 16 and 23 while the average loss keeps growing.
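The alternation itself appears to be normal output rather than a symptom: in the layer table of the console output below, layers 16 and 23 are the two [yolo] layers, and as far as I can tell from darknet's yolo layer code, each one prints a `Region <layer index> ...` summary every time a mini-batch passes forward. A minimal sketch that counts those lines for one iteration, assuming single-device training; `region_counts` and `infer_subdivisions` are hypothetical helpers:

```python
import re
from collections import Counter

# Matches the start of darknet's per-[yolo]-layer summary line, e.g.
# "Region 16 Avg IOU: 0.051731, Class: 0.396067, ..., count: 5"
REGION_RE = re.compile(r"Region (\d+) Avg IOU:")


def region_counts(iteration_log):
    """Count how many 'Region N' summaries each [yolo] layer printed."""
    return Counter(int(layer) for layer in REGION_RE.findall(iteration_log))


def infer_subdivisions(iteration_log):
    """One summary per [yolo] layer per mini-batch, so all counts should agree.

    Assumes single-device training, where one iteration runs `subdivisions`
    forward passes of batch/subdivisions images each.
    """
    counts = set(region_counts(iteration_log).values())
    if len(counts) != 1:
        raise ValueError(f"expected one count shared by all layers, got {counts}")
    return counts.pop()


if __name__ == "__main__":
    # Two shortened lines from one forward pass, repeated to mimic 8 passes.
    one_pass = (
        "Region 16 Avg IOU: 0.051731, Class: 0.396067, count: 5\n"
        "Region 23 Avg IOU: 0.118626, Class: 0.466361, count: 7\n"
    )
    print(region_counts(one_pass * 8))
    print(infer_subdivisions(one_pass * 8))
```

Iteration 1 of the posted log has eight lines per layer, so this returns 8; with 64 images per iteration that is consistent with batch=64, subdivisions=8 (an inference from the log, not something stated in the thread). Changing subdivisions would change how many lines appear, not whether they alternate.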

If anyone has encountered this problem before, or one similar to it, I would greatly appreciate some advice.

```
/Users/erana/darknet/darknet detector train data/yolo.data cfg/yolov3-tiny-GTSDB.cfg yolov3-tiny.conv.15
yolov3-tiny-GTSDB
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv    144  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 144  0.025 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv    144  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 144  0.050 BFLOPs
   23 yolo
Loading weights from yolov3-tiny.conv.15...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing 480
Loaded: 0.929056 seconds
Region 16 Avg IOU: 0.051731, Class: 0.396067, Obj: 0.542794, No Obj: 0.386129, .5R: 0.000000, .75R: 0.000000, count: 5
Region 23 Avg IOU: 0.118626, Class: 0.466361, Obj: 0.528537, No Obj: 0.586451, .5R: 0.000000, .75R: 0.000000, count: 7
Region 16 Avg IOU: 0.143758, Class: 0.453867, Obj: 0.495200, No Obj: 0.388145, .5R: 0.000000, .75R: 0.000000, count: 7
Region 23 Avg IOU: 0.139551, Class: 0.590291, Obj: 0.550832, No Obj: 0.587167, .5R: 0.000000, .75R: 0.000000, count: 9
Region 16 Avg IOU: 0.119892, Class: 0.575722, Obj: 0.278907, No Obj: 0.386017, .5R: 0.000000, .75R: 0.000000, count: 4
Region 23 Avg IOU: 0.156436, Class: 0.612939, Obj: 0.427837, No Obj: 0.588333, .5R: 0.142857, .75R: 0.000000, count: 7
Region 16 Avg IOU: 0.086728, Class: 0.386449, Obj: 0.536109, No Obj: 0.387869, .5R: 0.000000, .75R: 0.000000, count: 3
Region 23 Avg IOU: 0.096530, Class: 0.523988, Obj: 0.487201, No Obj: 0.589963, .5R: 0.000000, .75R: 0.000000, count: 12
Region 16 Avg IOU: 0.059097, Class: 0.485299, Obj: 0.340098, No Obj: 0.383763, .5R: 0.000000, .75R: 0.000000, count: 3
Region 23 Avg IOU: 0.143288, Class: 0.554795, Obj: 0.457131, No Obj: 0.588158, .5R: 0.000000, .75R: 0.000000, count: 13
Region 16 Avg IOU: 0.156728, Class: 0.399195, Obj: 0.421893, No Obj: 0.385046, .5R: 0.000000, .75R: 0.000000, count: 7
Region 23 Avg IOU: 0.033785, Class: 0.451098, Obj: 0.656794, No Obj: 0.588587, .5R: 0.000000, .75R: 0.000000, count: 6
Region 16 Avg IOU: 0.046540, Class: 0.219762, Obj: 0.528606, No Obj: 0.390582, .5R: 0.000000, .75R: 0.000000, count: 2
Region 23 Avg IOU: 0.073912, Class: 0.505028, Obj: 0.624005, No Obj: 0.588669, .5R: 0.000000, .75R: 0.000000, count: 10
Region 16 Avg IOU: 0.086740, Class: 0.566925, Obj: 0.759137, No Obj: 0.387308, .5R: 0.000000, .75R: 0.000000, count: 4
Region 23 Avg IOU: 0.085011, Class: 0.472009, Obj: 0.511277, No Obj: 0.585508, .5R: 0.000000, .75R: 0.000000, count: 7
1: 601.399841, 601.399841 avg, 0.000000 rate, 235.437798 seconds, 64 images
Loaded: 0.000054 seconds
Region 16 Avg IOU: 0.157427, Class: 0.241370, Obj: 0.373469, No Obj: 0.389502, .5R: 0.000000, .75R: 0.000000, count: 2
Region 23 Avg IOU: 0.087665, Class: 0.429429, Obj: 0.439043, No Obj: 0.589691, .5R: 0.000000, .75R: 0.000000, count: 12
Region 16 Avg IOU: 0.116719, Class: 0.471211, Obj: 0.572456, No Obj: 0.385280, .5R: 0.000000, .75R: 0.000000, count: 4
Region 23 Avg IOU: 0.147601, Class: 0.530534, Obj: 0.458185, No Obj: 0.588418, .5R: 0.090909, .75R: 0.000000, count: 11
Region 16 Avg IOU: 0.016780, Class: 0.361724, Obj: 0.303770, No Obj: 0.384796, .5R: 0.000000, .75R: 0.000000, count: 2
Region 23 Avg IOU: 0.084532, Class: 0.450804, Obj: 0.481389, No Obj: 0.587668, .5R: 0.000000, .75R: 0.000000, count: 15
Region 16 Avg IOU: 0.117392, Class: 0.543041, Obj: 0.368827, No Obj: 0.387505, .5R: 0.100000, .75R: 0.000000, count: 10
Region 23 Avg IOU: 0.256294, Class: 0.652932, Obj: 0.683743, No Obj: 0.589204, .5R: 0.000000, .75R: 0.000000, count: 3
Region 16 Avg IOU: 0.086419, Class: 0.430098, Obj: 0.382061, No Obj: 0.385584, .5R: 0.000000, .75R: 0.000000, count: 7
Region 23 Avg IOU: 0.092971, Class: 0.601541, Obj: 0.684699, No Obj: 0.588462, .5R: 0.000000, .75R: 0.000000, count: 5
Region 16 Avg IOU: 0.061201, Class: 0.421523, Obj: 0.279327, No Obj: 0.388043, .5R: 0.000000, .75R: 0.000000, count: 3
Region 23 Avg IOU: 0.132614, Class: 0.447051, Obj: 0.330808, No Obj: 0.588045, .5R: 0.000000, .75R: 0.000000, count: 11
Region 16 Avg IOU: 0.149230, Class: 0.464150, Obj: 0.591987, No Obj: 0.385612, .5R: 0.000000, .75R: 0.000000, count: 4
Region 23 Avg IOU: 0.193538, Class: 0.304314, Obj: 0.550590, No Obj: 0.586988, .5R: 0.000000, .75R: 0.000000, count: 8
Region 16 Avg IOU: 0.088681, Class: 0.509298, Obj: 0.406772, No Obj: 0.388736, .5R: 0.000000, .75R: 0.000000, count: 6
Region 23 Avg IOU: 0.152170, Class: 0.537265, Obj: 0.555099, No Obj: 0.586446, .5R: 0.000000, .75R: 0.000000, count: 6
2: 603.512085, 601.611084 avg, 0.000000 rate, 235.894398 seconds, 128 images
Loaded: 0.000051 seconds
Region 16 Avg IOU: 0.054455, Class: 0.442953, Obj: 0.560762, No Obj: 0.383083, .5R: 0.000000, .75R: 0.000000, count: 8
Region 23 Avg IOU: 0.099459, Class: 0.373931, Obj: 0.736904, No Obj: 0.588418, .5R: 0.000000, .75R: 0.000000, count: 7
Region 16 Avg IOU: 0.121303, Class: 0.406264, Obj: 0.646978, No Obj: 0.381567, .5R: 0.000000, .75R: 0.000000, count: 7
Region 23 Avg IOU: 0.154501, Class: 0.682709, Obj: 0.553311, No Obj: 0.588480, .5R: 0.000000, .75R: 0.000000, count: 6
Region 16 Avg IOU: 0.071020, Class: 0.553534, Obj: 0.296933, No Obj: 0.385904, .5R: 0.000000, .75R: 0.000000, count: 5
Region 23 Avg IOU: 0.150764, Class: 0.632080, Obj: 0.528378, No Obj: 0.587232, .5R: 0.000000, .75R: 0.000000, count: 7
Region 16 Avg IOU: 0.179915, Class: 0.427501, Obj: 0.745778, No Obj: 0.388586, .5R: 0.000000, .75R: 0.000000, count: 2
Region 23 Avg IOU: 0.083835, Class: 0.404531, Obj: 0.634014, No Obj: 0.587944, .5R: 0.000000, .75R: 0.000000, count: 10
Region 16 Avg IOU: 0.074211, Class: 0.552378, Obj: 0.429077, No Obj: 0.387234, .5R: 0.000000, .75R: 0.000000, count: 5
Region 23 Avg IOU: 0.068203, Class: 0.596426, Obj: 0.568316, No Obj: 0.588830, .5R: 0.000000, .75R: 0.000000, count: 12
Region 16 Avg IOU: 0.052677, Class: 0.514116, Obj: 0.640721, No Obj: 0.382754, .5R: 0.000000, .75R: 0.000000, count: 7
Region 23 Avg IOU: 0.070586, Class: 0.410168, Obj: 0.608971, No Obj: 0.590506, .5R: 0.000000, .75R: 0.000000, count: 5
```
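The `avg` figure in these summaries is not a per-iteration loss. In pjreddie's detector training loop it is, as I understand it, an exponential moving average seeded with the first loss (`avg_loss = avg_loss*.9 + loss*.1`). The sketch below parses the two iteration summaries in this log and reproduces the avg column to within float32 rounding; `parse_summaries` and `running_avg` are hypothetical helpers.

```python
import re

# Matches darknet's per-iteration summary, e.g.
# "2: 603.512085, 601.611084 avg, 0.000000 rate, ..."
SUMMARY_RE = re.compile(r"(\d+): ([0-9.]+), ([0-9.]+) avg, ([0-9.]+) rate")


def parse_summaries(log_text):
    """Return (iteration, loss, avg, rate) tuples found in the log text."""
    return [(int(i), float(loss), float(avg), float(rate))
            for i, loss, avg, rate in SUMMARY_RE.findall(log_text)]


def running_avg(losses, decay=0.9):
    """Exponential moving average of the loss, seeded with the first value."""
    avg = None
    out = []
    for loss in losses:
        avg = loss if avg is None else decay * avg + (1.0 - decay) * loss
        out.append(avg)
    return out


if __name__ == "__main__":
    log = ("1: 601.399841, 601.399841 avg, 0.000000 rate, 235.437798 seconds, 64 images\n"
           "2: 603.512085, 601.611084 avg, 0.000000 rate, 235.894398 seconds, 128 images\n")
    rows = parse_summaries(log)
    ours = running_avg([loss for _, loss, _, _ in rows])
    for (i, _, reported, _), mine in zip(rows, ours):
        print(i, reported, round(mine, 4))
```

Because each new loss only moves the average by 10%, two iterations say very little about whether training converges or diverges.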

Fishelstix commented 5 years ago

Also, while training runs, it never writes any weights files to the "backup" directory. I suspect this is because the net never finds weights that improve on the loss of the initial weights.
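For what it's worth, the save schedule in pjreddie's `examples/detector.c` does not appear to depend on the loss at all, if I read it correctly: it writes a rolling `<cfg name>.backup` whenever the iteration number is a multiple of 100, and `<cfg name>_<N>.weights` every 100 iterations below 1000 and every 10000 after that. The sketch below mirrors that rule (please verify it against your own checkout); `checkpoint_files` and `first_checkpoint` are hypothetical names, and the base name `yolov3-tiny-GTSDB` is taken from the log above.

```python
from typing import List


def checkpoint_files(iteration: int, base: str = "yolov3-tiny-GTSDB") -> List[str]:
    """Files that would be written at `iteration` under the schedule described above."""
    files = []
    if iteration % 100 == 0:
        files.append(f"{base}.backup")  # rolling file, overwritten each time
    if iteration % 10000 == 0 or (iteration < 1000 and iteration % 100 == 0):
        files.append(f"{base}_{iteration}.weights")
    return files


def first_checkpoint(start: int = 1) -> int:
    """First iteration at or after `start` that writes anything to backup/."""
    i = start
    while not checkpoint_files(i):
        i += 1
    return i
```

Under this schedule an empty backup directory before iteration 100 is expected, and says nothing about whether the loss has improved.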

Fishelstix commented 5 years ago

I tried to train tiny YOLO on the PASCAL VOC dataset and hit the same problem: the output alternates between Regions 16 and 23, the avg loss is 400+, and the loss only gets worse as the net diverges.
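Whatever the dataset, the `rate` column in the console output above reads 0.000000 on both iterations even though the configured learning rate is 0.001. Darknet ramps the rate up during a burn-in phase as `learning_rate * (batch / burn_in)^power`. The sketch below assumes `burn_in=1000` (the value in the stock yolov3-tiny.cfg) and darknet's default `power=4`; neither is confirmed for the poster's cfg, and `burn_in_rate` is a hypothetical helper. Under those assumptions the effective learning rate stays tiny for the first few hundred iterations.

```python
def burn_in_rate(batch_num, learning_rate=0.001, burn_in=1000, power=4.0):
    """Darknet-style burn-in ramp, valid only while batch_num < burn_in.

    Assumed values: burn_in=1000 (stock yolov3-tiny.cfg) and power=4
    (darknet's default when `power` is absent from the cfg).
    """
    if batch_num >= burn_in:
        raise ValueError("after burn-in the cfg's `policy` applies; not modelled here")
    return learning_rate * (batch_num / burn_in) ** power


if __name__ == "__main__":
    # Printed with %f, like darknet's own log line.
    for batch in (1, 2, 100, 500, 900):
        print(f"{batch:4d}: {burn_in_rate(batch):f}")
```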

sivagnanamn commented 5 years ago

@Fishelstix How many iterations did you train for? Wait for around 1000 iterations and check again. Weights only start being saved after 100 iterations.
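To put those numbers in perspective, each iteration in the posted log took roughly 235 seconds. Assuming that pace stays constant (an assumption: it can change whenever darknet resizes the input, as the `Resizing 480` line shows), the arithmetic below estimates how long the first checkpoint and the suggested 1000 iterations would take. `hours_to_reach` and `iterations_within` are hypothetical helpers.

```python
# Per-iteration timings reported in the posted log (seconds).
TIMINGS = [235.437798, 235.894398]


def hours_to_reach(iteration, seconds_per_iteration):
    """Wall-clock hours needed to reach `iteration` at a constant pace."""
    return iteration * seconds_per_iteration / 3600.0


def iterations_within(hours, seconds_per_iteration):
    """Whole iterations completed in `hours` at a constant pace."""
    return int(hours * 3600.0 // seconds_per_iteration)


if __name__ == "__main__":
    pace = sum(TIMINGS) / len(TIMINGS)
    print(f"iteration 100: {hours_to_reach(100, pace):.1f} h")
    print(f"iteration 1000: {hours_to_reach(1000, pace):.1f} h")
    print(f"iterations in 3 h: {iterations_within(3, pace)}")
```

At this pace iteration 100 arrives after roughly 6.5 hours and iteration 1000 after roughly 65 hours, so a run of a few hours may not reach the first checkpoint at all.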

hainan89 commented 5 years ago

Hello all, how do you feed the label files to the training process? According to the cfg file, only the training set (the image file paths) is given, so where are the label files specified?

I followed the VOC training instructions, which put the label files in a separate labels folder, but when I train on my own dataset, the program gives a notice that the label files should be located in the same folder as the image files.
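As far as I can tell, pjreddie's darknet never reads label locations from the cfg or .data file: `fill_truth_detection` in `src/data.c` derives each label path from the image path through a chain of first-occurrence string substitutions. The sketch below re-creates that chain in Python; `label_path_for` is a hypothetical name, and the substitution list is recalled from the source, so check it against your checkout. Paths containing `images` or `JPEGImages` map to a sibling `labels` directory, while any other path keeps its directory, which would explain the notice about labels sitting next to the images.

```python
# (pattern, replacement) pairs, applied in this order; each replaces only the
# first occurrence, like darknet's strstr-based find_replace().
LABEL_SUBSTITUTIONS = [
    ("images", "labels"),
    ("JPEGImages", "labels"),
    ("raw", "labels"),
    (".jpg", ".txt"),
    (".png", ".txt"),
    (".JPG", ".txt"),
    (".JPEG", ".txt"),
]


def label_path_for(image_path):
    """Derive the label file path darknet would look for (sketch)."""
    path = image_path
    for pattern, replacement in LABEL_SUBSTITUTIONS:
        path = path.replace(pattern, replacement, 1)
    return path


if __name__ == "__main__":
    for p in ("/data/VOCdevkit/VOC2007/JPEGImages/000005.jpg",
              "/data/signs/images/00001.png",
              "/data/signs/00001.jpg"):
        print(p, "->", label_path_for(p))
```

Two side effects follow from plain substring matching: an extension outside the list (such as `.ppm`) is left unchanged, so the derived "label" path points back at the image itself, and any directory name containing `raw` (for example `drawings`) also gets rewritten.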

Do you have any comments? Thanks. I also suspect this is why the model does not converge.
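Once the label files are found, each line must follow darknet's format, `<class id> <x center> <y center> <width> <height>`, with 0-based class ids and the four box values divided by the image width and height. A minimal converter from pixel corner coordinates, assuming boxes arrive as (x_min, y_min, x_max, y_max) in pixels; `to_yolo_line` and the example numbers are made up for illustration:

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-corner box to one darknet label line.

    Output: "<class> <x_center> <y_center> <width> <height>", with the four
    box values divided by the image size so they fall in [0, 1].
    """
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image with min < max")
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # Made-up example: class 3, a 200x200 px box in an 800x600 image.
    print(to_yolo_line(3, 100, 200, 300, 400, 800, 600))
```

A box value above 1 in a label file is a likely sign that pixel coordinates were written directly instead of normalized ones.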