wizyoung / YOLOv3_TensorFlow

Complete YOLO v3 TensorFlow implementation. Supports training on your own dataset.
MIT License

Can you give an example of the training config for the two-stage or the one-stage training strategy? #113

Open liminghuiv opened 5 years ago

liminghuiv commented 5 years ago

Hi,

Can you give an example of the training config for the following training strategies? (1) Applying the two-stage training strategy or the one-stage training strategy:

Two-stage training:

First stage: Restore the darknet53_body weights from the COCO checkpoint and train the yolov3_head with a big learning rate like 1e-3 until the loss drops to a low level.

Second stage: Restore the weights from the first stage, then train the whole model with a small learning rate like 1e-4 or smaller. At this stage, remember to restore the optimizer parameters if you use an optimizer like Adam.

One-stage training:

Just restore the whole weight file except the last three convolution layers (Conv_6, Conv_14, Conv_22). In this case, be careful about possible NaN loss values.
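(For context, here is a minimal sketch of what such args.py settings might look like. The parameter names restore_include, restore_exclude, update_part, and learning_rate_init are assumptions based on this repo's args.py and the misc/experiments_on_voc examples; verify the exact names and paths against your local copy.)

```python
# Hedged sketch of args.py settings for the strategies described above.
# All names and paths below are assumptions; check your local args.py.

### Two-stage training, first stage: restore only the backbone from the
### converted COCO checkpoint and train the head with a big learning rate.
restore_path = './data/darknet_weights/yolov3.ckpt'
restore_include = ['yolov3/darknet53_body']
restore_exclude = None
update_part = ['yolov3/yolov3_head']  # freeze darknet53_body
learning_rate_init = 1e-3

### Two-stage training, second stage: restore the stage-1 checkpoint
### (including optimizer slots if you use Adam) and train everything.
# restore_path = './checkpoint/model-stage1'   # hypothetical path
# restore_include = None   # restore all variables
# restore_exclude = None
# update_part = None       # update all variables
# learning_rate_init = 1e-4

### One-stage training: restore everything except the last three conv
### layers, whose channel count depends on your number of classes.
# restore_include = None
# restore_exclude = ['yolov3/yolov3_head/Conv_6',
#                    'yolov3/yolov3_head/Conv_14',
#                    'yolov3/yolov3_head/Conv_22']
# update_part = None
# learning_rate_init = 1e-4
```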

Thanks and best regards,

Liming

liminghuiv commented 5 years ago

I ran your sample VOC configuration; this is what I got:

======> Epoch: 99, global_step: 275899.0, lr: 0.0001 <======
EVAL: Class 0: Recall: 0.9509, Precision: 0.3687, AP: 0.9281
EVAL: Class 1: Recall: 0.9436, Precision: 0.4291, AP: 0.9059
EVAL: Class 2: Recall: 0.9129, Precision: 0.3833, AP: 0.8310
EVAL: Class 3: Recall: 0.8593, Precision: 0.2251, AP: 0.7126
EVAL: Class 4: Recall: 0.8955, Precision: 0.1733, AP: 0.7592
EVAL: Class 5: Recall: 0.9671, Precision: 0.3679, AP: 0.9396
EVAL: Class 6: Recall: 0.9692, Precision: 0.3767, AP: 0.9116
EVAL: Class 7: Recall: 0.9385, Precision: 0.4571, AP: 0.9120
EVAL: Class 8: Recall: 0.8889, Precision: 0.2113, AP: 0.7045
EVAL: Class 9: Recall: 0.9221, Precision: 0.3577, AP: 0.8469
EVAL: Class 10: Recall: 0.9417, Precision: 0.2172, AP: 0.7786
EVAL: Class 11: Recall: 0.9571, Precision: 0.4167, AP: 0.9119
EVAL: Class 12: Recall: 0.9310, Precision: 0.3608, AP: 0.9061
EVAL: Class 13: Recall: 0.9385, Precision: 0.3661, AP: 0.9073
EVAL: Class 14: Recall: 0.9302, Precision: 0.4981, AP: 0.8813
EVAL: Class 15: Recall: 0.7792, Precision: 0.1996, AP: 0.5394
EVAL: Class 16: Recall: 0.9215, Precision: 0.2217, AP: 0.8436
EVAL: Class 17: Recall: 0.9540, Precision: 0.2495, AP: 0.8292
EVAL: Class 18: Recall: 0.9362, Precision: 0.3900, AP: 0.8900
EVAL: Class 19: Recall: 0.8961, Precision: 0.2848, AP: 0.7887
EVAL: Recall: 0.9246, Precision: 0.3495, mAP: 0.8364
EVAL: loss: total: 4.76, xy: 0.33, wh: 0.16, conf: 3.39, class: 0.88

Is the above result reasonable? It differs from what you reported: "I got a 87.54% test mAP (not using the 07 metric)."

wizyoung commented 5 years ago

Results at epoch 99 are kind of overfitting. I got 87.54% mAP after 36 epochs. Here are my training logs for your reference: training.log

liminghuiv commented 5 years ago

Thanks for the quick reply. Please see the attachments for my train.py, args.py (with a .txt extension) and progress.log. Did I do anything wrong? progress.log

args_voc.txt train_voc.txt

lovepan1 commented 5 years ago

======> Epoch: 8, global_step: 33209.0, lr: 0.0001 <======
EVAL: Class 0: Recall: 0.9164, Precision: 0.0949, AP: 0.8745
EVAL: Class 1: Recall: 0.8766, Precision: 0.0888, AP: 0.7734
EVAL: Class 2: Recall: 0.8802, Precision: 0.0969, AP: 0.7795
EVAL: Class 3: Recall: 0.8397, Precision: 0.0487, AP: 0.6078
EVAL: Class 4: Recall: 0.8143, Precision: 0.0601, AP: 0.6424
EVAL: Class 5: Recall: 0.9252, Precision: 0.0613, AP: 0.8753
EVAL: Class 6: Recall: 0.9448, Precision: 0.1212, AP: 0.8627
EVAL: Class 7: Recall: 0.9378, Precision: 0.1001, AP: 0.9063
EVAL: Class 8: Recall: 0.8443, Precision: 0.1052, AP: 0.6356
EVAL: Class 9: Recall: 0.9331, Precision: 0.0577, AP: 0.8472
EVAL: Class 10: Recall: 0.8763, Precision: 0.0441, AP: 0.6926
EVAL: Class 11: Recall: 0.9528, Precision: 0.1576, AP: 0.8723
EVAL: Class 12: Recall: 0.9468, Precision: 0.0981, AP: 0.8533
EVAL: Class 13: Recall: 0.9079, Precision: 0.0610, AP: 0.8196
EVAL: Class 14: Recall: 0.9283, Precision: 0.1743, AP: 0.8330
EVAL: Class 15: Recall: 0.8159, Precision: 0.0503, AP: 0.5252
EVAL: Class 16: Recall: 0.8907, Precision: 0.0462, AP: 0.7814
EVAL: Class 17: Recall: 0.9268, Precision: 0.1195, AP: 0.7236
EVAL: Class 18: Recall: 0.9570, Precision: 0.0689, AP: 0.8745
EVAL: Class 19: Recall: 0.9446, Precision: 0.0538, AP: 0.8228
EVAL: Recall: 0.9073, Precision: 0.0978, mAP: 0.7801
EVAL: loss: total: 6.42, xy: 0.54, wh: 0.34, conf: 4.12, class: 1.42

This is my best mAP; I also ran into this behavior.

lovepan1 commented 5 years ago

First stage: I restore the YOLOv3 darknet weights and update yolov3_head. Second stage: I update darknet53 and yolov3_head. This is my training process.

liminghuiv commented 5 years ago

Thanks. Can you share the args.py files for your two stages?

lovepan1 commented 5 years ago

This is my args.py. First stage: use the darknet weights, restore darknet53, update yolov3_head. Second stage: use the weights trained in the first stage, restore darknet53 and yolov3_head, update darknet53 and yolov3_head. first_stage.txt second_stage.txt
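If it helps debugging, a quick way to confirm which variables a checkpoint actually contains before setting the restore lists is the standard TF 1.x checkpoint reader (the checkpoint path here is just an example):

```python
# List all variables stored in a checkpoint, so restore_include /
# restore_exclude can be matched against real variable names.
import tensorflow as tf

# Example path; point this at the darknet conversion or a stage-1 checkpoint.
ckpt = './data/darknet_weights/yolov3.ckpt'
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)
```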

liminghuiv commented 5 years ago

Hi, @wizyoung. Can you please review the one-stage and two-stage args.py files and give us some suggestions? Thanks.

wizyoung commented 5 years ago

@liminghuiv Are your training and test txt files correct? Here are my txt files: train.txt val.txt

I hope you make the effort to understand the YOLO v3 model and its parameters, and finetune the model yourself.

liminghuiv commented 5 years ago

@wizyoung, I used your misc/experiments_on_voc script and data. The training/val txt files are exactly the same. Thanks a lot.

liminghuiv commented 5 years ago

> This is my args.py. First stage: use the darknet weights, restore darknet53, update yolov3_head. Second stage: use the weights trained in the first stage, restore darknet53 and yolov3_head, update darknet53 and yolov3_head. first_stage.txt second_stage.txt

Hi @lovepan1, according to the README, it seems that you did not use a higher learning rate (e.g. 1e-3) in the first stage and a lower learning rate (<1e-4) in the second stage?

lovepan1 commented 5 years ago

> This is my args.py. First stage: use the darknet weights, restore darknet53, update yolov3_head. Second stage: use the weights trained in the first stage, restore darknet53 and yolov3_head, update darknet53 and yolov3_head. first_stage.txt second_stage.txt

> Hi @lovepan1, according to the README, it seems that you did not use a higher learning rate (e.g. 1e-3) in the first stage and a lower learning rate (<1e-4) in the second stage?

OK, I will use the appropriate learning rates to train my model. Thanks a lot.

liminghuiv commented 5 years ago

@lovepan1, hope it works. Can you share your results when the run finishes?

lovepan1 commented 5 years ago

> @lovepan1, hope it works. Can you share your results when the run finishes?

OK, this weekend I will retrain my model with the appropriate learning rates. Hope it works. Thank you.

zyc4me commented 4 years ago

@liminghuiv @wizyoung @lovepan1 Hi guys, I have met the same problem. I used misc/experiments_on_voc/args_voc.py and did not use two-stage training, just one stage. My log is similar to @liminghuiv's: after several epochs, the training conf_loss and class_loss are very small, like 0.02 and 0.35, but very different from @wizyoung's (1.5x, 2.2x...). Can you help me find the problem? @wizyoung

mew124 commented 4 years ago

@zyc4me Did you find a solution? I also used one-stage training and met the same problem.

bujianyiwang commented 4 years ago

I want to use YOLOv3 to count people in real time from an RTSP stream at a suitable interval. Does anyone have a working Python script?
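Not a complete answer, but a minimal skeleton for sampling frames from an RTSP stream at a fixed interval with OpenCV; detect_persons() is a hypothetical placeholder you would wire up to this repo's detection code (e.g. adapted from test_single_image.py):

```python
# Minimal RTSP frame-sampling loop. detect_persons() is a hypothetical
# placeholder for a YOLOv3 inference call returning person boxes.
import time
import cv2

def detect_persons(frame):
    """Hypothetical hook: run YOLOv3 on a frame, return person boxes."""
    raise NotImplementedError

cap = cv2.VideoCapture('rtsp://user:pass@camera-ip/stream')  # example URL
interval = 1.0   # seconds between detections
last = 0.0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    now = time.time()
    if now - last >= interval:
        last = now
        boxes = detect_persons(frame)
        print('persons in view:', len(boxes))
cap.release()
```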

aryuCoding commented 4 years ago

> Results at epoch 99 are kind of overfitting. I got 87.54% mAP after 36 epochs. Here are my training logs for your reference: training.log

Mon, 01 Jul 2019 08:50:21 INFO Epoch: 50, global_step: 140400 | loss: total: 4.13, xy: 0.32, wh: 0.23, conf: 1.48, class: 2.10 | Last batch: rec: 0.937, prec: 0.007 | lr: 0.0001
Mon, 01 Jul 2019 08:50:56 INFO Epoch: 50, global_step: 140500 | loss: total: 4.12, xy: 0.32, wh: 0.23, conf: 1.48, class: 2.10 | Last batch: rec: 0.857, prec: 0.014 | lr: 0.0001
Mon, 01 Jul 2019 08:51:39 INFO Epoch: 50, global_step: 140600 | loss: total: 4.14, xy: 0.32, wh: 0.23, conf: 1.48, class: 2.11 | Last batch: rec: 0.786, prec: 0.014 | lr: 0.0001
Mon, 01 Jul 2019 08:52:30 INFO Epoch: 50, global_step: 140700 | loss: total: 4.15, xy: 0.32, wh: 0.23, conf: 1.49, class: 2.11 | Last batch: rec: 0.867, prec: 0.012 | lr: 0.0001
Mon, 01 Jul 2019 08:56:02 INFO ======> Epoch: 50, global_step: 140708.0, lr: 0.0001 <======

According to your training logs, recall is very high but precision is low. Is that normal?