Detector learning rate policy question

leon-liangwu / MaskYolo_Caffe

YOLO V2 & V3, YOLO Combined with RCNN and MaskRCNN


Detector learning rate policy question #51

Closed: guods closed this issue 4 years ago

guods commented 4 years ago

net: "./mb_v2_t4_train.prototxt"
test_initialization: false
display: 100
average_loss: 100
lr_policy: "multifixed"
stagelr: 0.00001
stagelr: 0.0005
stagelr: 0.001
stagelr: 0.0001
stagelr: 0.00001
stageiter: 100
stageiter: 1000
stageiter: 150000
stageiter: 200000
max_iter: 240000
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "/data/Machine_Learning/models/mb_v2_t4_cls5"
solver_mode: GPU

What was the original reasoning behind this learning rate schedule? I switched to my own data to train the detector, and after a few thousand iterations the loss suddenly became very large and then turned into NaN.

leon-liangwu commented 4 years ago

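This policy is consistent with the original YOLO implementation: train with a small lr for a while first, then switch to 0.001, and then decrease it step by step. multifixed is a policy I implemented myself; you specify the iterations and the learning rate for each stage. You can try lowering the lr, or letting the small lr run for more iterations. A sudden jump in loss may also be caused by the ground truth (gt) in your data. There is nothing wrong with this learning rate policy; I have trained on my own data with it as well.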
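To make the stage lookup concrete, here is a minimal sketch of how a multifixed-style schedule can be read, assuming (this is our reading, not the fork's actual C++ code) that `stagelr[k]` is used once the iteration has reached `k` of the `stageiter` thresholds, so the five `stagelr` values pair with the four `stageiter` values and the last `stagelr` runs until `max_iter`. Under that assumption the solver above gives 1e-5 for iterations 0 to 99, 5e-4 for 100 to 999, 1e-3 for 1,000 to 149,999, 1e-4 for 150,000 to 199,999, and 1e-5 from 200,000 to 240,000. The function name `multifixed_lr` is hypothetical.

```python
from bisect import bisect_right


def multifixed_lr(it, stage_iters, stage_lrs):
    """Learning rate at iteration `it` under a multifixed-style schedule.

    Assumption: stage_lrs has exactly one more entry than stage_iters, and
    the switch to the next stage happens when `it` reaches a threshold.
    """
    if len(stage_lrs) != len(stage_iters) + 1:
        raise ValueError("expected len(stage_lrs) == len(stage_iters) + 1")
    # bisect_right counts how many thresholds are <= it, i.e. the stage index.
    return stage_lrs[bisect_right(stage_iters, it)]


# Values copied from the solver in this issue.
STAGE_LRS = [0.00001, 0.0005, 0.001, 0.0001, 0.00001]
STAGE_ITERS = [100, 1000, 150000, 200000]

if __name__ == "__main__":
    for it in (0, 99, 100, 999, 1000, 150000, 200000, 240000):
        print(it, multifixed_lr(it, STAGE_ITERS, STAGE_LRS))
```

Following the suggestion to let the small lr train longer amounts to moving the early thresholds later (for example a hypothetical `[1000, 5000, 150000, 200000]`) while keeping `stage_lrs` unchanged.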

guods commented 4 years ago

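Once the lr reaches 0.001, the loss becomes NaN after a few hundred more iterations. After lowering the maximum learning rate to 0.0001, the loss still had not converged after more than 200,000 iterations.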

guods commented 4 years ago

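For detector training I have only one class, person. Is num_class set the same way as for segmentation? That is, when there is more than one class, num_class equals the number of classes; when there is only one class, is num_class = 0?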

leon-liangwu commented 4 years ago

Yes. I'm not sure where your problem is right now; you could check whether your labels contain negative values or other invalid entries.
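A quick way to follow that advice is to scan the label files for bad entries before training. This is a minimal sketch that assumes YOLO-style text labels with one `class cx cy w h` box per line and coordinates normalized to [0, 1]; the format, the `*.txt` layout, and the helper names `check_label_line` and `scan_label_dir` are assumptions for illustration, not part of MaskYolo_Caffe. The set of valid class ids is passed in, since this thread uses label 1 for person.

```python
import math
from pathlib import Path


def check_label_line(line, valid_classes):
    """Return a list of problems in one 'class cx cy w h' label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        cls = int(parts[0])
        cx, cy, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["could not parse fields as numbers"]
    problems = []
    if cls not in valid_classes:
        problems.append(f"class id {cls} not in {sorted(valid_classes)}")
    if not all(math.isfinite(v) for v in (cx, cy, w, h)):
        problems.append("non-finite coordinate")
        return problems
    if not (0.0 <= cx <= 1.0 and 0.0 <= cy <= 1.0):
        problems.append("box center outside [0, 1] (or negative)")
    if not (0.0 < w <= 1.0 and 0.0 < h <= 1.0):
        problems.append("width/height not in (0, 1]")
    return problems


def scan_label_dir(label_dir, valid_classes):
    """Map each bad file to a list of (line_number, problems) pairs."""
    bad = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if not line.strip():
                continue  # ignore blank lines
            problems = check_label_line(line, valid_classes)
            if problems:
                bad.setdefault(str(path), []).append((lineno, problems))
    return bad
```

Usage would look like `scan_label_dir("labels", {1})`, with the directory name being a placeholder; an empty result means no line tripped these checks, which rules out only the obvious label errors.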

guods commented 4 years ago

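@leon-liangwu The detector's class labels follow the YOLO detector's convention: apart from the background, only person is detected, with class label 1. This differs from the class design used for segmentation.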