Training ctdet models without using the COCO-pretrained model results in 0 AP50 for 50+ epochs

sisrfeng commented 4 years ago

Hi, I am training dla_34 and resdcn_18 on the UA-DETRAC dataset using an ImageNet-pretrained backbone, without loading the COCO-pretrained model. AP50 stays at 0 for 50+ epochs. If I load the COCO-pretrained model, AP50 is about 76, a little worse than before fine-tuning. Do you know how to solve this? Thanks!

shuangpengzheng commented 4 years ago

When you train the network with the ImageNet-pretrained backbone, does the loss decrease?

sisrfeng commented 4 years ago

Thanks! The loss stays around 10. I tried the same learning rate as the author's, as well as 10x and 100x, but got the same result each time.

sisrfeng commented 4 years ago

dla_34, resdcn_18 and hourglass all behave the same way.

cao-nv commented 4 years ago

It seems that you need a better initialization. Have you tried training from scratch with different learning rates?

sisrfeng commented 4 years ago

> It seems that you need a better initialization. Have you tried training from scratch with different learning rates?

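There are ReLUs in the model, and I set the lr to 100x the author's, which killed some of the neurons. When training CenterNet from scratch with a learning rate 100x the author's, the training loss stays at around 5.5, because many neurons have died. After that, no amount of learning-rate tuning can rescue the run.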

A large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. That is, the ReLU units can irreversibly die during training since they can get knocked off the data manifold. For example, you may find that as much as 40% of your network can be "dead" (i.e. neurons that never activate across the entire training dataset) if the learning rate is set too high.
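To make the quoted mechanism concrete, here is a minimal toy sketch in plain Python. It is not CenterNet code: the single-neuron model, the three data points, and the helper names (`sgd_step`, `train`, `is_dead`) are all made up for illustration. It trains one ReLU unit with full-batch gradient descent, once with a small learning rate and once with a learning rate 100x larger, then tries to recover the second run by going back to the small learning rate.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def sgd_step(w, b, xs, ts, lr):
    """One full-batch gradient step on the MSE loss of y = relu(w*x + b)."""
    n = len(xs)
    grad_w = grad_b = 0.0
    for x, t in zip(xs, ts):
        z = w * x + b
        if z > 0.0:  # ReLU passes gradient only where the unit is active
            err = z - t
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
    return w - lr * grad_w, b - lr * grad_b

def train(w, b, xs, ts, lr, steps):
    for _ in range(steps):
        w, b = sgd_step(w, b, xs, ts, lr)
    return w, b

def is_dead(w, b, xs):
    """True if the unit outputs zero on every training input."""
    return all(relu(w * x + b) == 0.0 for x in xs)

# Hypothetical toy data: the target is 0.5 * x, the initial slope is 1.0.
xs = [1.0, 2.0, 3.0]
ts = [0.5, 1.0, 1.5]

# Small learning rate: the unit stays active while it fits the data.
w_ok, b_ok = train(1.0, 0.0, xs, ts, lr=0.01, steps=200)

# 100x larger learning rate: a single step pushes w and b negative.
w_bad, b_bad = train(1.0, 0.0, xs, ts, lr=1.0, steps=1)

# Going back to the small learning rate afterwards changes nothing,
# because the gradient through the dead unit is exactly zero.
w_after, b_after = train(w_bad, b_bad, xs, ts, lr=0.01, steps=200)

print("dead with lr=0.01:", is_dead(w_ok, b_ok, xs))
print("dead with lr=1.0: ", is_dead(w_bad, b_bad, xs))
print("unchanged after lowering lr:", (w_after, b_after) == (w_bad, b_bad))
```

In this toy setup the high-lr run is dead after its first step, and the later low-lr run leaves the parameters untouched, which matches the "cannot be rescued by tuning the learning rate afterwards" observation. In a real network only a fraction of the units would die, so checking how many activations are zero across a whole pass over the training set is the more realistic diagnostic.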

xingyizhou / CenterNet

Training ctdet models without using the COCO-pretrained model results in 0 AP50 for 50+ epochs #827