microsoft / singleshotpose

This research project implements a real-time object detection and pose estimation method as described in the paper, Tekin et al. "Real-Time Seamless Single Shot 6D Object Pose Prediction", CVPR 2018. (https://arxiv.org/abs/1711.08848).
MIT License

Pretraining going to NaN loss after 600 iterations #100

Closed danieldimit closed 4 years ago

danieldimit commented 5 years ago

Hi, I am trying to train the network to predict the position of a single object. I am doing the pretraining procedure you mentioned, using the yolo-pose-pre.cfg file. I changed the bottom part of yolo-pose-pre.cfg so that it infers only 1 class:

[convolutional]
size=1
stride=1
pad=1
# filters=125
filters=20
activation=linear

[region]
# anchors =  1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
anchors = 0.1067, 0.9223
bias_match=1
classes=1
coords=18
num=1
softmax=1
jitter=.3
rescore=1

object_scale=0
noobject_scale=0
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1
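
As a sanity check on the edited config (a standard YOLOv2-style region-layer formula, not something stated in this thread), the filter count of the last convolutional layer should equal num × (coords + 1 + classes) — per anchor: the coordinate values, one confidence value, and the class scores:

```python
def region_filters(num_anchors: int, coords: int, classes: int) -> int:
    """Output filters the [region] layer expects: per anchor,
    `coords` coordinate values, 1 confidence value, `classes` scores."""
    return num_anchors * (coords + 1 + classes)

# Original multi-class pose config: 5 anchors, 18 coords... actually the
# commented-out filters=125 corresponds to the detection case (4 coords,
# 20 classes, 5 anchors):
print(region_filters(5, 4, 20))  # 125

# Single-object pose config above: 1 anchor, 18 coords (9 control
# points x 2), 1 class:
print(region_filters(1, 18, 1))  # 20
```

This matches filters=20 and the single two-value anchor in the snippet.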

But every time I try to pretrain (even on the APE object from the LINEMOD dataset), at around iteration 600 the loss grows extremely fast until it becomes NaN:

632: nGT 8, recall 0, proposals 1352, loss: x 5764.108398, y 5633.271484, conf 0.000000, total 11397.379883
640: nGT 8, recall 0, proposals 1352, loss: x 13393.198242, y 6664.856934, conf 0.000000, total 20058.054688
648: nGT 8, recall 0, proposals 1352, loss: x 462920.906250, y 333597.125000, conf 0.000000, total 796518.000000
656: nGT 8, recall 0, proposals 1132, loss: x 49253800.000000, y 15868911.000000, conf 0.000000, total 65122712.000000
664: nGT 8, recall 0, proposals 566, loss: x 54150683426816.000000, y 17054768824320.000000, conf 0.000000, total 71205449105408.000000
672: nGT 8, recall 0, proposals 349, loss: x 151635130340967876116892761456640.000000, y 47761167755206951643584462323712.000000, conf 0.000000, total 199396298096174827760477223780352.000000
680: nGT 8, recall 0, proposals 0, loss: x nan, y nan, conf 0.000000, total nan
688: nGT 8, recall 0, proposals 0, loss: x nan, y nan, conf nan, total nan

Is this normal, and why does it happen? Here is the whole yolo-pose-pre.cfg file that I use.

danieldimit commented 5 years ago

After some experiments, I found that lowering the learning rate in yolo-pose-pre.cfg prevents this error. But the question still stands: has anyone done pretraining successfully? I tried it, but it never produced a model better than the first one it created (which had accuracy 0), so it was useless for me.
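
For anyone hitting the same divergence: the blow-up pattern in the log above (loss growing by orders of magnitude every few iterations) is the classic signature of a step size too large for the local curvature of the loss. A minimal, self-contained illustration on a toy quadratic (not the actual pose loss) shows why lowering the learning rate is the right knob:

```python
def gd_losses(lr: float, steps: int = 20, curvature: float = 10.0) -> list:
    """Gradient descent on f(w) = 0.5 * curvature * w**2 starting at w=1.
    Each step multiplies w by (1 - lr * curvature), so the iteration
    diverges whenever lr > 2 / curvature and converges otherwise."""
    w, losses = 1.0, []
    for _ in range(steps):
        w -= lr * curvature * w            # plain gradient step
        losses.append(0.5 * curvature * w * w)
    return losses

high = gd_losses(lr=0.25)  # 0.25 > 2/10: loss explodes, just like the log
low = gd_losses(lr=0.05)   # 0.05 < 2/10: loss shrinks steadily
print(high[-1] > high[0], low[-1] < low[0])  # True True
```

Gradient clipping during pretraining is another common mitigation for this kind of blow-up, but that would require a code change rather than a cfg change.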

btekin commented 4 years ago

We use the pretraining to be able to have reasonable confidence ground-truth when we start the actual training. The following part in the paper also explains this:

As the pose estimates in the early stages of training are inaccurate, the confidence values computed using Eq. 1 are initially unreliable. To remedy this, we pretrain our network parameters by setting the regularization parameter for confidence to 0. Subsequently, we train our network ...

We found it effective to pretrain the model without confidence estimation first and then fine-tune the network with confidence estimation enabled. You can also train the network from a cruder initialization (with weights trained only on ImageNet); however, this usually results in slower convergence, and sometimes in worse final accuracy.
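
The two-phase scheme above amounts to weighting the confidence term of the loss by zero during pretraining (which is what object_scale=0 and noobject_scale=0 do in yolo-pose-pre.cfg) and re-enabling it for fine-tuning. A hedged sketch of that weighting, with illustrative names and values rather than the repository's actual loss code:

```python
def total_loss(coord_loss: float, conf_loss: float, class_loss: float,
               coord_scale: float = 1.0, conf_scale: float = 1.0,
               class_scale: float = 1.0) -> float:
    """Weighted sum mirroring the cfg scale parameters. Setting
    conf_scale=0 reproduces the pretraining regime from the paper,
    where the unreliable early confidence values are ignored."""
    return (coord_scale * coord_loss
            + conf_scale * conf_loss
            + class_scale * class_loss)

# Phase 1 (pretraining): confidence term disabled
pre = total_loss(2.0, 5.0, 0.5, conf_scale=0.0)    # 2.5
# Phase 2 (fine-tuning): confidence term re-enabled
fine = total_loss(2.0, 5.0, 0.5, conf_scale=5.0)   # 27.5
print(pre, fine)
```

The per-term loss values and the conf_scale of 5.0 here are made up for illustration; only the structure (a zeroed confidence weight during pretraining) reflects the thread.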