nilboy / tensorflow-yolo

TensorFlow implementation of 'YOLO: Real-Time Object Detection' (train and test)

AssertionError: Model diverged with loss = NaN #30

Open XuanheLiu opened 7 years ago

XuanheLiu commented 7 years ago

Without loading yolo_tiny.ckpt, training yolo_net directly from scratch raises this error: AssertionError: Model diverged with loss = NaN.

wenbowen123 commented 6 years ago

@XuanheLiu Did you solve the problem? I ran into the same one.

XuanheLiu commented 6 years ago

@wenbowen123 The problem has been solved: set the learning rate small at first, wait for the loss value to come down, then increase the learning rate, and later decrease it again.
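
A minimal sketch of such a warm-up schedule in the TF 1.x API this project uses; the step boundaries and rates are illustrative assumptions, not values from this thread:

import tensorflow as tf

# Warm up with a small rate, raise it once the loss has come down, then decay.
global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.piecewise_constant(
    global_step,
    boundaries=[1000, 20000],   # switch points (in training steps)
    values=[1e-4, 1e-3, 1e-4])  # small -> larger -> smaller again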

wenbowen123 commented 6 years ago

@XuanheLiu Thank you! What do the trained results look like? How is the accuracy?

XuanheLiu commented 6 years ago

@wenbowen123 I'm not sure I trained it well. The weights from my training are not as good as the author's. I remember the loss never dropped very low, but I don't remember the exact value.

ghost commented 6 years ago

@wenbowen123 So how did you solve the error? I ran into the same problem. Thanks a lot.

Fju commented 6 years ago

The model diverges if the training process changes the weights too much and the loss grows larger, or even extremely large. Try reducing the standard deviation used to initialize the weight variables and the constant value used for the bias variables, so that the initial weights and biases are relatively small. You can also consider lowering the learning rate. As far as I know, model divergence is caused by these (hyper)parameters. Cheers
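
A minimal sketch of this advice, assuming the TF 1.x variable API; the shapes and names are illustrative, not taken from this repo:

import tensorflow as tf

# A small standard deviation for the weight initializer keeps early
# activations (and therefore early losses and gradients) small.
weights = tf.get_variable(
    'weights', shape=[3, 3, 3, 16],
    initializer=tf.truncated_normal_initializer(stddev=0.01))
# A small constant bias for the same reason.
biases = tf.get_variable(
    'biases', shape=[16],
    initializer=tf.constant_initializer(0.0))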

guiyang882 commented 6 years ago

assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
AssertionError: Model diverged with loss = NaN

When I use Python 3 to run this project, the loss becomes NaN during training, but when I use Python 2, the model converges. @XuanheLiu @Fju @nilboy
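
One plausible cause of a Python 2 / Python 3 behavior difference (an assumption on my part, not confirmed in this thread) is integer division: in Python 2, / on two ints floors the result, while in Python 3 it returns a float, which silently changes any quantity derived from it:

# Python 2: 7 / 2 == 3    (floor division on ints)
# Python 3: 7 / 2 == 3.5  (true division)
# Code relying on the Python 2 behavior should use // explicitly:
cell_index = pixel_x // cell_size  # hypothetical names for illustration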

adr-arroyo commented 6 years ago

Does it make any sense that it works in Python 2 but not in Python 3?

hwade commented 6 years ago

You need to compute and apply the gradients separately, as in the following process:

import tensorflow as tf

# Compute gradients explicitly instead of calling opt.minimize(loss) directly;
# here loss is the scalar training loss.
opt = tf.train.AdamOptimizer(0.1)
gvs = opt.compute_gradients(loss)
# Clip each gradient into [-1, 1]; skip variables with no gradient.
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var)
              for grad, var in gvs if grad is not None]
train_op = opt.apply_gradients(capped_gvs)

This limits the range of each computed gradient and prevents the model from diverging.
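
A common alternative (an addition of mine, not suggested in this thread) is tf.clip_by_global_norm, which rescales all gradients together and so preserves their overall direction, continuing from the opt and loss above:

# Clip by the global norm across all gradients instead of element-wise.
gvs = [(g, v) for g, v in opt.compute_gradients(loss) if g is not None]
grads, variables = zip(*gvs)
clipped, _ = tf.clip_by_global_norm(grads, 5.0)  # rescale so global norm <= 5.0
train_op = opt.apply_gradients(zip(clipped, variables))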