Open YouSmart2016 opened 3 years ago
When I used the ICDAR2015 dataset to train, the loss became NaN, as follows:
Epoch 1/10
200/200 [==============================] - ETA: 0s - loss: 10.4658
E:\githubcode\DifferentiableBinarization\generator_tf.py:98: RuntimeWarning: invalid value encountered in true_divide
  cosin = (square_distance - square_distance_1 - square_distance_2) / (2 * np.sqrt(square_distance_1 * square_distance_2))
200/200 [==============================] - 6289s 32s/step - loss: 10.4462 - val_loss: 5.5865
Epoch 00001: saving model to checkpoints/2021-05-11\db_01_6.5202_5.5865.h5
Epoch 2/10
200/200 [==============================] - 2867s 14s/step - loss: nan - val_loss: nan
Epoch 00002: saving model to checkpoints/2021-05-11\db_02_nan_nan.h5
Epoch 3/10
200/200 [==============================] - 2693s 13s/step - loss: nan - val_loss: nan
Epoch 00003: saving model to checkpoints/2021-05-11\db_03_nan_nan.h5
Epoch 4/10
200/200 [==============================] - 2595s 13s/step - loss: nan - val_loss: nan
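The RuntimeWarning in the log points at the law-of-cosines division in generator_tf.py. If either squared distance is zero (for example, a pixel that coincides with a polygon vertex), the expression evaluates 0/0 and produces NaN. Whether those NaNs ever reach the loss depends on how the generator post-processes them, so this is only one candidate cause. A minimal NumPy sketch of a guarded division, reusing the variable names from the warning; `safe_cosine` and `eps` are hypothetical names, not part of the repository:

```python
import numpy as np

def safe_cosine(square_distance, square_distance_1, square_distance_2, eps=1e-6):
    """Law-of-cosines term with a guarded denominator (illustrative helper)."""
    numer = square_distance - square_distance_1 - square_distance_2
    denom = 2 * np.sqrt(square_distance_1 * square_distance_2)
    # Divide only where the denominator is safely non-zero; keep 0 elsewhere.
    out = np.zeros_like(numer, dtype=np.float64)
    np.divide(numer, denom, out=out, where=denom > eps)
    # Rounding can push the ratio slightly outside [-1, 1].
    return np.clip(out, -1.0, 1.0)

# First entry is degenerate (square_distance_1 == 0), second is regular.
sd = np.array([4.0, 4.0])
sd1 = np.array([0.0, 1.0])
sd2 = np.array([4.0, 1.0])
cosin = safe_cosine(sd, sd1, sd2)
```

Returning 0 for the degenerate entries is an arbitrary choice made for this sketch; the appropriate fallback depends on what the distance map should contain at those pixels.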
You could try reducing the learning rate when the loss becomes NaN, I guess.
I solved it by replacing the loss functions. In losses.py:
# comment out following two lines
#weights = (weights - tf.reduce_min(weights)) / (tf.reduce_max(weights) - tf.reduce_min(weights)) + 1.
#mask = mask * weights
....
# replace L1 loss with huber loss
#loss = K.switch(mask_sum > 0, tf.reduce_sum(tf.abs(pred - gt) * mask) / mask_sum, tf.constant(0.))
mask = tf.not_equal(mask, tf.constant(0.))
loss = K.switch(mask_sum > 0, tf.losses.huber_loss(gt[mask], pred[..., 0][mask]), tf.constant(0.))
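One plausible reason the commented-out normalization can produce NaN: if every entry of `weights` is identical within a batch, `tf.reduce_max(weights) - tf.reduce_min(weights)` is zero and the min-max scaling evaluates 0/0. The sketch below reproduces that arithmetic in NumPy rather than TensorFlow and shows an epsilon-guarded variant. `minmax_plus_one` and `eps` are hypothetical names, and this is an assumption about the failure mode, not a confirmed diagnosis.

```python
import numpy as np

def minmax_plus_one(weights, eps=0.0):
    """Scale weights into [1, 2]; eps > 0 guards against a zero range."""
    w_min, w_max = weights.min(), weights.max()
    return (weights - w_min) / (w_max - w_min + eps) + 1.0

# A batch in which all weights are identical, so max - min == 0.
constant = np.ones((2, 4, 4), dtype=np.float32)

# Unguarded: 0/0 for every element, which NumPy reports as NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    unguarded = minmax_plus_one(constant)

# Guarded: 0 / 1e-6 + 1 == 1 for every element.
guarded = minmax_plus_one(constant, eps=1e-6)
```

Commenting the normalization out, as suggested above, avoids the division entirely; adding a small epsilon is an alternative if the weighting itself should be kept.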
def dice_loss(args):
    pred, gt, mask, weights = args
    pred = pred[..., 0]
    # weights = (weights - tf.reduce_min(weights)) / (tf.reduce_max(weights) - tf.reduce_min(weights)) + 1.
    # mask = mask * weights
    intersection = tf.reduce_sum(pred * gt * mask)
    union = tf.reduce_sum(pred * mask) + tf.reduce_sum(gt * mask) + 1e-6
    loss = 1 - 2.0 * intersection / union
    return loss

def l1_loss(args, scale=10.):
    pred, gt, mask = args
    pred = pred[..., 0]
    mask_sum = tf.reduce_sum(mask)
    # loss = K.switch(mask_sum > 0, tf.reduce_sum(tf.abs(pred - gt) * mask) / mask_sum, tf.constant(0.))
    mask = tf.not_equal(mask, tf.constant(0.))
    loss = K.switch(mask_sum > 0, tf.compat.v1.losses.huber_loss(gt[mask], pred[..., 0][mask]), tf.constant(0.))
    loss = loss * scale
    return loss
My losses code looks like this now. But I am getting this error. Can you help?
  File "F:\DifferentiableBinarization\train.py", line 38, in <module>
    model, prediction_model = dbnet(input_size=input_image_size)
  File "F:\DifferentiableBinarization\model.py", line 60, in dbnet
    loss = layers.Lambda(db_loss, name='db_loss')([p, b_hat, gt_input, mask_input, t, thresh_input, thresh_mask_input])
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\keras\backend\tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\keras\engine\base_layer.py", line 489, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\keras\layers\core.py", line 716, in call
    return self.function(inputs, **arguments)
  File "F:\DifferentiableBinarization\losses.py", line 57, in db_loss
    l1_loss_ = l1_loss([thresh, thresh_map, thresh_mask])
  File "F:\DifferentiableBinarization\losses.py", line 50, in l1_loss
    loss = K.switch(mask_sum > 0, tf.compat.v1.losses.huber_loss(gt[mask], pred[..., 0][mask]), tf.constant(0.))
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 905, in _slice_helper
    return boolean_mask(tensor=tensor, mask=slice_spec)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1678, in boolean_mask
    shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 1117, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (None, None) and (None, 1600, 1600) are incompatible
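The ValueError looks consistent with `pred` being sliced twice: `l1_loss` already sets `pred = pred[..., 0]`, so the later `pred[..., 0][mask]` drops a second axis and leaves a rank-2 tensor that cannot be indexed by the rank-3 mask. The NumPy sketch below mirrors the shapes (with a small 8x8 map instead of 1600x1600) and uses a hand-written Huber loss; `masked_huber` and `delta` are illustrative names, not part of losses.py. In the TensorFlow code, the analogous change would presumably be `pred[mask]` in place of `pred[..., 0][mask]`.

```python
import numpy as np

def masked_huber(pred, gt, mask, delta=1.0):
    """Mean Huber loss over pixels where mask is non-zero (0.0 if none)."""
    keep = mask != 0
    if not keep.any():
        return 0.0
    err = np.abs(gt[keep] - pred[keep])
    quad = np.minimum(err, delta)            # quadratic part, capped at delta
    return float(np.mean(0.5 * quad ** 2 + delta * (err - quad)))

rng = np.random.default_rng(0)
pred = rng.random((2, 8, 8, 1))              # network output: (batch, H, W, 1)
gt = rng.random((2, 8, 8))                   # target map:     (batch, H, W)
mask = np.zeros((2, 8, 8))
mask[:, 2:6, 2:6] = 1.0                      # only the central region counts

pred = pred[..., 0]                          # first slice: (2, 8, 8), matches mask
twice = pred[..., 0]                         # second slice: (2, 8), no longer matches

try:
    twice[mask != 0]
    mismatch = False
except IndexError:
    mismatch = True                          # NumPy's analogue of the ValueError

loss = masked_huber(pred, gt, mask)          # slice once, then index with the mask
```

Slicing once keeps `pred`, `gt`, and `mask` at the same rank, which is what boolean indexing requires in both NumPy and TensorFlow.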